Thank you. Did you fix the cord? Did you? Now we won't have any purple screens. I know, we all miss the lovely purple screen of me messing with the cord to force it to work. Quickly, any questions? We'll do a few minutes. Questions? This is your not daily, but classly, reminder to get started on Project 3: if you haven't started, you're behind. Any questions on Project 3? Design of Project 3? In this project, no, you shouldn't be worrying about malformed input. It's hard enough to read in the proper input, do the proper calculations, and output it in the exact right format. So that's what you should focus on, not so much the input. Obviously, in general, you want to make it as robust as possible, but you should focus first on getting it correct and getting it to work. If I'm missing just one or two test cases, is it possible for you to give me a hint about which ones are the problem? No. Good question. If you come to office hours, I could try to get you to think about what possible things you could be missing. I'm happy to help in that way. But I'm not going to say, oh yeah, here's the test case that you're missing, or anything like that. It's not fair; otherwise I'd have to give that to everyone. But yeah, we can talk about test cases at a high level: what passes, what you're doing, what you're not doing. Cool. All right. So let's get started, and I have to reboot everything in my brain for what we're doing. OK. Of course, USB doesn't work on anything. So much better than a real pen. OK. So we've been looking at first and follow sets, I think we'd all agree, for a long time. So now comes the payoff as to why. Not just why in order to do Project 3, but why and how it fits into our larger ideas of parsing. So let's take a simple grammar. S goes to big A, big B. A goes to little a, big A, or epsilon. And B goes to little b, big B, or little a. So a pretty simple grammar.
I think we can all agree. So let's kind of get first sets. So what's the first? You can kind of do it hopefully a bit quickly with that. So S is kind of tricky because it depends on A and B. Let's talk about what's the first of A, A or epsilon. And the first of B, little A or little B. Which rule? So then what's the first of S? A or B. A or B. So let's start at the very top. What kind of strings, what do these strings look like that this can generate? Let's think about this context for grammar. How would you describe these kind of English? What kind of strings you should have? Yeah. How would you describe like the, if we talk about the language that this grammar generates, it's going to generate a set of strings. So would it be like this set kind of contains AB, right? Yeah. Put it in English words. How would you kind of say that? So it's a grammar that generates strings, single letters, character. Character, yeah. Well, character, yeah. So what about what's proper about those strings, right? So they're going to be composed of A's and B's, right? Definitely. For a string of A's followed by a string of B's, it could even be just a string of B's with one A and two A's. Yeah, so definitely going to have to end in an A, right? We can see that from here, right? So B is going to either be in A or it's going to be any number of B's followed by an A. And then we can lead with any number of A's potentially zero. So now let's get into parsing. So when we talked about top-down parsing, right? We talked about, so what are we trying to make when we do parsing? Like what's our goal with parsing? What's our input? The grammar is one thing, yeah? What was that? Tokens. Yeah, tokens are the string that we're trying to parse, right? And what's our output? I think somebody just said it already. The parse tree, right? Yeah, we want to create the parse tree from given this input string. So the idea is, OK, given this grammar, right? 
This is the grammar, we'll call it G or something, right? Given this grammar, we have some input string. Let's say it's a, b, a. Right? So can you kind of visually check that, yes, this is a string in this language? Yeah. Yeah. So by applying these rules, starting from S, you could derive this string, a, b, a. OK. So what we want to do is create the parse tree. So what does this parse tree look like? Let's look at where our goal state is, and we'll try to see where we are now and how we can make an algorithm to get to this goal state. So if S is the root of this parse tree, what do we know are the children of S? A, B, right? Yeah, we know that right off the bat. There's no condition here, right? Exactly. So we know that to parse S, we're always going to parse A and parse B. S is always going to be composed of A and B. There's no choice here. There's no "or" here. So we don't have to choose between anything. We know that if we want to parse an S, we first have to parse an A and then parse a B. So we're going to do this in essentially pseudocode to build this parse tree. We're going to define a function called parse S. Basically, we're going to use the form parse beta, where beta is a non-terminal. So when we say, hey, we want to generate a parser, well, really a recursive descent parser, we want to define one of these functions for every non-terminal in our grammar. And this will tell us how to actually parse in that grammar. So if I'm trying to parse S, and I have this rule S goes to A B, I don't have a choice of rule. So I don't even need to read any bit of the input, because I know exactly which rule to choose. So the first thing I'm going to do is call parse A, which we're going to define later.
But abstractly, you can think of, OK, this is a function that's going to handle building this whole tree, this whole section of the tree, the parse A. And then after we're done parsing A, then what do we want to parse? B. We should parse in A, parse in B. There may be extra stuff we want to do here in a sec, but we'll just stick to a high level at this point. So here, from S's perspective, as long as A and B are written properly, it just knows it's creating essentially this part of the tree. So everybody see how this tree structures. So this is not explicitly creating this tree, but it's calling parse A and parse B in such a way that these calls are in this order of this tree. We're first going to be doing S, and then we're going to call parse A, and that's going to do something, which we'll look at in a bit. And then we're going to call B, and that's going to do something in a bit, and then we're going to return. So the question is, how do we write? And to do this, we didn't look at it. The only thing we need to look at here is just this rule. We didn't look at the first set or the follow sets or anything. So actually, we're going to sneak ahead a little bit. Let's do it kind of in a little reverse order. So I want to parse B. How do I choose which of these rules? Yeah. You do get token pairs to be able to go to B, like it would be. Yeah. So I get token, so let me check to make sure I'm using the right variable naming because it's going to be consistent. T type, yeah, just for that reason right then. OK, we'll call it T type. So the type of the token, so I'm going to call getToken. What is getToken? Yeah. So this is part of the lexer. So here we're using, from the parse, we're using the lexer API to get a token. So this is my input string, and I'm starting. I haven't called getToken at all. The first thing that's going to be returned. All right, because we're really reading A, B, and A. Yes. Yeah, there should be a space here. Yeah. So the first token here is A. 
The next token is B, and the next token is A. And then we're going to move this input and basically consume those tokens. But let's look at this. So I'm thinking about B abstractly. So we've looked at the first token. And remember, we want to be a predictive parser. What does that mean? We want to know which rule to choose based on just looking one token ahead. So before we achieve the two, now we are going to restrict ourselves to say only one token ahead. So based on whatever this token is, how do we choose which of these rules to parse as B? Create a rule, or a type, and you can do a little a type or b type. And you can compare the t type with that type that you're trying to solve. You would say if t type equals, and you can choose a or b type, or however you want to do it. But in this case, it will be a type rank. So we'll just try and write this. We'll go over how it actually works at a high level. So how do I decide between these two rules? So if it's a little b, or if it's an a, right? So just looking at one token ahead, so how many tell that? Well, you can look at the first set, and they'll tell you what you decide. If I take each of these rules, and I say, what's the first of little b, big b? Can I do this? Can I do first sets on sequences? Just the same as anything else. What's the first set of little b, big b? Little b. Little b. What's the first set of a? Little a. Little a? Yeah. So this tells you deciding between these two rules, I can look one character ahead, or one token ahead, and I know exactly which one of these rules had to have come from this token. Which of these rules had to have generated that token? Because I know a b can only come from here, and an a can only come from here. So then I can check. I can say, OK, well, if t-type is a b, then I know I'm in this rule. So now we have to think about our input. So I'm going to get rid of that. Right now we're just here. We don't know what happened before us. 
All we know is we just read a token, and there's a b there. So where did this b come from, based on what we know about this language, about this rule? Did you get that b by calling getToken, and that gave you the first b? I got it that way, yes. But where did it come from in the tree, in whatever created that string? It was in the first of B, pretty much. Yeah, from this little b right here, from this rule. So I have some string, my next token is a b, and I don't actually even know what comes after it. And I say, OK, I'm trying to make a big B. I'm trying to make that node in my parse tree. So I know I have to choose this rule because of the first sets. I know the first sets differ between these rules, so by looking one token ahead I know that this b had to have come from this rule. So then when I draw my parse tree, what are the children of this big B going to be? Little b and big B. Little b and big B, yeah. So then what does that mean about that little b I just read from the input? It came from here, right? This little b in the input actually came from this little b right here in the rule. So we know that the next time we call parse B, we want to make sure the input is now reading from here, that we've essentially parsed and consumed that b and said, OK, I've matched that little b up to this b in this tree. Therefore, I don't want to call getToken again; the call we already made consumed it. It seems weird, because it looks like we're going to do nothing after everything I just said. But the important part is that the input has moved. I'll note it here just to be explicit: I want to make sure that we advance the input, because we know this little b came from parsing this big B right here. So if we do that, then what's the next thing we want to parse? Parse B again. So thinking about this recursively: we've called getToken, then we're going to call parse B again. So we know that the input is going to be moved, right?
getToken is going to consume tokens from the input. So every time this rule gets chosen, we're going to consume a b from the input string. That's going to move the parsing along the input string. And every one of these parse B calls is going to decide, hey, am I little b, big B, or am I little a? But we have to actually write this other part. So what's the other option if t-type is not a b? What happens if it's an a? I do nothing. And when I say do nothing, I mean I do whatever I'm doing in the parsing. I could explicitly create the parse tree here, right? I know that I have a B parent, and I know that the child of that B node is going to be an a, so I can create this structure. But basically I know that the rule B goes to little a was chosen here, and here I know the rule B goes to little b, big B was chosen. What if it's not one of these? Yeah, we have a problem, because we're trying to parse a B. If we get anything else, like a c, exactly. We know from the first set of B that it had better be either an a or a b, or we have a problem. We have a syntax error, because we tried to parse a B and couldn't. So we'll assume that we have some function called syntax error that we can call, and it will just output the error and stop parsing. So does this seem pretty straightforward? Everybody kind of get it? Yeah? Yeah, good question. For the first if statement, if you go to little b, big B, and you substitute the big B with another little b, big B, it keeps on going, and that's what that first statement is doing, right? Exactly. You keep on calling parse B, and it just keeps going. Exactly, and this mirrors exactly this rule, right? This rule can be applied as many times as it wants to generate any number of b's, and then at some point it will stop with an a. OK. It's going to have to stop, yeah. If the input is correct. If it's correct, yeah. If it was the string b, b, b, b, c, we would parse all of these b's and create this tree, right?
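The parse B routine described here can be sketched in Python. This is only an illustration under some assumptions: the `Lexer` class, the `ParseError` exception (standing in for the "syntax error" function), and the nested-tuple tree format are all made up for this sketch, and the lexer simply treats each non-space character as one token.

```python
class ParseError(Exception):
    """Stand-in for the lecture's 'syntax error' function (hypothetical name)."""


class Lexer:
    """Toy lexer: each non-space character is a token; 'EOF' marks the end."""

    def __init__(self, text):
        self.tokens = [ch for ch in text if not ch.isspace()]
        self.pos = 0

    def get_token(self):
        # Return the next token and consume it, moving along the input.
        if self.pos >= len(self.tokens):
            return "EOF"
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok


def parse_B(lexer):
    """Parse B -> b B | a and return the parse tree as nested tuples."""
    t_type = lexer.get_token()      # one token of lookahead, already consumed
    if t_type == "b":               # FIRST(b B) = {b}, so the rule is B -> b B
        return ("B", "b", parse_B(lexer))
    elif t_type == "a":             # FIRST(a) = {a}, so the rule is B -> a
        return ("B", "a")
    else:                           # anything else cannot start a B
        raise ParseError(f"expected 'a' or 'b', got {t_type!r}")


if __name__ == "__main__":
    print(parse_B(Lexer("b b a")))
    try:
        parse_B(Lexer("b b b b c"))
    except ParseError as err:
        print("syntax error:", err)
```

Each recursive call consumes exactly one token, which mirrors how the rule B goes to little b, big B can repeat any number of times before it stops with an a.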
And we would get to the last one. Exactly, the last one would say, actually, I just got a c, so that means this is not a string in this grammar. Or, yeah, these tokens did not come from this grammar. Let's do A. So I'm going to change it a little bit first. Let's do this: make the second rule a c instead of epsilon. So then we know the first of A is a and c, right? So what's the first of little a, big A? Little a. And what's the first of c? C. So can I distinguish between these two rules? Yeah, right? OK, actually, here's a good one. Let's say the second rule is little a, little b, like this. So now the first of A is just little a, right? But now we have little a, big A, or little a, little b. If I just read one token ahead, can I ever tell which of these rules to choose? No. No. So that actually leads to our first property of what it means to be a predictive recursive descent parser. You would look at the next one. You would actually just quit. Because to do that, you need something more powerful than first sets, right? Because first sets say what the first token is. You'd need basically first-and-second sets, which would show you what the first two tokens could possibly be, and those are much more complicated. So that's why we're going to keep it to looking just one token ahead. So yeah, let's go down here. So we kind of have our first rule for a predictive parser. If we have a rule of the form A goes to, we'll say, alpha or beta, where these are just sequences of any number of symbols, then we know that to be a predictive parser, how must the first of alpha and the first of beta be related? Exactly. No overlap. They have nothing in common; there's no overlap here. This is not an "n", it's the intersection, but it's a lot easier to draw an n than an upside-down U. So then the question becomes, is this enough? Is this the only rule we need?
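As a rough sketch of that first rule, the snippet below computes the first set of a sequence and checks that the alternatives of one non-terminal do not overlap. The helper names and the representation (lists of symbols, upper-case strings for non-terminals, an empty list for epsilon) are assumptions made for illustration; the first sets of the non-terminals are supplied by hand rather than computed.

```python
EPS = "ε"


def first_of_sequence(seq, first):
    """FIRST of a sequence of symbols; `first` maps non-terminals to FIRST sets."""
    result = set()
    for sym in seq:
        sym_first = first.get(sym, {sym})   # a terminal's FIRST set is itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:            # this symbol cannot vanish: stop
            return result
    result.add(EPS)                          # every symbol (or none) can vanish
    return result


def alternatives_disjoint(alternatives, first):
    """Rule 1: FIRST sets of the alternatives must be pairwise disjoint."""
    seen = set()
    for alt in alternatives:
        f = first_of_sequence(alt, first)
        if seen & f:
            return False
        seen |= f
    return True


if __name__ == "__main__":
    first = {"A": {"a"}, "B": {"a", "b"}}
    # B -> b B | a : FIRST sets {b} and {a}, so one token decides the rule.
    print(alternatives_disjoint([["b", "B"], ["a"]], first))       # True
    # A -> a A | a b : both alternatives start with a, so one token is not enough.
    print(alternatives_disjoint([["a", "A"], ["a", "b"]], first))  # False
```

Note that A goes to little a, big A, or epsilon also passes this check, since the first of epsilon is just epsilon, which is why a second rule is still needed.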
Can we always tell exactly which rule to follow? So that's where we're going to go back and change our example. OK, let's change this guy back to an epsilon. OK, so now we want to write parse A, right? So what's the first thing we're going to do? Get token. Yeah, we want to call getToken. So we're going to get the token. And then what are we going to check? How do we distinguish between these two rules? So let's first think about the first sets, based on that rule we just created: the first of this and the first of this. So what's the first of little a, big A? Little a. What's the first of epsilon? Epsilon. So is there any overlap between these sets? No, completely disjoint sets, right? So it passes our first rule. So now how do we decide? What do we write? Let's just follow mechanically what we did here. If t-type is equal to a, then what am I going to do? Am I going to call getToken again? No, right? So this is the rule A goes to little a, big A, and I'm going to call parse A. Do I do anything else? No, that's all I do. So then what do I do for this else-if? Can I check whether t-type is epsilon? What's getToken going to return? It's going to return whatever the next token is. Yeah, so is it ever going to return an epsilon? No. No, right? The lexer doesn't know about epsilon; epsilon means nothing, right? Exactly. So when you call getToken, it's either going to be one of the tokens you defined, or an error, or end of file. So this check is pretty much nonsensical. It's never going to happen. Let's think about the case when we have the string a, b. The first time through, the lexer is here. We call getToken. It returns the token a. So we go, great, OK, call parse A, right?
And then on our second time through, right, the token is here, we call get token. We're going to get a little b, but that's not epsilon. So we can't check if the token is epsilon, right? If we call get token again, we'll get end a file, right? There's no more tokens to read. So what can we do? What do we have to check? When do we know if, yeah. So how do we know if this rule was taken, right? So what is the follow set of? So let's, we didn't write it here. Running out of room, or should I just let it move? You don't want to move it, right? I don't know. I think I'd have to specify the paper type or something. OK, let's think about follow. We'll just add it in here, because we can. So what follows s? End of file. End of file, right? You can see that pretty well. Follow a. So first let's think about what follows b. End of file. End of file, right? We can see that it's here. If we follow this b here, there's nothing after it. Let's see from here, it's the follow of s. It's the same as the follow of b. So what's the follow of a? First of b. Right, first of b from here, which is a and b. So what does this mean? What does this follow of a is a and b mean? Semantic, like, that's right, we bless you. So what do we got after a little a? So a little a is always followed by either an a or a b. Not a little a at the beginning, but big a, right? Yeah, big. Exactly. Yeah, so every possible instance of, right, so we think about all possible parse trees over here, that start with a as their root, right? It's going to generate something, which is going to generate some part of the string, right? And this is going to be something else up here. We don't know what. And so we know something else is going to generate this whole other part of the string, right? So the follow set says, hey, what is the first, what is the token, or in this case, letter, right? What is the token that can come directly after any capital A, right? So then what if A went to epsilon here? 
So for it to be syntactically correct, that next token has got to be in the follow set of A, right? Because if we got a c, we'd know, hey, this is wrong, because it's not possible for a c to come after A. So then how can I determine which one of these two rules is taken? Yeah, if the token is in the first of this one, which is little a, or if it's in the follow set of A, in the case of epsilon. So can I distinguish here? The follow of A is a and b, right? So I'm going to get rid of this check, because it's something that you can't possibly do. So I need to check if t-type is an a or t-type is a b, then I know I have the rule A goes to epsilon. But what's the problem here? You already have a case where t-type equals a. Yeah, I have overlapping conditions, right? So just looking one token ahead, I actually don't know which of these two rules to choose. So then in what case could I choose? If it's b, then I know. But how would the follow of A have to change? What's the problem here? It has to not be empty. What was it? The intersection of follow of A and first of A has to be empty. Yeah, the intersection of first of A and follow of A has to be empty, exactly. But does that matter here for this case, for big B? No, it doesn't affect it, because there's no epsilon. We don't really care what follows it, because as long as the first sets of each of the rules are disjoint, we can always decide, and we're good. But if there's an epsilon here, and there's an intersection between the first of A and the follow of A, then we have a problem. That leads us to our second rule: if there exists a rule like, let's say, A goes to epsilon, then first of A intersected with follow of A must be equal to the empty set. So these are the only rules. Pretty cool, it's kind of easy. If a grammar breaks either of those rules, though, you can't do a predictive parser.
You can still make a recursive descent parser, which is in the same format we've been writing, but you can't decide based on one token. You have to try all possible combinations, essentially. So at some point you've either tried them all and you've got it, or you can't possibly do it. If you find one, you've got a correct parse tree; if you can't, there isn't one. So note that this is more restrictive than just being unambiguous, right? Because unambiguous means that you can't have two parse trees for the same string. But this is more specific. This means I need to be able to tell exactly which rule to choose based on looking one token ahead. That's essentially what this means. Questions on this? We can look at it a bit more formally. So remember, our high-level goal is efficiency. We want to make our parser efficient, and we want to do no backtracking. So we say, hey, if we have the rules A goes to alpha and A goes to beta, then the first of alpha intersected with the first of beta must be the empty set. This is what we derived just by looking at it. And then we derived the second rule, and I guess this version is actually better than the rule I put up. Because it could be that there's not an explicit rule A goes to epsilon, but it could be A goes to B and B goes to epsilon. So we want to check: if epsilon is in the first of A, then the first of A intersected with the follow of A must be the empty set. Once we have these conditions, we can write a parser, and we can write it very simply and very efficiently. We've already seen that, so we will very quickly go through this example. If you're still confused about first and follow sets, we're going to briefly go over them, but, actually, I guess I didn't grade that part of the midterm. I haven't graded it yet, I should say.
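To make the two conditions concrete, here is a small sketch that computes first and follow sets for the lecture's grammar by repeating the usual rules until nothing changes, then tests both conditions. The function names, the dictionary encoding of the grammar, and the `"EOF"` marker are assumptions for this sketch, not something from the course materials.

```python
EPS, EOF = "ε", "EOF"

# S -> A B,  A -> a A | epsilon,  B -> b B | a   ([] stands for epsilon)
GRAMMAR = {
    "S": [["A", "B"]],
    "A": [["a", "A"], []],
    "B": [["b", "B"], ["a"]],
}


def seq_first(seq, first):
    """FIRST of a sequence, given the FIRST sets computed so far."""
    out = set()
    for sym in seq:
        f = first[sym] if sym in first else {sym}   # terminal: FIRST is itself
        out |= f - {EPS}
        if EPS not in f:
            return out
    return out | {EPS}


def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:                       # iterate to a fixed point
        changed = False
        for nt, alts in grammar.items():
            for alt in alts:
                new = seq_first(alt, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(grammar, first, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add(EOF)
    changed = True
    while changed:
        changed = False
        for nt, alts in grammar.items():
            for alt in alts:
                for i, sym in enumerate(alt):
                    if sym not in grammar:          # only non-terminals have FOLLOW
                        continue
                    rest = seq_first(alt[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:                 # rest can vanish: inherit FOLLOW(nt)
                        new |= follow[nt]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


def predictive_conflicts(grammar, first, follow):
    """Return (non-terminal, rule, overlapping tokens) for each violation."""
    conflicts = []
    for nt, alts in grammar.items():
        seen = set()
        for alt in alts:                            # rule 1: alternatives disjoint
            f = seq_first(alt, first)
            if seen & f:
                conflicts.append((nt, "rule 1", seen & f))
            seen |= f
        if EPS in first[nt] and first[nt] & follow[nt]:   # rule 2
            conflicts.append((nt, "rule 2", first[nt] & follow[nt]))
    return conflicts


FIRST = compute_first(GRAMMAR)
FOLLOW = compute_follow(GRAMMAR, FIRST, "S")

if __name__ == "__main__":
    print(FIRST)
    print(FOLLOW)
    print(predictive_conflicts(GRAMMAR, FIRST, FOLLOW))
```

For this grammar the only reported conflict is on A under the second rule, with little a in both the first and the follow of A, which matches the overlap found by hand when trying to write parse A.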
Okay, so if you want to make a predictive recursive descent parser for a language, what do you have to do? What do you need in order to calculate first and follow sets? The grammar. Yeah, so you need a context-free grammar, right? You need a grammar, and you need a lexer to decide the tokens. So you first create the context-free grammar, then you calculate first and follow sets, and then what do you do? What's your next step? You need to check that all the first and follow sets pass those two rules. Yes, right, because if you're trying to make a predictive parser and the grammar doesn't allow it, it doesn't make sense to write all those functions out. So you can prove this to yourself. You can easily do this, so how do you show it? So just to be very clear, since this came up last semester at some point, I can't remember which one it was on: just writing the conditions down is not enough to prove that they apply to a grammar. Like, oh yeah, it holds because of this thing, right? You have to actually show, okay, there are these rules, where A goes to little a, big A, and A goes to c, and you can show that the first of little a, big A, intersected with the first of c is the empty set, right? So simply stating these rules and saying, yeah, they apply to the grammar, is not proving it. You want to show the sets for each of those to clearly see that there's no overlap. Okay, then we want to write the predictive recursive descent parser, just as we were doing by hand. We want to do that, and we'll kind of get into the algorithm towards the end. This is going to drive home more intuition about how we do this and how we use first and follow to write the parser. So the idea is, and this is actually kind of cool: can a computer calculate first and follow sets? I very much hope so, otherwise your homework's going to be very, very hard, right? Yes, a program could do that.
Could a program prove that the context-free grammar allows a predictive recursive descent parser? Yeah, right, it has the first and follow sets. It just applies all those tests. Then we'll see that you can actually use a very simple algorithm to write the predictive recursive descent parser using the first and follow sets. Can a computer create the context-free grammar? Tricky question. Maybe, yeah, it kind of depends, right? I mean, if you think about it, that's kind of one of the goals of machine learning, not necessarily a context-free grammar, but trying to create a grammar and a parser for English, spoken English or written English, so that a computer could understand what's out there on the web, right? But in essence, this is where the interesting part comes in: you as a human create this grammar to represent the C programming language, or Java, or whatever programming language you want to define. You create this context-free grammar, and then you can actually have a program crank through all of this and generate a parser for you, which is very cool. So we're going to look at the domain of email addresses. How many people in here have tried to parse email addresses, probably using a regular expression? Some people? Yeah, you're doing it wrong. It turns out that emails are much more complicated than they seem. So what does an email address look like, generally, if you're going to parse one, or write a regular expression to define one? Blank at blank. Huh, what was it? Blank at blank. Yeah, something at something, right? So the idea is, how do we parse it or validate it? So we think, well, hey, at a high level it's really simple. You just have a name, and then you have an at sign, and you have some domain, and a dot, and a top-level domain.
And as you saw on the exam, you can actually write a regular expression to do this domain part pretty easily. But it turns out it's not simple. So what is the name? Let's go back a bit. Do you know where email originated from? Like, what was the goal? Who used it? Scientists. Scientists, definitely true, yes. Specifically, and that's a good point, academics and scientists on a specific machine, right? I'm a user on a specific machine. I want to send an email to another user on a different machine. So this is where it all comes from. And actually, if you look at the really old spec, they didn't use domains; they used an exclamation point to specify a route. So you would say, okay, I want to send an email to, I don't know, foo at bar. But to get to bar, you actually hop through, I don't know, mailer, and then bang, and then, depending on where they are, it could be like Stanford, bang, bar. You have to specify the exact route to get the mail from you to them. Yeah, it's crazy. So because of that, there's all this historical baggage, right? So, maybe you know when the web was created? 70s, 80s. What's the difference between the web and the internet? The web is just the network. The internet is everything you need. Close. The internet is everything that talks to each other, an interconnection of networks. What's the web? It's just a service that runs on ports 80 and 443 on the network. So the web is basically HTTP, HTML, JavaScript, all that stuff, which is one of the applications that runs on top of the internet. There are a lot of other protocols operating on the internet, like DNS, right? DNS is how that domain name gets resolved to an IP address. That doesn't use port 80 or HTTP or anything. It uses UDP on a completely different port. So this is why email addresses were one of the first kinds of things.
It was like, how do I send a message from one machine to another? Which came about in the 70s. Whereas newer things like the web came out in the 90s, like 92, I have to look at the exact date. 92 or 93 is about when Tim Berners-Lee created the internet or the worldwide web at CERN. So because of this, because of all this historical baggage for the last 40 years, turns out it's not very simple. So you can actually send an email. You can use double quotes to specify the name part to allow spaces in the name. So this would mean I want to send a message to example, like the example.com is where this person lives and the name I'm trying to talk to is CSE space 340. Right, so the double quotes are actually not part of the email address. Like it's not the name that you're trying to send. So if you're parsing this email address and you want to say it's valid, yeah, it's valid. You can also have slashes in there. You can have an equal sign in here to specify different folders. You can actually, here's where your regular expressions get into trouble. You can have the at symbol inside the name as long as it's within double quotes. You can also have a slash in there. You can slash escape that at sign and you can also actually have double quotes within the double quotes as long as you escape them with a slash. And then it gets even more confusing. So when you send me an email, how do I know what your name is? And you're not cool user 89 at ASU.edu. Yeah, so actually, so just parsing email addresses, there are certain cases that does get very complicated. I don't think I got into comments. You can put comments in there in some places. But even with all this, right, yeah, there's extra metadata where I wanna say, hey, I wanna send an email to this person and this is their name, right? So when you get an email from somebody, you often don't even see their email address, right? You see their name and then the email address may be below it, right? 
So this is kind of how this is done. So this, so this is, so the email address is now in brackets and then the first name and the name is space separated parts before the brackets. And inside those, you can use double quotes to specify a whole word. And so you can have an at sign within there, but that's this person's name. So the rabbit hole gets very complicated the more you dig, right, it's not quite as simple as we all want it to be. So there's a company called Mailgun, which provides email services as an API. So they released a really cool open source tool to actually validate email addresses. And so they actually implemented their parser as a recursive descent parser, which is super cool. It turns out, and you can go look at the code, it's online, it's in a Python file. It looks exactly like the thing we're looking at, like parse this, parse this, check this input. They're not doing a predictive one, so it's trying all possible inputs, which could be an interesting way to try to attack there and do a denial of service against their validator, but I haven't tried that yet. So I extracted their context free grammar from their parser, and it's incredibly complicated. So it has, and I'm gonna go over this very quickly, but this is a great example if you wanna practice first and follow the sets. So they have four different tokens in their language. They have quoted string, which is string double quotes. They have an atom, which is something that is, has no spaces in it. And they have a dot atom, which is something with no spaces in it with dots. And they have white space. And then they say an address is either a name address from the RFC or a name address that's lax. So the other problem with emails is there's the spec, which says exactly what an email address is. And then there are email servers that may accept things that are not in that spec. 
So over time, those non-specification features become standard, because in essence, it doesn't matter that an email address doesn't match the spec: if 90% of the servers out there accept it, then that other 10% are going to implement basically that bug, this lax parsing standard. Or an address is an address specification. And we can see that the name address RFC is the display name, that's where you can put your actual name, and then the angle address RFC, which is the part in the angle brackets. Or it can be just an angle address. And a display name is a word followed by a display name RFC list, which is white space, word. I'm not gonna go through all of this, but we can see an angle address is a left bracket, an address spec, a right bracket, white space. So this is basically just putting white space everywhere, because there are certain places where you can have white space and certain places where you cannot.
Is deciding what's white space simple? Not always. What if it's Unicode? How is the string encoded, right? So we have to deal with all these possibilities. Wouldn't that be done before? Depends. I'd like to think it's done before. You would like to think that, I would like to think that too.
So it goes on. You've got the address specification, right? It's a local part, at, domain. And we can see a domain is a dot atom, and a local part is a dot atom or a quoted string. And so I've simplified this. We kind of have this grammar here, which is, quote, simpler. An address is either a name address or an address specification. I've got rid of all the white space. And there's dot-atom-at and quoted-string-at. Maybe you can see why I had to introduce these additional tokens here if you step through it. This one is the dot atom followed by the at sign, and this one is a quoted string followed by an at sign.
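As a rough illustration of the simplified token set, here is a small regex-based lexer sketch in Python. The token names (`DOT_ATOM_AT`, `QUOTED_STRING_AT`, `LANGLE`, `RANGLE`) and the exact character classes are assumptions made for this example, not taken from the slides or from Mailgun's code. It follows the lecture's description that a dot atom contains dots, so a dotless domain such as `localhost` would lex as an `ATOM` here.

```python
import re

# Assumed character set for atoms; illustrative only.
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+"
QUOTED = r'"(?:[^"\\]|\\.)*"'

# Order matters: the fused "...@" tokens are tried before the plain ones.
TOKEN_RE = re.compile("|".join([
    rf"(?P<QUOTED_STRING_AT>{QUOTED}@)",
    rf"(?P<QUOTED_STRING>{QUOTED})",
    rf"(?P<DOT_ATOM_AT>{ATEXT}(?:\.{ATEXT})*@)",
    rf"(?P<DOT_ATOM>{ATEXT}(?:\.{ATEXT})+)",
    rf"(?P<ATOM>{ATEXT})",
    r"(?P<LANGLE><)",
    r"(?P<RANGLE>>)",
    r"(?P<WS>\s+)",
]))


def tokenize(text):
    """Return a list of (token_type, lexeme) pairs, dropping white space."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at {pos}: {text[pos]!r}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


print(tokenize('CSE 340 <cse340@example.com>'))
print(tokenize('"CSE 340"@example.com'))
```

Fusing the at sign into the token is what lets one token of lookahead tell a quoted word in a display name apart from a quoted local part.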
So if you do that, you can actually make this a predictive context-free grammar. So I can step through, right? I can do my handy calculating of first sets. I'm gonna skip through this quickly, because I think we've done enough. I kind of did this last semester so that they could get more practice, but I feel like this class has got it down. So we get the first sets here of all the different non-terminals in our grammar. Then we do the same thing with the follow sets, so we can see the follow sets for all of the non-terminals in our grammar.
Now that we have that, what do we have to do? See if it was valid, make sure it doesn't break those rules. Yes, make sure that it's valid, right? Prove that it supports a predictive recursive descent parser. Okay, so here we have our grammar, we have our first sets, we have our follow sets. So what am I gonna check here about address? Yeah, I'm gonna check the first of name address against the first of address specification, right? Which are actually disjoint. Then the next one I'm gonna check is display name angle address against angle address. Then what about display name list, what am I gonna check here? I'll just go through the first rule first. We can apply the first rule and then apply the second rule, right? Exactly: the first of word display name list, intersected with the first of epsilon. Then the first of dot-atom-at domain, intersected with the first of quoted-string-at domain. We can just keep going down all the things here.
Then we go through and we say, okay, which ones have an epsilon in their first sets? Here it's just display name list. So we're gonna say, does the first of display name list intersect with the follow of display name list? No, right? So I know that this supports a predictive parser. Yeah.
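The two checks can be sketched in Python as follows. The grammar below is a reconstruction of the simplified grammar from the spoken description, so the exact rules and the symbol names are assumptions; the fixed-point computation of FIRST and FOLLOW is the standard one.

```python
EPS, END = "ε", "$"

# Assumed reconstruction of the simplified grammar; names are illustrative.
GRAMMAR = {
    "address": [["name_addr"], ["addr_spec"]],
    "name_addr": [["display_name", "angle_addr"], ["angle_addr"]],
    "display_name": [["word", "display_name_list"]],
    "display_name_list": [["word", "display_name_list"], []],
    "angle_addr": [["LANGLE", "addr_spec", "RANGLE"]],
    "addr_spec": [["DOT_ATOM_AT", "domain"], ["QUOTED_STRING_AT", "domain"]],
    "domain": [["DOT_ATOM"]],
    "word": [["ATOM"], ["QUOTED_STRING"]],
}
START = "address"


def first_of_seq(seq, first):
    """FIRST of a sequence of symbols, given FIRST of each non-terminal."""
    out = set()
    for sym in seq:
        sym_first = first[sym] if sym in GRAMMAR else {sym}
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out
    out.add(EPS)  # every symbol could vanish (or the sequence is empty)
    return out


def compute_first():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:  # repeat until no set grows
        changed = False
        for nt, rules in GRAMMAR.items():
            for rhs in rules:
                new = first_of_seq(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(first):
    follow = {nt: set() for nt in GRAMMAR}
    follow[START].add(END)
    changed = True
    while changed:
        changed = False
        for nt, rules in GRAMMAR.items():
            for rhs in rules:
                for i, sym in enumerate(rhs):
                    if sym not in GRAMMAR:
                        continue
                    rest = first_of_seq(rhs[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:
                        new |= follow[nt]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


def is_predictive(first, follow):
    for nt, rules in GRAMMAR.items():
        firsts = [first_of_seq(rhs, first) for rhs in rules]
        # Rule 1: FIRST sets of the alternatives are pairwise disjoint.
        for i in range(len(firsts)):
            for j in range(i + 1, len(firsts)):
                if firsts[i] & firsts[j]:
                    return False
        # Rule 2: if ε is in FIRST(A), FIRST(A) and FOLLOW(A) are disjoint.
        if EPS in first[nt] and (first[nt] - {EPS}) & follow[nt]:
            return False
    return True


FIRST = compute_first()
FOLLOW = compute_follow(FIRST)
print(is_predictive(FIRST, FOLLOW))
```

Under this reconstruction, `display_name_list` is the only non-terminal with ε in its FIRST set, which matches the single rule-two check described above.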
So when you're trying to create a predictive recursive descent parser... Predictive, right? Recursive, as we saw, because the functions are gonna call each other. Descent, because it's descending from the top down. Right. And we're parsing. So when you're trying to create that, are you trying to set up your context-free grammar in such a way that that is possible? Yes. That's actually the interplay the language designer has to deal with. Like in this case, I changed the tokens, which allowed me to support a predictive parser, which is actually something they could do, and it would be a lot faster. I mean, it depends a lot, but it would be faster than what they're currently doing, which is just trying all possible combinations. There's not a ton of play here in what things they can and cannot do. But yeah, in general, that's where a lot of the creativity comes in, right? Understanding this process, understanding what it means, and then designing the language in such a way that it works.
So this is actually why, in a lot of languages, like Ruby, to do an if statement you write if, the condition, and then you have to close it with an end, right? Or in C or Java, you have to enclose blocks in braces, like if blocks. But you may have to add some additional rules outside the grammar itself to handle some ambiguous cases, and that's where it gets tricky. So it depends on what the language wants to do, but a lot of parsers are actually written by hand, basically with these techniques.
So let's look at how we write this. We wanna write parse address. So what first and follow sets do I wanna look at here? What are the ones that are important? Do I have to look at all 10 of them? No, just the ones that correspond to those.
Yeah, the ones for name address and address specification, right? So I have address, and the first of name address is a left angle bracket, an atom, or a quoted string. And the first of address specification is dot-atom-at or quoted-string-at. The follow sets that are important are the follow sets for these symbols here, right? So if I want to parse address, what's the first thing I wanna do? Get token, right? Then how do I know which rule to take here? You check these first sets. Yeah, so I say, hey, is it a left bracket, an atom, or a quoted string? If it is, then it's a name address. And if it's not, then what do I check next? Yeah, dot-atom-at and quoted-string-at.
So we'll just briefly go through this, and this is the important thing to think about. I've called get token, which means I've moved the lexer, sorry, not the parser, forward along the input, right? But is one of the children of address a left bracket, an atom, or a quoted string? Exactly, no. Somebody else is responsible for actually parsing that and creating that link in our tree. We know it has to be something in name address, but address is not responsible for it. We don't parse a left bracket or an atom, because we can see from the rule that address doesn't generate a left bracket or an atom. So then what do we need to do first, before we call parse name address? Unget token. Yeah, we want to call unget token and move the input back. That way, when we call parse name address, it's starting from the proper place in the input, right? Think about it as trees. Yeah.
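The get token, decide, unget token pattern for parse address can be sketched like this in Python. The `Lexer` class, the token names, and the stubbed-out callees are all hypothetical and exist only to show where the input position ends up; the FIRST sets are the ones read off above.

```python
class Lexer:
    """Hypothetical token stream with one token of pushback."""

    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0

    def get_token(self):
        tok = self.tokens[self.pos] if self.pos < len(self.tokens) else "EOF"
        self.pos += 1
        return tok

    def unget_token(self):
        self.pos -= 1


FIRST_NAME_ADDR = {"LANGLE", "ATOM", "QUOTED_STRING"}
FIRST_ADDR_SPEC = {"DOT_ATOM_AT", "QUOTED_STRING_AT"}


def parse_name_addr(lexer):
    # Stub: only reports the first token it sees.
    return ("name_addr", lexer.get_token())


def parse_addr_spec(lexer):
    # Stub: only reports the first token it sees.
    return ("addr_spec", lexer.get_token())


def parse_address(lexer):
    t_type = lexer.get_token()  # look one token ahead to pick a rule
    if t_type in FIRST_NAME_ADDR:
        # address generates no terminal itself, so put the token back
        lexer.unget_token()
        return ("address", parse_name_addr(lexer))
    if t_type in FIRST_ADDR_SPEC:
        lexer.unget_token()
        return ("address", parse_addr_spec(lexer))
    raise SyntaxError(f"unexpected token {t_type} at start of address")


print(parse_address(Lexer(["LANGLE", "DOT_ATOM_AT", "DOT_ATOM", "RANGLE"])))
```

Because of the unget, the stub for name address sees `LANGLE` as its first token. Without it, the callee would start one token too late.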
So the case in which you call unget token is when you've found that the token is in that first set, but the rule only goes to a non-terminal, not a terminal? Yes, you call unget token when your rule has no tokens of its own to parse there, right? You consume the token, or move the input forward, or create that link in the tree, whatever helps you think about it, only when your rule generates that token, that terminal. Yeah. So why did you do parse name address and not parse address? Because you're trying to go back to t_type equals get token? Say that again? Why did you do parse name address and you didn't do parse address? Because you're trying to go back up, right, to t_type equals get token. Ah, no, I'm just following this rule, right? I know an address is composed of a name address. Right. And because of looking at that first token, I know that it has to be a name address. Okay. So I'm gonna let name address parse something.
So if we think about this as a tree, here at the top we have an address. So what's gonna be the child of address? The name address, right? So I'm gonna draw an NA for name address, because I know it's this rule. It can't possibly be anything else. So you have to parse that. Exactly, so name address is responsible, right? Basically every layer is responsible for creating its children, and then each of those children is responsible for creating its own children. This is why we have this recursive process. And what's important is our input string down here, our input series of tokens: address never generated a token. So if we try to parse name address but we've moved the input forward slightly, then name address is parsing the wrong thing, because name address is what generated all of those tokens. That's the only possible way. It's not considering the tokens that were already consumed. Exactly, yes. Okay, yes.
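Putting the whole thing together, here is a sketch in Python of a predictive recursive descent parser where each function builds its own node and leaves the children to the functions it calls. The grammar is a reconstruction of the simplified one from the spoken description, and the `Lexer`, `peek`, and `expect` helpers and all token names are hypothetical.

```python
class Lexer:
    """Hypothetical token stream with one token of pushback."""

    def __init__(self, tokens):
        self.tokens, self.pos = list(tokens), 0

    def get_token(self):
        tok = self.tokens[self.pos] if self.pos < len(self.tokens) else "EOF"
        self.pos += 1
        return tok

    def unget_token(self):
        self.pos -= 1


def peek(lx):
    """Look one token ahead without consuming it (get, then unget)."""
    tok = lx.get_token()
    lx.unget_token()
    return tok


def expect(lx, kind):
    """Consume a terminal that the current rule itself generates."""
    tok = lx.get_token()
    if tok != kind:
        raise SyntaxError(f"expected {kind}, got {tok}")
    return tok


WORD_FIRST = {"ATOM", "QUOTED_STRING"}


def parse_address(lx):
    t = peek(lx)
    if t == "LANGLE" or t in WORD_FIRST:
        return ["address", parse_name_addr(lx)]
    if t in {"DOT_ATOM_AT", "QUOTED_STRING_AT"}:
        return ["address", parse_addr_spec(lx)]
    raise SyntaxError(f"unexpected {t} at start of address")


def parse_name_addr(lx):
    if peek(lx) == "LANGLE":
        return ["name_addr", parse_angle_addr(lx)]
    name = parse_display_name(lx)
    angle = parse_angle_addr(lx)
    return ["name_addr", name, angle]


def parse_display_name(lx):
    word = parse_word(lx)
    rest = parse_display_name_list(lx)
    return ["display_name", word, rest]


def parse_display_name_list(lx):
    t = peek(lx)
    if t in WORD_FIRST:
        word = parse_word(lx)
        rest = parse_display_name_list(lx)
        return ["display_name_list", word, rest]
    if t == "LANGLE":  # token is in FOLLOW(display_name_list): take ε
        return ["display_name_list"]
    raise SyntaxError(f"unexpected {t} in display name")


def parse_word(lx):
    t = lx.get_token()  # word generates this terminal, so it is consumed
    if t not in WORD_FIRST:
        raise SyntaxError(f"expected a word, got {t}")
    return ["word", t]


def parse_angle_addr(lx):
    expect(lx, "LANGLE")
    spec = parse_addr_spec(lx)
    expect(lx, "RANGLE")
    return ["angle_addr", "LANGLE", spec, "RANGLE"]


def parse_addr_spec(lx):
    t = lx.get_token()
    if t not in {"DOT_ATOM_AT", "QUOTED_STRING_AT"}:
        raise SyntaxError(f"expected a local part, got {t}")
    return ["addr_spec", t, parse_domain(lx)]


def parse_domain(lx):
    return ["domain", expect(lx, "DOT_ATOM")]


def parse(tokens):
    lx = Lexer(tokens)
    tree = parse_address(lx)
    expect(lx, "EOF")
    return tree


print(parse(["ATOM", "ATOM", "LANGLE", "DOT_ATOM_AT", "DOT_ATOM", "RANGLE"]))
```

Functions whose rule starts with a non-terminal only peek, while functions whose rule generates a terminal consume it. That is the same split between unget token and moving the input forward described above.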
That first get token is just for the decision. This is the predictive part, exactly. The predictive part is we call get token, we look one token ahead, and then we decide. Cool. All right, great. We'll stop here and go through this again on Friday, if I'm remembering what day it is.