Great. All right, let's get started. I want to start off by taking questions on Project 3. We'll go over the description for about the first five or ten minutes of class, and then I'll go back and finish up syntax analysis. Any questions before we start? Okay.

Project 3. I sent it out to the mailing list; it was released last Wednesday. It's due on the 7th. I don't think Wilson is here, but you have him to thank for that: he said this assignment is very difficult, so he got you more time. Hopefully that doesn't lure you into a false sense of security where you think, "I've got plenty of time, I don't need to worry about it now," because that time is definitely necessary to finish the assignment.

At a high level, you're going to write a program in C or C++, I don't care which, that reads in a description of a context-free grammar and calculates the first and follow sets of that grammar. As you'll see, the input format for the context-free grammar is itself specified in the form of a grammar. So you're going to have to parse the input, represent it as a context-free grammar, calculate first sets and follow sets, and output those in exactly the format that we specify.

To make it a little easier for everyone, we use a command-line parameter to specify whether you want to calculate first sets or follow sets. That's what this piece of C code shows you how to do, in case you've never seen this before. Basically, the code checks the number of arguments, and if it's less than two, it reports that there's a missing argument. argv[0] is always the command used to invoke your program, and argv[1] is the parameter that's passed to your program.
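The slide code itself is not reproduced in this transcript. A minimal sketch of the check being described, including the 1-or-2 test discussed next, might look like the following. The helper name `parse_task` and the error messages are assumptions for illustration, not the handout's code.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical helper: returns 1 (first sets), 2 (follow sets),
// or -1 when the argument is missing or not recognized.
int parse_task(int argc, char* argv[]) {
    if (argc < 2) {
        std::fprintf(stderr, "Error: missing argument\n");
        return -1;
    }
    // argv[0] is the command used to invoke the program;
    // argv[1] is the task number passed on the command line.
    int task = std::atoi(argv[1]);
    if (task == 1 || task == 2) {
        return task;
    }
    std::fprintf(stderr, "Error: unrecognized task number %d\n", task);
    return -1;
}
```

From `main(int argc, char* argv[])` this would be called as `int task = parse_task(argc, argv);`, branching on the result to print first sets or follow sets.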
We convert that to an integer, and then we check whether it's one or two. If it's one, you calculate first sets. If it's two, you calculate follow sets. For anything else, you can throw an error; whether or not you throw an error doesn't matter. Questions about this?

Can we use this in our code too, or is it just in the project description? Yeah, you can use this in your code. Absolutely. Or you can do it your own way, as long as it's functionally equivalent.

Okay, the grammar description. Here we have a description of the input grammar, written as a context-free grammar. At a high level, there are several sections to the input grammar, each of them separated by the hash symbol, and the entire input is ended with a double hash. Anything after the double hash is ignored. To make things a little easier, all the grammar symbols, the tokens if you will, are separated by one or more whitespace characters.

So here's the list of rules. Starting non-terminal: the input is a non-terminal list followed by a rule list. A non-terminal list is an ID list followed by a hash. An ID list is an ID, or an ID followed by an ID list. The rule list is the listing of the rules, each with a left-hand side and a right-hand side. A rule is an identifier, which is defined below; an arrow, which is these two characters here; a right-hand side; and then a hash. A right-hand side is a list of identifiers. This uses everything we've done up to now: you know how to interpret these tokens, and you know how to interpret this context-free grammar, so you can work out what the input is going to look like. We've also specified the letters and digits exactly. IDs are what we've talked about before: a letter followed by zero or more letters or digits.
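Because every token is whitespace separated, reading the input can be sketched with a plain stream extraction loop. The function names `tokenize` and `is_id` below are illustrative assumptions, and the sketch assumes the closing double hash arrives as its own `##` token.

```cpp
#include <cassert>
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Split the input on whitespace and stop at the terminating "##".
// Anything after "##" is ignored, as the description says.
std::vector<std::string> tokenize(const std::string& input) {
    std::vector<std::string> tokens;
    std::istringstream in(input);
    std::string tok;
    while (in >> tok) {
        if (tok == "##") break;
        tokens.push_back(tok);
    }
    return tokens;
}

// An ID is a letter followed by zero or more letters or digits.
bool is_id(const std::string& s) {
    if (s.empty() || !std::isalpha(static_cast<unsigned char>(s[0]))) {
        return false;
    }
    for (char c : s) {
        if (!std::isalnum(static_cast<unsigned char>(c))) return false;
    }
    return true;
}
```

Since `>>` skips any run of spaces, tabs, and newlines, the same token sequence comes out no matter how the input is laid out across lines.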
So this is just the input specification, what the grammar is going to look like. Now the semantics: what is the meaning behind this input that you're going to get? It helps to look at an example input grammar. Looking at the grammar above, S is a non-terminal list followed by a rule list, so this first list is the list of non-terminals. That's exactly what the semantics specify: the first input section is the list of non-terminals in the grammar. Note that there's no convention here that uppercase means non-terminal and lowercase means terminal. If an ID is in the non-terminal list, then it's a non-terminal.

Following that, up to the double hash, are all of the rules. Each rule is an ID, an arrow, and a right-hand-side list, followed by an ending hash, and each one defines a production rule in our grammar. So these define that declaration produces ID list, colon, ID. ID list produces ID, ID list 1. ID list 1 produces nothing here, so that's epsilon. And ID list 1 also produces comma, ID, ID list 1. We know we're done because we've seen the double hash. Questions?

One thing to be aware of: we specify the non-terminals in the first input section, so how do we specify the terminals? Everything else, yeah, exactly. If you see anything on the right-hand side that is not in the non-terminal list, it's a terminal. And here we've broken that down: this is the actual grammar representation of this input, with the non-terminals here and all the terminals here.
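One way to turn the token stream into a grammar is sketched below: the first section becomes the non-terminal list, each `ID -> ... #` group becomes a rule, and any right-hand-side symbol not in the non-terminal list is recorded as a terminal. The names `Rule`, `Grammar`, and `build_grammar` are assumptions for illustration, and the sketch does no error checking on malformed input.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

struct Rule {
    std::string lhs;
    std::vector<std::string> rhs;  // an empty rhs stands for epsilon
};

struct Grammar {
    std::vector<std::string> nonterminals;
    std::vector<Rule> rules;
    std::vector<std::string> terminals;  // in order of first appearance
};

// `tokens` is the whitespace-split input with the final "##" already removed.
Grammar build_grammar(const std::vector<std::string>& tokens) {
    Grammar g;
    std::size_t i = 0;
    // Section 1: the non-terminal list, up to the first '#'.
    while (i < tokens.size() && tokens[i] != "#") {
        g.nonterminals.push_back(tokens[i++]);
    }
    ++i;  // skip the '#'
    std::set<std::string> nt(g.nonterminals.begin(), g.nonterminals.end());
    std::set<std::string> seen;
    // Section 2: rules of the form  ID -> ID ... ID #
    while (i + 1 < tokens.size() && tokens[i + 1] == "->") {
        Rule r;
        r.lhs = tokens[i];
        i += 2;  // skip the left-hand side and the arrow
        while (i < tokens.size() && tokens[i] != "#") {
            const std::string& sym = tokens[i++];
            r.rhs.push_back(sym);
            // Anything that is not a declared non-terminal is a terminal.
            if (!nt.count(sym) && seen.insert(sym).second) {
                g.terminals.push_back(sym);
            }
        }
        ++i;  // skip the rule's closing '#'
        g.rules.push_back(r);
    }
    return g;
}
```

For the declaration example, with assumed spellings such as `decl`, `idList`, `idList1`, `colon`, `ID`, and `COMMA`, this yields three non-terminals, four rules (one with an empty right-hand side), and the terminals `colon`, `ID`, `COMMA`.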
Note that the white space doesn't matter at all here, so you can't rely on each line being a rule or anything like that. This one just plays around with the white space: it's exactly the same test case, exactly the same example. Questions? Okay.

Implementation is the same way we've been doing projects up till now. Follow the directions and make sure it runs on CentOS. I think by now we're all familiar with that.

Okay, requirements. You're going to calculate either the first sets or the follow sets of the grammar, depending on the command-line parameter that's passed. We've gone over first sets, we've gone over follow sets, and we're going to go over an example again today, so hopefully it's not a new concept. You just have to follow the instructions. When you output the first sets, you output them in this order. Here's the example first-set output for the previous example: the first set of declaration is ID, the first set of ID list 1 is epsilon and comma, and the first set of ID list is ID. Questions on first sets?

Follow sets are output in a very similar way. The one difference is that if end of file is in the follow set, we represent it with a dollar sign. For that example grammar, when it's run with the command-line parameter that tells it to output follow sets, it outputs: the follow set of declaration is the dollar sign, the follow set of ID list 1 is the colon, and the follow set of ID list is the colon.

And this is the grading breakdown, in case you want to know. You should be able to go very easily from your test case results to your grade. For this assignment there are no restrictions beyond using C or C++; we don't really care how you do it. As we've been doing all along, the output has to match exactly. If it doesn't match, it doesn't count.
We're going to apply it to Wednesday at the latest. Is the project going to be on Blackboard? They'll be posted on Blackboard. I think most of them are being posted either today or tomorrow. So all the grade stuff will be on Blackboard, and I think that should be easy for everyone. Any other questions?

Okay, on to the other stuff. Where we left off on Wednesday, we were looking at, let's see if I can get it all on here, a very long and complicated context-free grammar that specifies an email address. So, something that should be very simple. This is taken from a real open-source project, by a real company, that people use in production. I simplified it so we could use it as an example for this class.

There are, I think, five terminals here. One is a quoted string, so a string delimited by double quotes. One is an atom; the specifics we don't really need to get into, and you can go to the code if you really care about it. A dot atom is an atom with a dot in it, which is something you might use to specify a domain name, since a domain name is usually something dot something else. Quoted string at: this is a token I had to inject to get the example to work. It's a quoted string followed by the at symbol, so you can think of it as the first part of an email address. And dot atom at is a dot atom followed by the at sign. You can do this with your tokens: you just add a new symbol onto the end of the token, and now that token is going to be matched by longest prefix matching. That's why I did that. Questions on those terminals? For our purposes right now, we're going to calculate first sets and follow sets, and write a predictive recursive descent parser.
So what the tokens actually are doesn't really matter, but these are real tokens that you can translate into regular expressions. Okay, at the very high level, our starting production rule says an address is either a name address or an address specification. Then we just break these down and define all the rules. I got rid of, I want to say, about a third of the original. Is this giving some feedback? Yeah, a little bit. How about now? Better? Maybe, we'll see. Okay.

Our next rule is: a name address goes to either a display name followed by an angle address, or just an angle address. A display name, as we saw in that example, is some string that comes before the email address, and the email address is in angle brackets. If you go into Gmail and look at people's emails, that's how it represents them: their name, usually in double quotes, followed by angle brackets with the actual email address.

A display name is a word followed by a display name list, and a display name list is a word followed by a display name list, or the empty string. So a display name is one or more words. We'll see what a word is. An angle address is very simple: a left angle bracket, followed by an address specification, followed by a right angle bracket. An address specification is either a dot atom at followed by a domain, or a quoted string at followed by a domain. You can see we've included that at sign in this definition. A domain is just a dot atom, and a word is either an atom or a quoted string. An atom you can think of as letters and numbers. A quoted string is a string in double quotes. Questions on this grammar?

Oh, I'm sorry, what is an atom, actually? I'd have to go look at it. Maybe we can do that afterwards.
But it's basically like the identifier that we've been using. It's there to separate it from the quoted string.

What's the benefit of the display name list? Yeah, it's a good question. I can't remember, because I started with their original big grammar and then took stuff out. They included it for some reason, maybe because display name lists could be written in a different form, comma separated or something, I don't know. I think we've gone over this a little bit before: these are maybe not the most efficient, direct representation of this grammar.

This one, dot atom, which one? Domain is a dot atom. It just means an atom with a dot somewhere inside it.

A couple of examples. We basically want to match all of these examples. For instance, the left side is a quoted string at: that's this quote, CSE space 340, quote, at sign. And this last example is one where you have the word list: here's a word followed by a word, and a word is either a quoted string or an atom. So here's an atom, a quoted string, then a left angle bracket, an email address, and a right angle bracket. In that last example the display name was two words, one an atom and one a quoted string, followed by an angle address, where an angle address is a left bracket, an address specification, and a right bracket.

I've gotten rid of all of the white space handling, because that was another thing in the actual grammar, and they also have special cases for Unicode characters, so theirs was very complicated. More questions on this grammar? So if this shows up on a test, it's going to be: calculate the first and follow sets of this grammar.
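For reference, the simplified email grammar described above can be written down as data, one entry per alternative, in the same order as the production rules on the slide. The symbol spellings (`name_addr`, `DOT_ATOM_AT`, and so on) and the helper names are assumptions chosen for this sketch.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

struct Rule {
    std::string lhs;
    std::vector<std::string> rhs;  // an empty rhs stands for epsilon
};

// The simplified email-address grammar from the lecture.
std::vector<Rule> email_grammar() {
    return {
        {"address", {"name_addr"}},                         // production 1
        {"address", {"addr_spec"}},
        {"name_addr", {"display_name", "angle_addr"}},      // production 2
        {"name_addr", {"angle_addr"}},
        {"display_name", {"word", "display_name_list"}},    // production 3
        {"display_name_list", {"word", "display_name_list"}},  // production 4
        {"display_name_list", {}},
        {"angle_addr", {"<", "addr_spec", ">"}},            // production 5
        {"addr_spec", {"DOT_ATOM_AT", "domain"}},           // production 6
        {"addr_spec", {"QUOTED_STRING_AT", "domain"}},
        {"domain", {"DOT_ATOM"}},                           // production 7
        {"word", {"ATOM"}},                                 // production 8
        {"word", {"QUOTED_STRING"}},
    };
}

// Anything that appears on a left-hand side is a non-terminal.
std::set<std::string> nonterminals_of(const std::vector<Rule>& rules) {
    std::set<std::string> result;
    for (const Rule& r : rules) result.insert(r.lhs);
    return result;
}
```

Everything that never appears on a left-hand side (`ATOM`, `QUOTED_STRING`, `DOT_ATOM`, `DOT_ATOM_AT`, `QUOTED_STRING_AT`, `<`, `>`) is a terminal.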
Everyone will be able to do that, right? I hope so, because we're going to go over it today. So we can't put this exact grammar on the test. That's not good.

Okay, so we want to write a predictive recursive descent parser. The first thing we need to do is show that this grammar actually supports a predictive parser, and to do that we need to calculate first and follow sets. I'm doing this a little differently: I've put the grammar rules in the upper left, and we'll go through them one by one, and I'll show the exact rules, so this should hopefully be more of a refresher on how to do first sets. We can do this as a group.

We start with the first set of every non-terminal initialized to the empty set. Then we keep applying our rules, calculating first sets, until the first sets don't change.

We want to calculate the first set of address; it's the starting non-terminal in our grammar. So what's the first set of address? Where do we look first? Which of these production rules do we look at? Yeah, the first one. When we're calculating first sets, we look at the non-terminal we're interested in, here address, and at what address produces, so the rules where address is on the left-hand side. So we look at the first rule here.

Then we ask which first-set rules apply. Number three, I think. Rule 1 is: if X is a terminal, the first set of X is the set containing X. Rule 2 is: the first set of epsilon is the set containing epsilon. Rule 3 is: for each of the production rules, take the leftmost symbol on the right-hand side and add its first set, minus epsilon, to address's first set, right?
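For reference, the five first-set rules used in this walkthrough can be written compactly. Rules 1 to 3 are as just stated; the wording of rules 4 and 5 is reconstructed from how they are applied in the walkthrough, so treat the exact phrasing as an assumption rather than the slide text.

```latex
\begin{align*}
\text{(1)}\quad & \mathrm{FIRST}(a) = \{a\} \quad \text{for a terminal } a \\
\text{(2)}\quad & \mathrm{FIRST}(\varepsilon) = \{\varepsilon\} \\
\text{(3)}\quad & A \to B_1 B_2 \cdots B_k:\ \text{add } \mathrm{FIRST}(B_1) \setminus \{\varepsilon\} \text{ to } \mathrm{FIRST}(A) \\
\text{(4)}\quad & A \to B_1 \cdots B_i B_{i+1} \cdots B_k \text{ with } \varepsilon \in \mathrm{FIRST}(B_1), \ldots, \mathrm{FIRST}(B_i):\
  \text{add } \mathrm{FIRST}(B_{i+1}) \setminus \{\varepsilon\} \text{ to } \mathrm{FIRST}(A) \\
\text{(5)}\quad & A \to B_1 \cdots B_k \text{ with } \varepsilon \in \mathrm{FIRST}(B_j) \text{ for every } j = 1, \ldots, k:\
  \text{add } \varepsilon \text{ to } \mathrm{FIRST}(A)
\end{align*}
```

Note that epsilon is only ever added to a non-terminal's first set by rule 5; rules 3 and 4 always subtract it first.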
So we look here and say, okay, there are two rules: address produces name address, and address produces address specification. We take the first set of name address and add it, minus epsilon, to the first set of address. We look at our table and see that the first set of name address is the empty set, so we take epsilon out of it and add it to the first set of address. Then we ask, can we move on? We've gone all the way through this right-hand side. Does rule 5 apply? Is there an epsilon in the first set of every symbol on the right-hand side? No, because name address's first set is the empty set; there's no epsilon in there. So we're done with that. By the same reasoning, address specification is the first symbol on its right-hand side, so we add its first set to address's first set. We look at address specification, it's the empty set, we take out epsilon, and we add it. So nothing changed. We went through, we applied all the rules, we added to the first set of address, but there's nothing there yet.

Okay, name address. We do the same thing. Which rule do we look at? Number two, counting from the top: one, two. When we're calculating first sets, we care about the rules where the non-terminal we're interested in is on the left-hand side. Name address has two production rules here. We take the first set of display name, subtract epsilon from it, and add it to the first set of name address. Display name has nothing in it. And then we ask, do we go on to the next symbol? Yes?
Yes? Why yes? I guess I should say: do we go on to angle address by applying rule 4? No, there's no epsilon in the first set of display name. So we go on to the next rule, and we add the first set of angle address to the first set of name address. It's empty, so we add nothing, but we still do it anyway to make sure we know the rule.

Now we look at display name list, and once again we have two rules, so we're looking here: one, two, three, four. Got a question? Okay, cool. So with the first of these rules, we add the first set of word to the first set of display name list. There's nothing in the first set of word. Do we move on to the next symbol? No, there's no epsilon in the first set of word.

Then we look at the second rule: display name list goes to epsilon. I'm going to do this very mechanically this first time, and hopefully not do it again. We say, okay, rule 3 applies: there's a first symbol here, so take the first set of that symbol and add it to the first set of display name list. It's the first set of epsilon, which is the set containing epsilon. We're applying rule 3, so technically we remove epsilon from that first set, which gives the empty set, and we add the empty set into the first set of display name list. Does rule 4 apply? No, because there's no other symbol after this; we've reached the end of the right-hand side. Then we see if we can apply rule 5: do all the symbols on this right-hand side have epsilon in their first set? Yes, the first set of epsilon is epsilon. That's when we actually add epsilon to the first set of display name list.

This is just to verify that the rules make sense. There's no special case for the fact that there's an epsilon here, so when you're writing your program it doesn't matter; it follows the same steps, the same rules in sequence. But you as a human
you can see, okay, it clearly goes to epsilon, so there has to be an epsilon in the first set of display name list.

It doesn't really matter right now since all of our sets are empty, but for the first sets or the follow sets, I think you mentioned last week that order matters, even though technically order doesn't matter in sets? In my examples and in my slides, the order inside the sets is the order in which we add the elements. And is that how it is on our homework or our project? On your homework it doesn't matter; in general the order of a set doesn't matter. But for grading purposes, the order of the Project 3 output does matter, because it's specified very particularly in the description. Does that make sense? So they're ordered in a certain way every time.

So you're saying we do not add epsilon with rule 3? Correct, because rule 3 is: you take the first set of that first symbol, subtract epsilon from it, and add the result to the first set of the left-hand side. I hope that didn't confuse anyone, because it's just mechanically applying these rules: if you apply rule 3 and then rule 5 here, you add epsilon to the first set of display name list. So there's really nothing special about epsilon.

After we've done this, we've calculated that the first set of display name list is the set containing epsilon. Then we look at angle address, so here we're looking at production rule 5. We apply rule 3: we add the leftmost symbol's first set, minus epsilon, to angle address's first set. That symbol is the less-than symbol, the left angle bracket. So we take the first set of the angle bracket, which is the angle bracket, take epsilon out of it, and add it. Then we ask, does rule 4 apply? Can we move on to address specification?
No, because there's no epsilon in the first set of the left bracket. So we're done, and now we know that the left angle bracket is in the first set of the angle address non-terminal.

Then we look at address specification. There are two rules here. We look at this production rule and apply rule 3: take whatever the leftmost symbol is, and add its first set, minus epsilon, to the first set of address specification. What's the first set of dot atom at? It's a terminal, so it's the set containing dot atom at. We take epsilon out of that, add it to address specification, and ask, do we go on to domain? No, because there's no epsilon in this first set. So we're done with this rule and go on to the next one. We add the leftmost symbol's first set to the first set of address specification, and once again we ask, do we move on to domain by applying rule 4? No, we're done there. So the first set of address specification is dot atom at and quoted string at.

Then we look at domain. Domain should be very easy: apply rule 3. DA is a terminal, so we just add the set containing DA, dot atom. And word is very similar: we look at atom, we look at quoted string, and we add them.

Okay, we've gone through it once. Do we stop? No. Why? Things have changed. Yes. So we start again at the top and look at address. For each of these rules, I add the leftmost symbol's first set. Here it's name address, so I add the first set of name address, minus epsilon, to the first set of address. The first set of name address is still empty, so that doesn't add anything.

Why is domain goes to DA rule 3 rather than rule 1? Rule 3 is what you always apply to a production. Rules 1 and 2 define the first set of a terminal and of epsilon. Rule 1 says the first set of a terminal is the set
containing that terminal. Rule 2 is the same thing with epsilon.

Say address goes to name address, and epsilon was an element of the first set of name address. Would we still want to get rid of it? What do you mean, get rid of it? Subtract it from the first set. Yes, you always do that, every time you apply rules 3 and 4. And then, if you've gone through all of the symbols and all of them have epsilon in their first set, you add epsilon. Exactly. So if there were an epsilon in name address, we'd look here and say all of the symbols on the right-hand side have epsilon in their first set, therefore we can add epsilon to the first set of address.

Okay, looking at address: we look at name address, and name address doesn't have anything. We look at address specification, so we add DAA and QSA, minus epsilon, to the first set of address. There are no more symbols after this, so rule 4 can't apply. Does rule 5 apply? No, there's no epsilon in the first set of address specification, so we can't apply rule 5. So we add DAA and QSA.

Then we look at name address. We add display name; display name is the empty set. I'm going to go a little faster now. For name address we also do angle address; that's the left bracket, so we add that, and we get the set containing the left angle bracket. For display name, we add the first set of word minus epsilon. Word is atom and quoted string, so we add those, and we see that none of the other rules apply. Then display name list: we already have epsilon, which we carried over, and then we have the first set of word, which is atom and quoted string, so we add that in. We get epsilon, atom, and quoted string, and that's our set there.
Angle address is not going to change; it's easy to look at that and verify it yourself, though you can still go through all the calculations. Address specification is also not going to change, and neither are domain and word. That makes sense if you look at it, because in each of those rules the leftmost symbol is a terminal, so there's no way we're going to add epsilon or anything else. Questions on that? Are we done? No.

Okay, so let's do this one more time. We look at address, and we add name address and address specification. The first set of name address is the left bracket, so we have to add the left bracket. Address specification has DAA and QSA, which are already there, so effectively we add the left angle bracket. Then for name address, we add display name, which is atom and quoted string, and we add angle address, which is the left bracket that's already there. So we add atom and quoted string. Display name doesn't change, word doesn't change, display name list stays the same, and so does address specification. Okay, are we done?
No, we have to do it one more time. It's very easy: just keep plugging and chugging, using the same rules over and over again. So now we look at address, and we add the first set of name address, which is the less-than symbol, atom, and quoted string. We add that to address, so that does change address. We go through the rest and hopefully see that none of them change, unless I made a mistake, in which case someone should let me know. Do we stop? No, we have to do it one more time, because we just made a change to address. So we go through it one more time, and hopefully nothing changes, because I've run out of columns in my table. Good. We've gone all the way through, we've applied all the rules, and you can see the sets in the last two columns are exactly the same. So we don't need to calculate the first sets anymore. These are the first sets of all the non-terminals in our grammar. Questions on that?

You said that for address specification we got DAA because of rule 3? Yes. That's a production rule number, so I should clarify. One, two, three, four, five, six: we're looking at production rule 6, and we're applying first-set rule 3. First-set rule 3 says take the leftmost symbol, take its first set minus epsilon, and add it to the first set of address specification. What's the first set of DAA, dot atom at? First-set rule 1 tells us that the first set of a terminal is the set containing that terminal, so we apply rule 1 to get that. Questions?

How do we tell the terminals from the non-terminals? The non-terminals are in one case and the terminals are in the other. So if it's in here, it's a terminal, and if we've seen it on a left-hand side, it's a non-terminal. More questions on first sets?
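The plug-and-chug procedure above is a fixed-point loop, and a minimal sketch of it follows. The names `compute_first`, `first_of`, and the `"epsilon"` spelling are assumptions made for this sketch. An empty right-hand side is handled by rule 5 directly, which gives the same result as mechanically applying rule 3 and then rule 5 to an explicit epsilon symbol.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

struct Rule {
    std::string lhs;
    std::vector<std::string> rhs;  // an empty rhs stands for epsilon
};
typedef std::map<std::string, std::set<std::string> > SetMap;
const std::string EPS = "epsilon";  // assumed spelling for epsilon

// First set of one symbol: the table entry for a non-terminal,
// or the set containing the symbol itself for a terminal (rule 1).
std::set<std::string> first_of(const std::string& sym, const SetMap& first) {
    SetMap::const_iterator it = first.find(sym);
    if (it != first.end()) return it->second;
    std::set<std::string> single;
    single.insert(sym);
    return single;
}

SetMap compute_first(const std::vector<Rule>& rules) {
    SetMap first;
    for (const Rule& r : rules) first[r.lhs];  // every set starts empty
    bool changed = true;
    while (changed) {  // keep making passes until no set changes
        changed = false;
        for (const Rule& r : rules) {
            std::set<std::string>& target = first[r.lhs];
            std::size_t before = target.size();
            bool all_have_eps = true;  // stays true for an empty rhs
            for (const std::string& sym : r.rhs) {
                std::set<std::string> f = first_of(sym, first);
                bool has_eps = f.erase(EPS) > 0;
                target.insert(f.begin(), f.end());  // rules 3 and 4
                if (!has_eps) {  // cannot move on to the next symbol
                    all_have_eps = false;
                    break;
                }
            }
            if (all_have_eps) target.insert(EPS);  // rule 5
            if (target.size() != before) changed = true;
        }
    }
    return first;
}
```

Run on the email grammar, this sketch ends with the same sets as the last column of the table, for example the first set of address containing DAA, QSA, the left angle bracket, atom, and quoted string. It stores sets in sorted order, so producing the element order required by the project output would need extra work.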
If I read the homework correctly, for name address, instead of display name it would have display name list. If that were true, would we add the first set of display name list and of angle address? Okay, yes. If instead of display name you had display name list here, exactly like this, then there would be an epsilon in the first set of display name list, which means you'd add the first set of display name list, minus epsilon, to the first set of name address, and then you'd move on and add the first set of angle address to the first set of name address. But I think that would also be a problem. Okay, just keep following the rules. Essentially, the way display name and display name list are written means there is at least one word in this list.

So let's go on to the follow sets. Let's get some volunteers; this will take less time. Here we have our rules in the upper left, and our first sets in the upper right, and we're going to use these first sets to calculate the follow sets. We start off all of our follow sets in the initial state: we initialize them all to the empty set. And so now, where are we looking?
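The five follow-set rules applied in the walkthrough that comes next can be summarized as follows. The numbering matches how the rules are referred to in the walkthrough; the exact wording is a reconstruction from that usage, not the slide text. Here S is the starting non-terminal, $ is end of file, and Greek letters stand for arbitrary (possibly empty) symbol sequences.

```latex
\begin{align*}
\text{(1)}\quad & \text{add } \$ \text{ to } \mathrm{FOLLOW}(S) \\
\text{(2)}\quad & A \to \alpha B:\ \text{add } \mathrm{FOLLOW}(A) \text{ to } \mathrm{FOLLOW}(B) \\
\text{(3)}\quad & A \to \alpha B\, C_1 \cdots C_k \text{ with } \varepsilon \in \mathrm{FIRST}(C_j) \text{ for every } j:\
  \text{add } \mathrm{FOLLOW}(A) \text{ to } \mathrm{FOLLOW}(B) \\
\text{(4)}\quad & A \to \alpha B\, C_1 \beta:\ \text{add } \mathrm{FIRST}(C_1) \setminus \{\varepsilon\} \text{ to } \mathrm{FOLLOW}(B) \\
\text{(5)}\quad & A \to \alpha B\, C_1 \cdots C_i C_{i+1} \beta \text{ with } \varepsilon \in \mathrm{FIRST}(C_1), \ldots, \mathrm{FIRST}(C_i):\
  \text{add } \mathrm{FIRST}(C_{i+1}) \setminus \{\varepsilon\} \text{ to } \mathrm{FOLLOW}(B)
\end{align*}
```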
We want to calculate the follow set of address. What rules are we going to look at? Okay, let's go about this another way. We have a non-terminal that we're interested in calculating the follow set for. How do you know what production rules to look at? Wherever it appears on the right-hand side. That's the key difference with follow sets: wherever it appears on the right-hand side, exactly. In the case of address, it doesn't appear on any of the right-hand sides. You can look through here: no address.

Okay, but rule 1 of follow sets, what does that rule say? If it's the starting non-terminal, then you add the end of file to its follow set. You could still have address somewhere on a right-hand side, and that rule would still apply.

How do we determine what the starting non-terminal is? I think you'll be told. It's either S, which is the standard for the start symbol in our abstract examples, or, in this example, if this were on the exam it would say address is the starting non-terminal. That makes sense, since that's what we care about: we want to parse an address. I guess technically this is part of a larger email grammar that covers lists of addresses too.

So we apply the very first rule of follow sets, which says the starting non-terminal has the end of file in its follow set. We add that, we see address is nowhere on a right-hand side, and we're done.

Name address. Where is name address used? In the first production rule. So we see if the second rule applies: is name address at the end of a production rule? Yes, it's the rightmost element of rule 1. So we add the follow set of address to the follow set of name address: we add the end of file to the follow
set of name address. There are no elements that come after it in this rule, so none of the other rules apply. Name address isn't anywhere else, so the follow set of name address is the end of file.

Okay, display name. Where are we looking for display name? Rule 2. So we ask, does follow-set rule 2 apply? Is it the rightmost symbol of this rule? No. Then we ask whether rule 3 applies: from display name to the end of this rule, do all of those symbols have epsilon in their first set? No, because angle address does not have epsilon in its first set, so that rule doesn't apply. By rule 4, we add the first set, minus epsilon, of the symbol right after it to its follow set. So we add angle address's first set minus epsilon to display name's follow set. We look and see it's the less-than symbol, so we add that to the follow set of display name. Cool. We don't have any more symbols, so there's nothing else to go on to, and the last rule does not apply. Display name doesn't appear anywhere else, so we're done: the follow set of display name is the less-than symbol.

Okay, display name list. Where is that used? Rules 3 and 4. We look at all the right-hand sides, and it's in rules 3 and 4. We apply rule 2: is display name list the last element of this production rule? Yes. So we add the follow set of what? Display name. We add the follow set of display name to the follow set of display name list. There are no symbols after this, so rules 4 and 5 don't apply. Then we go to the next occurrence of display name list, in the fourth production rule. Is it the last element here? Yes.
So we add the follow set of display name list to the follow set of display name list; adding it to itself doesn't do anything. And there are no symbols after it, so we're all done with our follow rules. So the follow set of display name list is the less-than symbol.
Okay, angle address. Where is it used? Rule 2. Rule 1? Rule 2. Just rule 2. Okay. Something that's kind of tricky is you've got to make sure you match those up right. So we ask, does rule 2 apply? Is it the last symbol in this production rule? Yes. So we add the follow set of name address to the follow set of angle address, which means we add the end of file to the follow set of angle address. And that's the only place it appears. There are no other symbols, so we're done there.
All right, address specification. Where is address specification used? Rule 1. Rule 5. I should have numbered them to make it easy. Okay, rule 1 and rule 5. We look at rule 1 and ask, is it the last element? Yes. So then we add what? The follow set of address to the follow set of address specification. So we add the end of file. There are no other symbols here, so we can't apply any other rules. Then we look at rule 5. We look at address specification here and ask, does rule 2 apply? Is it the last symbol in this production rule? No. Then we ask, does rule 3 apply: from this symbol to the end of the rule, is epsilon in all of the first sets there? No. Why? The first set of the angle bracket. Yes. Okay, good. Why don't you say it aloud? The first set of the angle bracket is the angle bracket. Yeah, the first set of the angle bracket is just the angle bracket, so there's no epsilon there. So rule 3 doesn't apply. Then we say, okay, let's apply rule 4: add the next symbol's first set minus epsilon to the follow set of address specification.
So here we just look at the next symbol, and it doesn't matter whether it's a terminal or a non-terminal; we just add the first set of whatever the next symbol is. Here it's the greater-than symbol, so we add the greater-than symbol to the follow set of address specification. So now we have end of file and the greater-than symbol. And there's no epsilon in that symbol's first set, so rule 5 doesn't apply, and we're done there.
Okay, domain. Where is domain used? Rule 6. Rule 6, in two places. So we look here and take the first one. We ask, does rule 2 apply? Is it the last symbol? Yes. So we add the follow set of address specification to the follow set of domain. The follow set of address specification is end of file and the greater-than symbol, so we add those. Then we look at the next alternative. Hopefully you can see that it's essentially the same situation, right? Domain is the last symbol in this production rule, so we add the follow set of address specification to the follow set of domain. It's going to be the same thing there.
All right, then word. We have to figure out all the places where word is used. Is it in 3 and 4? Here and here; anywhere else? I don't think so. Okay. So let's take this first one, production rule 3. We ask, does rule 2 apply? Is it the rightmost symbol here? No. But then we ask about rule 3: from this symbol to the end of the production rule, is epsilon in the first sets of all those symbols? Yes. There's one symbol, right? It's display name list. When we check whether epsilon is in the first set of display name list, we say yes, it is. So we can apply rule 3, where we add the follow set of the left-hand side, display name, to the follow set of word. Then we say, okay, let's apply rule 4. What's the next thing right after word? It's display name list, right?
So we add the first set of display name list, minus epsilon, to the follow set of word. The first set of display name list is epsilon, atom, and quoted string. We take epsilon out and add the rest to the follow set of word. Okay. Now, are we done? No. We have to do this for production rule 4 too. So we ask, does rule 2 apply? No, right? Because word is not at the end. Does rule 3 apply? Is there epsilon in all the first sets from word to the end of the rule? Yes. There's display name list, and we already saw that display name list has epsilon in its first set. So we add the follow set of display name list to the follow set of word; the follow set of display name list is the left angle bracket. Then we ask, does rule 4 apply? Yes, we add the first set of display name list, minus epsilon, to the follow set of word. We've already done that, so it won't change our calculation, but we do it anyway. So the follow set of word is: word is going to be followed by either an atom, a quoted string, or a less-than symbol. Questions on that?
Are we done? No. We calculate it again. We'd go through and do all of this again, and we'd say, wow, we got the same thing. Great. So does that mean we're done? Yes. Okay, questions on the follow sets?
So what are we doing all these calculations for, right? What are we trying to show? Yeah, that there's a predictive parser; specifically, we care about a predictive recursive descent parser. To build that, we have our rules, we have our first sets, and we have our follow sets. So what are the two rules for deciding if a grammar allows a predictive parser? I'm going to quickly go back to that. Yeah, what's the first one? No ambiguity. Sorry?
When you calculate the first set of one particular string, it shouldn't intersect with the first set of another one. The terminals shouldn't be the same; there shouldn't be any overlap. Right. So there are two ways to think about it. At a high level, what you're going for is no ambiguity between rules. In general, what it says is: if you have rules A goes to alpha and A goes to beta, then it had better be that the intersection of the first sets of alpha and beta is the empty set, right? So if you look at this top rule, the intersection of the first set of name address and the first set of address specification had better be the empty set. Because if there's any overlap, then we won't know which rule to apply; we'd have to just guess and maybe backtrack. But we want to be predictive. We want to predict which rule to take just by looking at the next symbol of the input. So that's the first rule.
Do you remember what the second rule is? If there's an epsilon in the first set of A, then the first set of A cannot intersect with the follow set of A. Right. So if a non-terminal can go to epsilon and reduce to nothing, then, and once again it's about ambiguity, you'd better be able to figure out: do you parse A again, or do you go to nothing? And so the test there is that the first set of A intersected with the follow set of A had better be empty. You'd better know whether something else is following you or whether you're going to call yourself again. I think we'll see this in a second.
First thing: we've got to look at which rules here could possibly be ambiguous. Which non-terminals do we have to check for the first rule? Display name list, right, because it's got two rules here. Address. Name address. Address specification. And word, yeah, exactly. Okay, so the first set of name address intersected with the first set of address specification. What is it?
So what's the first set of name address? Less-than symbol, atom, and quoted string. What's the first set of address specification? Dot-atom-at and quoted-string-at. Is there any intersection between the two? No, so it's the empty set. Okay, then we go to the next one: name address.
There's one detail here I've got to point out, because I know this is going to trip some people up. Remember the definition of that rule: we said A goes to alpha and A goes to beta, and we make sure that the first set of alpha does not intersect with the first set of beta. So we don't care about just the first symbol; we care about the entire right-hand side. Remember, alpha is a string, a sequence of terminals and non-terminals. So here you put the entire right-hand side of each rule. We're checking whether the first set of display name followed by angle address intersects with the first set of angle address. So what's the first set of display name angle address? Atom and quoted string. Atom and quoted string, and...? Just atom and quoted string, because there isn't an empty string in there, so you can't go on to angle address. Right, you calculate this just like you calculate the first set of anything. We look at this and say, okay, the first set of display name is atom and quoted string. And then we ask, can we move on to the next symbol? Is there an epsilon in this first set? There's not, so we can't move on. Cool. So the first set of these two together is just the first set of display name. The first set of angle address is just the left angle bracket, so you can see there's no intersection there. So that passes.
What about the first set of word display name list intersected with the first set of epsilon? So what's the first set of epsilon? The set containing epsilon. What's the first set of word display name list? Atom and quoted string.
Atom and quoted string, right? There's no epsilon in the first set of word, so it really doesn't matter what display name list is. There's no intersection there, so we pass. Then we look at the next one: the first set of dot-atom-at domain intersected with the first set of quoted-string-at domain. Is there an intersection there? Nope. What's the first set of this one? Dot-atom-at. And the first set of this one? Quoted-string-at. All right, then we look at the last rule: the first set of atom intersected with the first set of quoted string. No intersection there? Cool, so we passed the first test for all of our rules.
We have two tests. The second test is: is there an epsilon in any of our first sets? Which one? Display name list. There's an epsilon here. So it had better be the case that the first set of display name list intersected with the follow set of display name list is the empty set. Is that the case? Here we can just look it up. The first set of display name list is epsilon, atom, and quoted string, and the follow set of display name list is the left angle bracket. So we see there's no intersection there, and we're good. So now we can actually write the predictive parser. Any questions there?
So maybe you can see that if we're trying to parse a display name list, we can peek ahead at the next token and ask, what's the next token? If it's in the first set of word display name list, if it's an atom or a quoted string, then I know I'm choosing this word display name list rule again. But if it's in the follow set of display name list, if it's a left angle bracket, that means I'm done parsing my display name list. I can't parse any more, because I know that next symbol is an angle bracket, which is going to be parsed and matched by another rule. We've calculated all this.
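The follow-set walk and the two tests above can be sketched in C++ roughly as follows. This is only an illustration under some assumptions: the symbol spellings (`name_addr`, `DOT_ATOM_AT`, and so on), the helper names, and the use of `""` for epsilon and `"$"` for end of file are all made up for the sketch, and the first sets are hard-coded from the values read off in lecture rather than computed. The production for domain is not shown in this part of the lecture, so it is left out.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

using Set = std::set<std::string>;
using Rhs = std::vector<std::string>;
struct Rule { std::string lhs; Rhs rhs; };

const std::string EPS = "";  // stands for epsilon; "$" stands for end of file

// The productions walked through above (symbol spellings are assumptions).
const std::vector<Rule> kRules = {
    {"address", {"name_addr"}}, {"address", {"addr_spec"}},
    {"name_addr", {"display_name", "angle_addr"}}, {"name_addr", {"angle_addr"}},
    {"display_name", {"word", "display_name_list"}},
    {"display_name_list", {"word", "display_name_list"}},
    {"display_name_list", {}},  // epsilon
    {"angle_addr", {"<", "addr_spec", ">"}},
    {"addr_spec", {"DOT_ATOM_AT", "domain"}},
    {"addr_spec", {"QUOTED_STRING_AT", "domain"}},
    {"word", {"ATOM"}}, {"word", {"QUOTED_STRING"}},
};

// FIRST sets as read off in lecture. FIRST(domain) is not given and is never
// needed here, because domain only ever appears last on a right-hand side.
const std::map<std::string, Set> kFirst = {
    {"address", {"<", "ATOM", "QUOTED_STRING", "DOT_ATOM_AT", "QUOTED_STRING_AT"}},
    {"name_addr", {"<", "ATOM", "QUOTED_STRING"}},
    {"display_name", {"ATOM", "QUOTED_STRING"}},
    {"display_name_list", {EPS, "ATOM", "QUOTED_STRING"}},
    {"angle_addr", {"<"}},
    {"addr_spec", {"DOT_ATOM_AT", "QUOTED_STRING_AT"}},
    {"word", {"ATOM", "QUOTED_STRING"}},
};

// FIRST of the sequence seq[from..]: keep moving right only while the current
// symbol can vanish. Symbols missing from kFirst are treated as terminals.
Set first_of_seq(const Rhs& seq, std::size_t from = 0) {
    Set out;
    for (std::size_t i = from; i < seq.size(); ++i) {
        auto it = kFirst.find(seq[i]);
        Set f = (it != kFirst.end()) ? it->second : Set{seq[i]};
        bool nullable = f.erase(EPS) > 0;
        out.insert(f.begin(), f.end());
        if (!nullable) return out;
    }
    out.insert(EPS);  // everything after `from` can vanish (or nothing is there)
    return out;
}

// Apply the follow rules to every occurrence, repeating until nothing changes.
std::map<std::string, Set> compute_follow(const std::string& start) {
    std::map<std::string, Set> follow;
    follow[start].insert("$");  // rule 1
    for (bool changed = true; changed;) {
        changed = false;
        for (const Rule& r : kRules) {
            for (std::size_t i = 0; i < r.rhs.size(); ++i) {
                const std::string& sym = r.rhs[i];
                if (sym != "domain" && kFirst.count(sym) == 0) continue;  // terminal
                Set add = first_of_seq(r.rhs, i + 1);  // rules 4 and 5
                if (add.erase(EPS) > 0) {              // rules 2 and 3
                    const Set& lhs_follow = follow[r.lhs];
                    add.insert(lhs_follow.begin(), lhs_follow.end());
                }
                Set& target = follow[sym];
                std::size_t before = target.size();
                target.insert(add.begin(), add.end());
                if (target.size() != before) changed = true;
            }
        }
    }
    return follow;
}

bool disjoint(const Set& a, const Set& b) {
    for (const std::string& x : a) if (b.count(x) > 0) return false;
    return true;
}

// The two tests for whether the grammar allows a predictive parser.
bool allows_predictive_parser(const std::string& start) {
    std::map<std::string, Set> follow = compute_follow(start);
    for (std::size_t i = 0; i < kRules.size(); ++i)  // test 1: whole right-hand sides
        for (std::size_t j = i + 1; j < kRules.size(); ++j)
            if (kRules[i].lhs == kRules[j].lhs &&
                !disjoint(first_of_seq(kRules[i].rhs), first_of_seq(kRules[j].rhs)))
                return false;
    for (const auto& entry : kFirst)                 // test 2: FIRST vs FOLLOW
        if (entry.second.count(EPS) > 0 && !disjoint(entry.second, follow[entry.first]))
            return false;
    return true;
}
```

Calling `compute_follow("address")` should reproduce the sets from the walkthrough, for example less-than, atom, and quoted string for word. Note that `first_of_seq` takes the whole right-hand side, which is the detail about alpha being a string rather than a single symbol.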
Now we can actually turn all of this into code. This is all the information we need, the grammar, the first sets, and the follow sets, to write a parser for this language. So that's what we're going to do right now. On each slide we're going to take it step by step and show the important rules. We're going to start at the top, with the address rule, and at the top of each slide we'll keep track of all the relevant rules and sets. So here we have the rule address goes to name address or address specification, some relevant first sets, and some relevant follow sets.
Just to establish some standards that we're going to use in this class, when we write our parsers and when you write parsers on exams: we're going to define functions named parse underscore non-terminal, so in this case we're going to write the function parse_address. This function is responsible for parsing an address. The very first thing we do is get a token, right? We want to look at the token and see what it is. So how are we going to be able to tell which of these two rules applies? Yeah, so we have two things here: we have the first set of name address and we have the first set of address specification. There are a couple of different things we can do. First, we can look at the first set of address and know that if the token is not one of these symbols, that's a parsing error. If it's a greater-than symbol, then I know this is not a valid address, right? Because it's not in the first set of address. Whatever I'm parsing had better start with a dot-atom-at, a quoted-string-at, a less-than symbol, an atom, or a quoted string.
So that's one thing we can do. The next thing we want to ask is, which production rule do we want to use? We check the token that we just read against the first set of each of those production rules. And remember, we've already proved that there is no intersection there, so we'll be able to tell exactly which rule to choose based on the token we're reading. Let's take the first one: we check whether that token is in the first set of name address. What's in the first set of name address? A less-than symbol, an atom, or a quoted string, right? I really hope that everyone can turn this into code that looks very similar to this. We're checking the t_type, and I'm just writing the terminals out by name. It's not valid C code, but it's fine pseudocode for our purposes. So I'm checking: is this token type a less-than symbol, an atom, or a quoted string?
Next I will want to check the first set of address specification, but first, what goes inside this if statement? What does it mean when this condition is true? It means you know which rule to pick. Right, we know which rule to pick: we picked address goes to name address. We're not parsing an address specification; we know we're parsing a name address. So we call parse name address, yes, but first, what did we do by calling get token? Right, we took a token from the input. We want to call parse_name_address, but when parse_name_address runs, it's going to want to read that token, and it's going to be misaligned because we've already read it. So the very first thing we want to do is unget that token, right?
Put that token back so that parse_name_address starts from the start of the string. And then we call parse_name_address. That's going to do something, and then, as a convention, I'm going to represent doing something with the parse by printing out the production rule that we used. So here I print out address goes to name address. This is what you should do in your homework too, to show that yes, I know this is the production rule we just parsed. And parse_name_address is going to do something, whatever, we don't care, and then it returns.
But what if the t_type is not a less-than symbol, an atom, or a quoted string? Then what do we check for? The first set of address specification, right? Which is what? Dot-atom-at or quoted-string-at. Yeah, exactly. So we check if the t_type is a dot-atom-at or a quoted-string-at. Then what do we do? We call unget token again, because remember, we don't actually want to consume this token; we just used it to determine which rule to parse. Then we call parse_address_specification. And then we do whatever we want with the result of our parsing, which here is just printing out that we used this rule.
So what if it didn't match one of these five? We checked for five tokens, right? Parsing error. Parsing error, yeah. How would we know if an end of file was valid? If address went to epsilon, exactly. If address could go to epsilon and be empty, then we would check for the end of file, and that would be a valid parse. But there is no epsilon in the first set of address, so we've got to be able to parse something. So, in this else branch, this is the other convention we're going to use: syntax error, a function called syntax_error.
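The parse_address function described above might look roughly like this in C++. It is a sketch, not the course's reference code: the `TokenType` names, the vector-backed `getToken`/`ungetToken`, the `trace` vector, and the choice to make `syntax_error` throw an exception are all assumptions made so the example stands on its own, and the two callee functions are empty placeholders.

```cpp
#include <cstddef>
#include <iostream>
#include <stdexcept>
#include <string>
#include <vector>

enum TokenType { ATOM, QUOTED_STRING, DOT_ATOM_AT, QUOTED_STRING_AT,
                 LANGLE, RANGLE, END_OF_FILE };

// Hypothetical stand-in for the lexer: the input is a vector of token types.
std::vector<TokenType> input;
std::size_t pos = 0;
std::vector<std::string> trace;  // every rule that gets printed, in order

TokenType getToken() {
    TokenType t = pos < input.size() ? input[pos] : END_OF_FILE;
    ++pos;  // always advance so that ungetToken() is a plain step back
    return t;
}
void ungetToken() { --pos; }

// Assumption: report the error by throwing, so a caller can observe it.
void syntax_error() { throw std::runtime_error("syntax error"); }

void print_rule(const std::string& rule) {
    trace.push_back(rule);
    std::cout << rule << "\n";
}

// Placeholders only; their bodies are a separate part of the parser.
void parse_name_address() {}
void parse_address_specification() {}

// address -> name_addr | addr_spec
void parse_address() {
    TokenType t_type = getToken();  // peek one token ahead
    if (t_type == LANGLE || t_type == ATOM || t_type == QUOTED_STRING) {
        ungetToken();               // FIRST(name_addr): give the token back
        parse_name_address();
        print_rule("address -> name_addr");
    } else if (t_type == DOT_ATOM_AT || t_type == QUOTED_STRING_AT) {
        ungetToken();               // FIRST(addr_spec)
        parse_address_specification();
        print_rule("address -> addr_spec");
    } else {
        syntax_error();             // not one of the five tokens in FIRST(address)
    }
}
```

With `input` set to a single `LANGLE` token, `parse_address()` picks the name-address branch; with a single `RANGLE` it ends in `syntax_error()`. The rule is printed only after the callee returns, since the callee could itself hit a syntax error.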
You can imagine it's something you call that prints out, hey, there's an error in your syntax, the way a compiler would. Questions about this?
Okay, we're going through all of these because there are tricky cases. So now we're going to look at name address and write the function parse_name_address. Here we have the first sets and follow sets, and what we want to do is parse a name address. We have two rules to choose from: display name angle address, or angle address. So what's the first thing we're going to do? We get the token, right? We're predictive: we want to peek one token ahead, but we only ever want to look one ahead. And we look at it to determine which rule we want. Okay, so let's say I take the leftmost rule. What am I going to check for? Whose first set? Atom or quoted string. Yeah, atom or quoted string. Where does it come from? The first set of what? Display name. Almost. The first set of display name angle address. Yeah, it's functionally the same here, but if there were an epsilon in the first set of display name, then it wouldn't be the same. So you want to make sure that you're taking the first set of the whole right-hand side: display name angle address. So we check for atom or quoted string. And what's the very first thing we do? Unget the token. Exactly, put it back in the input. And now, what are we going to call? Parse display name. Good. Followed by what? Parse angle address. Yes. So this is where, as you can see, we basically follow the production rule: we know we want to first parse a display name and then parse an angle address.
Right, so that's why these calls follow each other. We call parse_display_name, and it's going to do whatever it needs to do. And when it comes back, the very next thing we parse had better be an angle address, because we know, based on looking at the first sets, that this is the rule we're following: name address goes to display name angle address. Does everybody see why there are two calls to two different parse functions, and where those parse functions came from? Any other questions? Yeah?
We're not checking for the less-than symbol in the t_type condition. Why would we check for the less-than symbol? Parse angle address does that. Yes, parse_angle_address will take care of looking for the less-than symbol. Exactly. In here we only care that we're going to parse a display name and then parse an angle address, and that came directly from this rule: display name followed by angle address. And if your question was why we're not checking for the less-than symbol here, it's because the first set of display name angle address is just atom and quoted string.
When you print the rule you chose at the end, after it recursively calls parse display name and parse angle address, and those print the rules they chose, wouldn't the rules be printed in a weird order? Yeah, it's like a post-order traversal. The way to think of it is that it's not until you come back from parse_angle_address that you know the display name and the angle address were properly parsed. If you did something before that, you wouldn't know whether the angle address is proper; that's what parse_angle_address is going to check. Either of those two functions could call syntax error.
So that's why you wait until they return to actually do something. But yeah, when they print, the rules will come out in a weird order. Any other questions?
Okay, so what tokens are we going to check for the second production rule? We check if the t_type is equal to the less-than symbol. Where'd you get that? From angle address. Okay, from the first set of angle address, right, that's the next production rule. So we check if the t_type is the less-than symbol. Then we do what? Unget token. And then what do we parse? Angle address. And then we print out the rule, name address goes to angle address. So here we have two calls to parse_angle_address, but they're in different branches because they're part of different production rules. In one production rule we first have a display name followed by an angle address, and in the other we have an angle address by itself. And we know that if it's the second one, the first token had better be the less-than symbol; that's how we know it's this production rule, that it's just an angle address that we're parsing. Okay, so if it doesn't match... yeah?
Right after parse display name we call parse angle address. What happens if it doesn't find it? What do you mean, if it doesn't find it? Isn't there supposed to be a statement saying, if this, then...? I guess you could. I think the convention is that each of your parse functions checks that the token is what it wants. So everything is going to check. That's why our next clause is going to check for a syntax error: if it's not an atom, a quoted string, or a less-than symbol, that's a syntax error. So yeah, I guess you could check for the first set of angle address right there, but at that level you don't need to. But you do need to be careful.
There are cases where you need to be careful, which I will get into, because our function may want to consume that token itself, depending on what the production rule is. One of the examples coming up has that. It depends on what you're calling next. In this case, name address goes to display name angle address; there are no tokens in between display name and angle address. So parse_display_name parses its things and then parse_angle_address parses its stuff. You could do it some other way instead of with get token and unget token, but this is the model we're standardizing around, since everyone has already used it.
Okay, so then we look at display name. Display name doesn't have any alternatives; it's display name goes to word display name list. Here's the relevant information. So we're going to write the function parse_display_name. What do we do first? Get the token, right? Because we want to ask, is this a proper display name? So we first get the token, and then we check the first set of, in this case, word display name list. And what's in the first set of word display name list? Atom and quoted string. Exactly. So we check for that, we unget our token, and what do we parse here? Parse word and then parse display name list. Exactly. And we print out that we used that rule. What do we do if it's not an atom or a quoted string? Syntax error, right? Even though there's only one production rule, we know that if it's not an atom or a quoted string, it's an error. So we have a syntax error.
Okay, this is a good example. Display name list goes to word display name list, or epsilon. We have our first and follow sets here. So we're trying to write the function parse_display_name_list. What's the first thing we do? Get token.
So we check the left alternative here: we check the first set of word display name list, exactly. We know it's atom and quoted string. If the token is one of those, we call unget token, we call parse_word, and what do we call next? Parse display name list, just like the other ones. And when that returns, we print display name list goes to word display name list. This is where the "recursive" comes from, right? We're a predictive parser because we're always looking just one token ahead to predict which rule to take, and we're recursive because we keep calling these parse functions recursively. So that's one rule.
But then what happens with epsilon? How do we know when to stop? Because we do need a terminating condition for this recursive call. Right, we check the follow set of display name list. If the next token is in our follow set, and it's not in our first set, then we know we're done. The follow set of display name list is the less-than symbol. So we say, if the t_type is the less-than symbol, then we unget the token, put it back. Why do we have to unget the token here? Because it belongs to something else. Yeah, it's something that follows display name list. It's not our token. But we know it's a valid token to follow us, because it's in the follow set. So we print that we went to epsilon. And finally, in the else clause, we say it's a syntax error, because if it's not in our first set and it's not in our follow set, then it's definitely a bad token. Okay, we will continue here on Wednesday.
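As a sketch of the epsilon case just described, here is one way parse_display_name_list could be written in C++. The token names, the vector-backed `getToken`/`ungetToken`, the `trace` vector, and the throwing `syntax_error` are assumptions made to keep the example self-contained. The body of `parse_word` has not been covered yet, so the version here, which simply consumes one atom or quoted string, is a guess at its shape.

```cpp
#include <cstddef>
#include <iostream>
#include <stdexcept>
#include <string>
#include <vector>

enum TokenType { ATOM, QUOTED_STRING, LANGLE, RANGLE, END_OF_FILE };

// Hypothetical stand-in for the lexer: the input is a vector of token types.
std::vector<TokenType> input;
std::size_t pos = 0;
std::vector<std::string> trace;  // every rule that gets printed, in order

TokenType getToken() {
    TokenType t = pos < input.size() ? input[pos] : END_OF_FILE;
    ++pos;  // always advance so that ungetToken() is a plain step back
    return t;
}
void ungetToken() { --pos; }

// Assumption: report the error by throwing, so a caller can observe it.
void syntax_error() { throw std::runtime_error("syntax error"); }

void print_rule(const std::string& rule) {
    trace.push_back(rule);
    std::cout << rule << "\n";
}

// word -> ATOM | QUOTED_STRING
// Assumed body: a rule made of a single terminal consumes that token.
void parse_word() {
    TokenType t_type = getToken();
    if (t_type == ATOM) {
        print_rule("word -> ATOM");
    } else if (t_type == QUOTED_STRING) {
        print_rule("word -> QUOTED_STRING");
    } else {
        syntax_error();
    }
}

// display_name_list -> word display_name_list | epsilon
void parse_display_name_list() {
    TokenType t_type = getToken();
    if (t_type == ATOM || t_type == QUOTED_STRING) {
        // In FIRST(word display_name_list): take the recursive rule.
        ungetToken();
        parse_word();
        parse_display_name_list();
        print_rule("display_name_list -> word display_name_list");
    } else if (t_type == LANGLE) {
        // In FOLLOW(display_name_list): the token belongs to whatever comes
        // next, so put it back and go to epsilon.
        ungetToken();
        print_rule("display_name_list -> epsilon");
    } else {
        syntax_error();  // in neither set, so it is a bad token
    }
}
```

For the input atom, quoted string, less-than, this prints the two word rules, then the epsilon rule, then the recursive rule twice, which is the post-order effect mentioned earlier, and it leaves the less-than symbol unconsumed for the caller.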