All right. Good morning, everyone. Thanks for being here on this beautiful Presidents' Day at 9 AM, so we can talk about cool, fun stuff. Do we need to get in? Yeah. Get in, D. OK. So I won't take an informal poll by raising hands, but I want you to do a little self-check: A, have I read through the Project 3 description, and B, have I started on Project 3? If the answer to both of those is no, then you're definitely behind. So you should get started right after this class, or whenever you can after this class, depending on your schedule. I'll take the first maybe 15 minutes of class today to go over Project 3, so we can make sure that everybody knows what's supposed to happen and what the project is supposed to look like. That's basically the idea here. So whether you have started or you haven't read it yet, feel free to take this time to ask any questions, and we can answer any questions that come up. OK, so the high-level idea of this project: using either C or C++, your choice, you have to write completely from scratch a program that reads in a description of a context-free grammar. And depending on what arguments you pass it, it does some type of computation or outputs some information about that context-free grammar. So one of the things you'll do is either output the first sets or output the follow sets. So you'll actually be programming the algorithm to calculate first and follow sets on an arbitrary context-free grammar. As a small first step, task 0 just outputs some information about the grammar. So this project is designed in a way that you build up from task 0, which just makes sure you're reading the input and are able to output things correctly, to task 1 and task 2, where task 1 is calculating first sets and task 2 is calculating follow sets. And to calculate follow sets, well, you have to have first sets. Yeah, right?
So that's kind of the way how I would break this problem down and approach this, right? OK, do this first task by task 0, make sure that's working 100%, then do task 1, make sure that's working 100%, and then build on that to the follow sets. Questions at a high level? High level kind of project goals? Yeah. Will you just make task 0? Are you really just typing in 0? No. Or is that, I would break that. Yeah. We'll get to that in a second, but yes. I guess literally typing in before you run the command? Yes. But not typing in as input to the program. We'll see. Any questions? Yeah? What kind of data structures do you recommend for this project? That's a very good question. Let's talk about that when we get closer a bit more into the details, because yeah, that's a good question. All right. So first thing we need to do is figure out, how do we decide what tasks to run if we're doing task 0, task 1, or task 2? So this is a little snippet of C code that you are free to use that shows you how to read command line arguments that are passed into a program. What are command line arguments? Not standard ed. Can't put arguments you take in from the command line. Yeah, so that's the difference, right? So if we look at, let's see, where do I have it here? So I run some kind of thing here, because I actually don't have anything here. All right, so if I'm going to run this command, test 1, whatever, whatever, anything I pass in here, these are all arguments to the command line. So these arguments are going to be passed into the program technically by the shell bash, but the specifics, probably don't know who we're got now, but it is very cool. Do y'all see that, or should I make it bigger or change the lighting? Whereas if I run the thing like this, and see, there's nothing here, when I'm running a command like let's say cat, right? This is standard input. So I'm giving standard input to the program, and it's reading from standard input and outputting to standard out. 
So this is basically the two ways you can get input to a program, right? You use the command line argument, you use standard input, yeah. Can you say what cat does again? Oh, cat just outputs. By default, it reads from standard in and outputs whatever it gets to standard out. Otherwise, if you give it a file name, like if I did test 1, it would output, it would take in that file name and output that. So it concatenates several files together. So if you wanted to do several files, it'll concatenate them all together. But I'm just using it for the standard input, standard output here. Okay, so when arguments are passed to your program, how do you know that arguments are passed to your C or C++ program? So how does? Your commands are stored in the array. Yeah, so the basic idea is argc, here the first int argument to main, specifies how many arguments you have, and then argv, this is the char star star. Other brackets mean that it's an array of character star pointers, or character star, which are character pointers. So it's an array of pointers to strings, and argc tells you how many in that array there are. So, yes, okay. So, yes. Is this a question of that for a while? Sure. Is it just convention to use argc and argv as a name or can you use anything else? Yeah, the names don't matter to, it depends on your code in here. Yeah, it doesn't matter at all. Just the signature matters. You mess up the signature, you'll get past integers and char star stars, but yeah. We'll be passing multiple 012 parameters in the argv character array. No, so just one, so yeah. I think we'll see an example in a second. So the idea is argc tells us how many parameters were passed to our program. So what we do is we first check if argc is less than two, then it says that we're missing an argument. Why less than two? How many parameters did I say we were passing? Two. Two. Let me try and find it. I guess I don't have. Yeah, so we're gonna run it like this. 
Right, with zero saying we want to do task zero. How many arguments am I passing to a.out here? One, two, two. Two? Two, two, one. Looks like one. So yes, technically we are only passing one argument, right? We're passing the argument zero to a.out. But the way argv and argc work, argv[0] is the name of the program being executed. So in this case it would be, sorry, I should put this closer up. Yeah, in this case it would be the string a.out. Actually, that could be useful to your program if your program changes its execution depending on how it's being called. So argv[0] is always the name of your program. And so if there is a command line argument, argv[1] is going to be the first argument, argv[2] will be the second, and argv[3] will be the third. So this is why argc will normally be one if you don't pass anything in. And so this is checking if argc is less than two, which means we did not pass an argument. It's gonna say, hey, you're missing an argument, and we're gonna exit early. Like, we don't wanna do anything. So the next line is, yeah, so we kind of have a note here to mention this. Are we going to be receiving, like, are any of the test cases something where that's gonna be? No, okay. You don't have to worry about that. You don't have to worry about any garbage in argv. And you can follow the input here, it'll actually follow really closely. Okay, so argv[1] is the task, right? But it's a char star, right? It's a pointer to a string, or a pointer to a character, right? So we use the atoi function, which stands for ASCII to integer. Yeah, so it takes in a char star and returns the integer that the string represents. So we set that to be task, which we define here as an integer. And then we do a simple switch statement and say, hey, if it's task zero, then do task zero. If it's task one, then calculate the first sets and output the first sets.
And if it's case two, you calculate first sets, calculate follow sets, then output the follow sets. Otherwise, output something that says, hey, I didn't recognize this number, and break and return. So you don't have to follow this exactly, I don't really care. You know, it's just gotta work. But a thing to note, right? You should probably not do your entire first set calculation right here in the main function, right? Because you know that you're gonna have to do it again later if it's case two, right? So you want to make sure that you have a function that calculates first sets, right? Questions on reading the task number? This is the part that I really like. So we're describing the input to your program with a grammar, and the input to your program is a grammar, right? It's a context-free grammar, specified using a context-free grammar. At a high level, we have different sections, each separated by the hash symbol, and the grammar specification, when it's all done, is terminated by a double hash. So this means that if there are any symbols after the double hash, they're ignored, right? You keep reading until you get to the double hash, and then you know you're done, you don't have to read anymore. All the symbols are whitespace separated, so this is specified in here. And so here's our description of the grammar as a context-free grammar. So we start with S: we have a non-terminal list followed by a rule list followed by a double hash, right? At a high level. Then we say a non-terminal list is an ID list followed by a hash, where an ID list is a series of IDs, and we specify our tokens here: IDs, hashes, double hashes, and arrows. Then a rule list is a series of rules, where each rule is an ID, then an arrow, then the right-hand side, where the right-hand side is an ID list or epsilon, nothing. So putting that all together, we have all the tokens described here. And this basically specifies exactly what we mean by whitespace, right?
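Going back to the task-number handling discussed before the input format: here is a minimal sketch of it, not the snippet from the project description. The names `parse_task` and `describe_task` are made up for illustration; a real `main(int argc, char* argv[])` would call `parse_task(argc, argv)` and then dispatch to whatever functions compute and print the first and follow sets.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical helper: returns the task number from the command line,
// or -1 if no argument was given.
int parse_task(int argc, char* argv[]) {
    assert(argv != nullptr);
    // argv[0] is the program name, so argc is 1 when no argument is passed.
    if (argc < 2) {
        std::fprintf(stderr, "Error: missing argument\n");
        return -1;
    }
    // atoi converts the string "0", "1" or "2" to the integer it represents.
    return std::atoi(argv[1]);
}

// Hypothetical dispatcher: a real program would call its own task functions
// here instead of returning a description.
const char* describe_task(int task) {
    switch (task) {
        case 0:  return "task 0: grammar information";
        case 1:  return "task 1: compute and print first sets";
        case 2:  return "task 2: compute first sets, then follow sets";
        default: return "unrecognized task number";
    }
}
```

Keeping the first-set computation in its own function, rather than inside the `case 1` branch, is what lets task 2 reuse it.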
So that way we all know what we mean by whitespace. So it's the isspace function in ctype.h, right? So if you ever want to know, hey, is this a whitespace character? You call this function, and it'll return a nonzero value if it's a whitespace character or zero if it's not, yeah? So we have to modify, or we can modify, lexer.h and lexer.c for hash, double hash, and arrow. But we don't have to do that for ID, right? Because lexer.c already does that for us. Kind of. Does it? That's what you have to absolutely make sure of. Okay. Yes. Off the top of my head, I'm pretty sure it does do this, that this is the same. But I don't know a hundred percent. So yeah, you can definitely use lexer.h and lexer.c. And it's a good exercise, but you can't just drop it in. You have to modify it and change it. Because there are a lot more tokens in lexer.c than there are in this language, right? There's a token for if, there's a token for while, there's a token for all these other things. Whereas here we only have really five tokens: a digit, the ID, hash, double hash, and arrow. So you can do that, or you can code this by hand. You can write your own lexer here. In the sample input that's lower down, I think there are examples where you use, like, colon, I think, or comma. Are those part of ID, or are those actual values that are in the? These are the only, so these are the only tokens in our input language. These are the tokens. Plus letter and digit. Plus letter and digit, yes. But they're not ever used in here. There's like some. Exactly, yes. Okay, so, right? This describes what it means to be part of our input language, right? How do we read input? The semantics says what does it actually mean, right? What does that actually mean in our program? So the idea is we first have this list of all of the non-terminals in our language, right?
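If you do decide to write the lexer by hand, a minimal sketch could look like the following. Two things here are assumptions and not taken from the spec: the arrow is spelled `->`, and anything that is not `#`, `##` or `->` is treated as an ID without checking that it is a letter followed by letters and digits. The names `TokenType`, `Token` and `tokenize` are made up for this example.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

enum class TokenType { ID, HASH, DOUBLEHASH, ARROW };

struct Token {
    TokenType type;
    std::string text;
};

// Splits the input on whitespace (as defined by isspace) and classifies
// each word. Stops at the first "##", so anything after it is ignored.
std::vector<Token> tokenize(const std::string& input) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < input.size()) {
        // isspace expects a value representable as unsigned char.
        if (std::isspace(static_cast<unsigned char>(input[i]))) {
            ++i;
            continue;
        }
        std::size_t start = i;
        while (i < input.size() &&
               !std::isspace(static_cast<unsigned char>(input[i]))) {
            ++i;
        }
        std::string word = input.substr(start, i - start);
        assert(!word.empty());  // we only get here on a non-space character

        TokenType type = TokenType::ID;  // assumption: no ID validation
        if (word == "##")      type = TokenType::DOUBLEHASH;
        else if (word == "#")  type = TokenType::HASH;
        else if (word == "->") type = TokenType::ARROW;  // assumed spelling

        tokens.push_back({type, word});
        if (type == TokenType::DOUBLEHASH) break;
    }
    return tokens;
}
```

Because newlines are just whitespace to `isspace`, the same function handles a grammar written on one line or spread over many.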
So we say in this first list, which is an ID list up to a hash. Each of the IDs is a non-terminal in our input language. I believe we say here that the very first one is the starting non-terminal, right? Yeah, the first non-terminal in this list is the start symbol of the grammar. So we read in all of the terminals. So what does that mean? So next you're reading the rules. So what happens if you see something that's not in this list? The terminal, yeah. If it's on the right hand side, it's gotta be a terminal. If it's on the left hand side, then you would have an error, but I am 99% sure we have no test, like we're not. This isn't an error checking assignment. This is we give you valid input and we expect the correct output. So that on its own is difficult enough. So you gotta be prepared for that. Okay, so this is saying I have the non-terminal declaration ID list one, ID list. And so then for each of these hashes, up to the hash is a rule where I have an ID, which is the left hand side of the rule, an arrow, and then everything after the arrow is the right hand side of the symbols of the grammar. So you can easily see how this, or you can see how this could be a grammar, right? You have declaration goes to an ID list, followed by a colon, followed by a capital ID, right? So is this capital ID the token ID in our language? Yeah, this is just an actual terminal in the input language, right? It doesn't mean anything to us, it just means something to this specific language. Yeah. This ID list one here is interesting, so what does this mean? Epsilon, yeah, exactly. So is this the special case for ID list one goes to epsilon? Right, so there's nothing here. Yeah, so the first section sets all the non-terminals and the rest of the input specifies all the grammar rules. So this is how that would be represented in how we've been seeing context-free grammars, right? 
So it's saying like, okay, we have a declaration goes to an ID list, colon ID, ID list is ID, ID list one, yeah. Commas and challenges of the token used in the grammar descriptions, should we add it? This, how do you know that, so what do you know when you see this token, or when you read this? What's this gonna read as from the lexer? An ID, it's an ID with the name of comma, all uppercase, right? So in this language that we're reading in, what do we know about comma? The symbol, but what is it? What kind of symbol is it? The terminal, how do you know? Because it's on the right-hand side of an arrow. That isn't one of them. What about this? Is this a terminal? Is this on the right-hand side of an arrow? It's a word that's not that first list. Yes, it's an ID and it's not in this first list which means it must be a terminal. So it has, once again, it has nothing to do with our input tokens, right? Our input tokens say how do we read this, right? This is gonna read this as whatever ID list so it's gonna be ID, ID, ID, hash, ID, arrow, ID, ID, ID, hash, ID, hash, arrow, ID, ID, hash. So how does the grammar description relate to the input grammar exactly? Can't you parse that whole thing without using the grammar to define grammar? Yes, we're using this grammar to define how to read in an input grammar. Little. It's using its definition to define itself. Yes, it's supposed to make you think. You should be able to feed in basically this grammar to get the first and follow sets of this grammar. To your program, it doesn't matter at all, right? Your program just reads that in and it knows what each of these symbols means and it knows how to build up the rules of the context-free grammar, right? But the input language is never gonna change, right? Like what your lexer considers as an ID, a hash, double-hash, and an arrow is never going to change and what your grammar parses as this, right? This input language is never going to change. 
You showed that if you do an arrow and then a hash, that's just epsilon. So if you have epsilon at the end, I mean, you have a hash and then two hashes. So the last one, that's an epsilon, right? This, or not? No, you gotta take it back to the semantics here. This is a rule list, and each rule ends with a hash, right, so this rule list is gonna end with a hash and then there'll be a double-hash after it. That's how you know, that double-hash is how you know that there are no more rules, exactly, because each rule is ID, arrow, right-hand side, hash, so you have as many of those as possible until you get to a double-hash. It's like your end of file. Yes, exactly. So even if you had another, like you had a copy of ID list one, the first one after, like a copy of the fourth line after the fifth line, you would just ignore it. You would have already ended the file, right? Yes, if there's anything after here, you don't care. Just like on project two, when you looped until the lexer returned end of file, here you're just kind of looping until you get to double-hash and you just stop. You don't have to read any more input. You don't care if there are errors or junk afterwards. You don't care if it's the end of file afterwards. If there's other stuff, you just stop reading because that's what the input specification says. The single hashes are basically new lines, kind of. Kind of, yes, they are separating these different parts. Could you technically have this all on one line? Yes. So in that example, just for readability purposes, they're separated on new lines, right? But a new line is just whitespace, right? And whitespace is ignored. All the tokens are separated by whitespace. So they could be all on one line. They could be separated by multiple new lines. The input could definitely look like this. Great, so then we can tell from, and everything is case-sensitive, right? That's the important thing.
So we know the non-terminals from the first section of our input, and then we know all of the terminals based on things that were used on the right-hand side of rules that are not known non-terminals. So we're kind of doing implicit declaration here of the terminals. Questions on the input? So as you can see, the spec is complicated. So you should read through it several times to make sure you fully understand in your brain how the input should be coming in, right? And then that way, that helps you code it, right? Because it's a lot easier to change your code before you've written it, right? And so if it's something that's like, hey, I don't know, case-sensitivity, right, or something like that, you wanna make sure you know all these properties of the input. Okay, just like before, CentOS 6.7, everything. This part, yeah, we kind of reiterate it again. You're welcome to use lexer.c and lexer.h, but you have to modify them. If you just try to drop them in, it's not gonna work. It's gonna cause problems. You're gonna be unhappy. Okay, yes? If we do use those, do we have to keep the names lexer.h or do we change that as long as? Yes, whatever you want. We can tell where the code came from. Probably kind of whatever you come up with is if it matched somebody else's code. But doesn't the submission site need those files? No, it will compile it all just as if it was your source code. So basically what this says is we're not gonna automatically add any source code. Like last time we added lexer.c and lexer.h, right? So now you're basically submitting an entire program that's gotta be written to these specifications. And if it doesn't work, it doesn't work, so. Okay, so we just have to upload those two other files. Yes, yeah, that's why they can be named whatever you want. All right, so briefly going over the tasks. Task zero, the goal is you're just gonna output some information about your grammar.
So the idea is you're gonna list first all of the terminals of the grammar, separated by a space. In the order they appear, let's see, not in the output grammar, in the input grammar. So this is all of the terminals, so how do we tell if things are a terminal? It's on the right-hand side of an arrow and it is not in the first line of the input. Yes, okay, exactly. It's not a non-terminal, so it's a terminal, and it's on the right-hand side of a rule. I'm assuming all of this is probably from standard in? Yes, yeah, just like this, so your lexer has to read from standard input, just like the other one. I think it says in here that it's coming from standard input and then you'll output to standard output. Yes. Okay, so this would be the order that it appears, right? So if we look at, not that one, so that's the only one. Okay, so if we look here, we can see, okay, ID list is a non-terminal. I know because it's here. Colon is not a non-terminal because it's not here, so I know that it's a terminal, so that's the first one that appears in the list, right? Left to right, top to bottom as you read it in. ID would be the second terminal in this list, right? Because it's not in here. And then ID I've already read, ID list one is a non-terminal, and comma would be the third one that I've read in. So it should be, if we go down, colon ID comma. So the three that I read, in order. Then you're gonna output each non-terminal in the input grammar, in the order they appear in the first section, right? So another thing. You're gonna output the non-terminal followed by a colon character, followed by the number of rules in which the non-terminal appears on the left-hand side. It's pretty clear when you look at it, though. So basically this is how many rules are there for declaration? How many rules are there for ID list one? How many rules are there for ID list two?
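The two pieces of information Task 0 asks for can be sketched like this, assuming the grammar has already been read into a list of non-terminals and a list of rules. The `Rule` struct, the function names, and the symbol spellings in `example_rules` are made up for illustration and only modeled on the example discussed in class; the exact output formatting is left out because the project description defines it.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

struct Rule {
    std::string lhs;
    std::vector<std::string> rhs;  // an empty vector stands for epsilon
};

static bool contains(const std::vector<std::string>& v, const std::string& s) {
    return std::find(v.begin(), v.end(), s) != v.end();
}

// Terminals in order of first appearance: any right-hand-side symbol that
// was not declared as a non-terminal, scanning rules top to bottom and
// each right-hand side left to right.
std::vector<std::string> terminals_in_order(
        const std::vector<std::string>& nonterminals,
        const std::vector<Rule>& rules) {
    std::vector<std::string> terminals;
    for (const Rule& rule : rules) {
        assert(contains(nonterminals, rule.lhs));  // input is assumed valid
        for (const std::string& sym : rule.rhs) {
            if (!contains(nonterminals, sym) && !contains(terminals, sym)) {
                terminals.push_back(sym);
            }
        }
    }
    return terminals;
}

// Number of rules in which each non-terminal is the left-hand side.
std::map<std::string, int> rule_counts(
        const std::vector<std::string>& nonterminals,
        const std::vector<Rule>& rules) {
    std::map<std::string, int> counts;
    for (const std::string& nt : nonterminals) counts[nt] = 0;
    for (const Rule& rule : rules) ++counts[rule.lhs];
    return counts;
}

// Hypothetical grammar in the spirit of the class example.
std::vector<Rule> example_rules() {
    return {
        {"decl", {"idList", "colon", "ID"}},
        {"idList", {"ID", "idList1"}},
        {"idList1", {}},
        {"idList1", {"COMMA", "ID", "idList1"}},
    };
}
```

With the non-terminals declared as `decl idList1 idList`, the terminals come out as `colon`, `ID`, `COMMA`. When printing the counts, iterate over the non-terminal list rather than the map so the output follows the declaration order.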
So if we see the order, because we have decl, ID list one, ID list two, we can see that there's one rule here for decl, one rule for ID list two for ID list two. Nothing else, just outputting. So the goal here is to get you to read in the input grammar, right? So you should be thinking about how do I want to represent a context-free grammar as a data structure? How do I want to represent the non-terminals, the terminals? How do I want to represent the rules as a data structure, right? So I can iterate over them, so I can compute on them. Okay, and I'll briefly go over the other ones so we can have some time to talk about the data structure. The next one is just first sets. So I'm going to basically let you read through this, but the idea is you output, for each symbol of the grammar, what the first set is. We specify exactly how to output the first sets. So if epsilon is in the set, we're going to represent it as a hash. If it is in the set, it should be listed before any other elements. Otherwise the elements need to be sorted using the strcmp function from string.h. So running that on the previous grammar should give an output like this. The first of decl is ID. The first of ID list one is epsilon, comma, and the first of ID list is ID. And then the next one is follow sets. So it's basically the same output order. It's just that end of file we're representing as the dollar sign. And if end of file is in the set, it's listed first. Otherwise everything else is sorted. The output basically looks like this. Questions about first and follow? So this is where the meat of it is, but the output specification is pretty well defined. It's actually doing it that's the problem. Evaluation: task zero is 20 points. First sets without epsilon are 30 points. First sets with epsilon are 20 points, follow sets for grammars without epsilon are 25 points, and follow sets for grammars with epsilon are 10 points.
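The ordering rule for printing a set (the special symbol first, everything else sorted with `strcmp`) can be sketched as below. The function name `ordered_for_output` and the idea of passing the special symbol as a parameter (`"#"` for epsilon in first sets, `"$"` for end of file in follow sets) are choices made for this example; separators and brackets are left out because the project description defines the exact format.

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>
#include <set>
#include <string>
#include <vector>

// Returns the elements of `s` in output order: `special` first if present,
// then the remaining elements sorted by strcmp.
std::vector<std::string> ordered_for_output(const std::set<std::string>& s,
                                            const std::string& special) {
    std::vector<std::string> out;
    if (s.count(special) > 0) out.push_back(special);

    std::vector<std::string> rest;
    for (const std::string& e : s) {
        if (e != special) rest.push_back(e);
    }
    std::sort(rest.begin(), rest.end(),
              [](const std::string& a, const std::string& b) {
                  return std::strcmp(a.c_str(), b.c_str()) < 0;
              });

    out.insert(out.end(), rest.begin(), rest.end());
    return out;
}
```

Note that `strcmp` compares character codes, so uppercase names sort before lowercase ones: `ID` comes before `colon`.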
Important point, this is how we've been doing this entire class, right? Just like a compiler, if your output doesn't exactly match the expected output, then you fail that test case. Doesn't matter that there are spaces here or not spaces here, or there's an extra comma or not an extra comma. It needs to exactly and precisely match the expected output. And then for, so you can look in the .zip file. Like I'm here. So we've given you test cases from every category in there, and these are test cases that are on the submission server. So when you look at these, it basically will say test01.txt. So we can see, if this was our input grammar and we ran it with the zero option, we would expect this output. For first sets, these are the first sets of that grammar, and these are the follow sets of that grammar. And I highly, highly, highly recommend that you use this test1_p3.sh that we give you. So this is a shell script that will run all these test cases against your compiled program and it'll tell you exactly how many pass. The reason why this is important is that this is the exact same way we're running and comparing your test cases on the server. So if you run this on CentOS 6.7 and it says you're passing eight test cases, then unless you're doing something wrong, you should pass a minimum of eight test cases on the server. And that's just in the zip file? Yes. On the submission site. Yeah. So last project, I spent hours trying to figure out why I couldn't pass some of the secret test cases. And one of the things is that I decided to have the program exit on an error, but it needed to keep going on an error. Is there a way to not waste too much time on things that I didn't think were worth spending so much time on? Like that? Like, test cases that just don't seem to be passing?
Are there, are there kind of hints that would maybe point us in the direction? Yeah. I mean, so part of that is it goes back to really understanding what exactly is this asking you to do and what is it not asking you to do. Like so project two, I'd say, well it doesn't say to stop. It says to keep reading until you get end of file token. Right, so. I guess for like cases where, tuition right now, but I guess like, if you're reading a number that should be an ID or just, I guess, error proofing your code. Yes, so. Okay, a couple of tips. I think generally, right, so at a high level you can actually use this same testing infrastructure to write your own test cases. So that's something I think would be very helpful is you can add more tests into tests if you just create test, I don't know, 07.txt, right? And have your, you're expected, you write by hand, expected zero, expected one, expected two. And then you run it, or you run it and you test your program, right? Then you can make sure that how you think the program should output is actually what it outputs, right? So that can help with things like it's part of learning how you test your code, right? So, you know, what are all the error conditions? What, like okay, I'm checking if there's good input, what about if there's bad input? I mean, in this case, we're telling you, okay, don't worry too much about bad input, but yeah, what if there's a token that is a number, right? Well, in this language, there is no token num, so it shouldn't change anything, but maybe if you're using the Lexar and you never remove that part, it's gonna mess up. Or what happens if there's no input or what happens if there's long rules, right? Did you, is there anything hard coded? Did you hard code any lengths or numbers of things, right? That could be another thing that happens. 
So that's, generally, you know, I think always a good idea: make your own test cases, and you can take the grammars we've been using in class or something and make test cases out of those. Especially for first and follow, if you make a grammar with kind of like a loop, maybe, or what looks like a loop, you know, to make sure your program doesn't crash on that. Those are all good techniques. What was the other part of your question? You were looking for, oh, oh. Yeah, I guess in that situation where I even made my own test cases and I was still not finding where I'm, Well, that's how you learn, right? Right, no. Which doesn't help you in that scenario. Builds character. I feel like a dad when I say that. Okay, so one thing, a very good technique that I actually really like using, is to, as they say, code defensively. So, for instance, using assert statements to assert things that you know should be true in your program, right? So like, if you're writing a function to calculate first and follow sets, right? And you are using a string for the left-hand side of a rule or something, you assert that, hey, this left-hand side should never be null, right? One of the worst parts about an error is if it never manifests itself, right? It just continues to work incorrectly. An assert statement basically evaluates whatever's in this expression and will kill the program and output an error if it is ever false. So you can write statements in your code that say, hey, I expect this pointer to never be null. And I expect the length of this data structure to always be greater than one. And if that's ever not the case, it'll stop and it'll tell you, right? Rather than just failing silently. So this is kind of about putting your assumptions directly in the code, like here you can, yeah. So this is asserting that this is never gonna be null.
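As a small illustration of that idea, here is a sketch with two assumptions written down as asserts. The function `first_symbol` is hypothetical; the point is only that the program stops with a message naming the failed condition, instead of reading through a null pointer or an empty list and carrying on with garbage.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: returns the first symbol of a rule's right-hand side.
// The asserts document what the caller is expected to guarantee.
const std::string& first_symbol(const std::vector<std::string>* rhs) {
    assert(rhs != nullptr);  // the caller must pass a real right-hand side
    assert(!rhs->empty());   // epsilon rules must be handled before this call
    return rhs->front();
}
```

One thing to keep in mind: `assert` is compiled out when the `NDEBUG` macro is defined, so the expression inside it should not have side effects the program depends on.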
So this is kind of a technique called essentially defensive programming, right? So you put in all your assumptions and then you tell the program, hey, if these assumptions are ever not true, great. Because I don't know what should happen then. And that's kind of a high level. And I think the other way to think about it is what's always helpful to me, I don't know what your guys' approach is, but for me, not necessarily writing out the code but writing out the steps of the algorithm and maybe the data structures. It's a lot easier to change something. It's on like pencil and paper rather than in your editor and it has to compile and all that stuff. So walking through your code step by step and being like, okay, what's actually happening here? Why is it going wrong? Let's kind of talk at a high level about what kind of things do we wanna read in in this project? What's our goal? Like always token, do you wanna read a minute? Kind of grouped in certain things with the right hand side. Right, so at a high level, what's the high level? What are we trying to read in? Sequence of tokens. Higher, higher level. Sequence of character. No, higher, that's lower. Higher, group of things we have. Context-free grammar, right? Yeah, at the high level, right? More abstractly. You're going too deep. Good. Yeah, so at a high level, we want a context-free grammar. This is kind of the way I approach these problems is thinking about it in terms of data structures. Right, so it's like, if I'm reading in a context-free grammar, well, I probably want some kind of data structure to contain a context-free grammar, right? I want some code representation of what is a, what do I think a context-free grammar is? But that's kind of hard, there's a lot of parts here. Right, I mean, so, like what kind of things are in a context-free grammar? And let's, I'll get rid of this because I don't want to talk about any code stuff. What are parts of a context-free grammar? The non-terminals. 
The non-terminals? On the left. So that's a specific rule, right? High level, what is a context-free grammar? Yeah, the non-terminals, right? What else? Terminals. What else? Epsilon? I don't know if we can kind of put that in the terminals, but yeah, it's good to think about all these things. What else, what else makes a context-free grammar? More or less. What was that? More. Or in what sense? I guess in different paths. So we can have, so that kind of shows us that there can be multiple rules, right? We can have multiple rules of a left-hand side rule. Yeah, what are you speaking of? The arrows, the thing. Arrows, a higher level, like abstractly, right? We know that there are these symbols, right? We know that there are these symbols, but at a high level, they don't mean anything besides telling the program, hey, this is the arrow separates the left-hand side from the right-hand side, right? And the or means that there are separate rules, right? But I'm kind of like, what really constitutes? So are all of these non-terminals exactly the same, according to our context-free grammar? No, no. Is one more special than the others? The starting one. Yeah, so we need to know which one's the starting, right? Why do we need to know that? It is where you start, but why do we want to do? Why does it matter? The first one is wrong. Yes, that's how we're going to know it's the starting non-terminal, but why is that important, right? So we want to think in two different ways. We want to think about what is a context-free grammar and what am I using this data structure for, right? Because if I never am going to need the starting non-terminal, that doesn't make sense to reason, yeah. What do you think, though? Do you think that we went over the thing that we learned based on the starting non-terminal, you kind of, you know, the interest with all of that work? 
Yeah, so for first, we don't need the starting non-terminal, but absolutely for the follow-sets, we need the starting non-terminal, right? So we need to know where to place that end of file in there, right? So we definitely need that. So what else? What are the main things? So this tells you, okay, my non-terminals, my terminals, I've got this one, I know my starting non-terminal. What am I missing here from context-free grammars? So if I gave you a data structure containing all this information, could you calculate first and follow-sets? End of file, you can't do this, huh? You need rules? Yeah, we need rules, right? Right, without rules, we don't. Our context-free grammars are just set with non-terminals and terminals, right? It's just nothing about the rules. So what's a rule? Right, so, but what are the important bits? Right, so we have like a right-hand side, right? Which we'll put here as kind of like a list of terminals, there's definitely an R here, just kidding. Yeah, somewhere, too. This is why we were programming with my idea. Seriously, I got voted worst handwriting of my fifth grade class, so look at me now. So we have the right-hand side, and then what do we also have? Left-hand side. Left-hand side, so what's the type of the left-hand side? Just a non-terminal. Just a non-terminal, right? So remember what I was saying about asserts, right? When we create a rule, we would probably maybe want to put in an assert that says, hey, the left-hand side has to always be a non-terminal, right? If it's ever the case that I'm reading in a rule, or I've created this rule, and the left-hand side of the terminal, something's gone horribly wrong, so I should kill the program, because nothing I do after that makes sense. Yes, I think it's something that's pretty cheap and easy to do, and helps you think about what assumptions you're making in your code. I think that helps a lot. Okay, so non-terminals, terminals, and epsilon. So, okay, so we've kind of done this. 
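The assert idea described above, where a rule's left-hand side must always be a non-terminal, can be sketched in C++ roughly like this. This is only one possible shape, not the required design. The names `SymbolKind`, `Symbol`, `Rule`, and `make_rule` are made up for illustration, and treating an empty right-hand side as epsilon is an assumption of the sketch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical names; the project description does not prescribe them.
enum class SymbolKind { NonTerminal, Terminal };

struct Symbol {
    std::string name;
    SymbolKind kind;
};

struct Rule {
    Symbol lhs;               // exactly one symbol
    std::vector<Symbol> rhs;  // ordered list; empty means epsilon in this sketch
};

// Build a rule and state the assumption out loud: if the left-hand side is
// ever a terminal, nothing after this point makes sense, so stop right here.
Rule make_rule(const Symbol& lhs, const std::vector<Symbol>& rhs) {
    assert(lhs.kind == SymbolKind::NonTerminal &&
           "left-hand side of a rule must be a non-terminal");
    return Rule{lhs, rhs};
}
```

With this in place, something like `make_rule(A, {a, A})` succeeds when `A` is a non-terminal, and the program aborts at the assert if a terminal ever ends up on the left. Note that `assert` is compiled out when `NDEBUG` is defined.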
How we want to represent these things are kind of, so think about it in terms of data structures. These non-terminals and these terminals, does order matter here? Yes? Why? It wouldn't have the right-hand side. Sorry, in the context of the grammar, the non-terminals and terminals, does order matter? Yes. Because if we try to find, at least it's my understanding, because if we have the way that we define non-terminals, is there things on the right-hand side that are not on the list of terminals? Move it around, but yes. But if I'm terminals as things are not on the list of non-terminals that are on the right-hand side. So we need to have that, we need to have the list of non-terminals before the list of terminals, because otherwise it becomes ambiguous, maybe not. Yes, well, okay, yeah. Since we're keeping track of what's starting on the list, the order of the rest of them should be trivial. Or it doesn't matter, right? Yeah, so you can think about, so what kind of different data structures or data types, thinking about it a little abstractly? Right, yeah, right, so a set. We only set a little bit of place, kind of like a set. Set, yeah, right? So these could be sets, because the order of them doesn't matter, right? But now when we go down here to the right-hand side, can these be sets? Why? Because they have a particular order of them. Yes, right. An a followed by an e is not same as an e followed by an a. Right, so if we have a goes to little a big a, right, then we have a goes to little b big, a little a, right? Those are too complete, like the order of the right-hand side here really matters, right? It's actually critical to our understanding and interpreting this rule of what to do. So yeah, so something like a list makes sense here. So now let's think about, how do we wanna represent terminals and non-terminals? What can we do, right? So abstractly, I think we have terminals and non-terminals, where they're all just symbols, right? 
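The set-versus-list distinction above can be checked directly with the standard containers. This is a small sketch, assuming symbols are held as strings for the moment. The function names `same_symbol_set` and `same_rhs` are invented for the example.

```cpp
#include <cassert>  // for the checks shown in the note below
#include <set>
#include <string>
#include <vector>

// The collection of non-terminals (or terminals) is a set:
// {S, A} and {A, S} describe the same thing.
bool same_symbol_set(const std::set<std::string>& x,
                     const std::set<std::string>& y) {
    return x == y;
}

// A right-hand side is an ordered list: "a A" is not "A a".
bool same_rhs(const std::vector<std::string>& x,
              const std::vector<std::string>& y) {
    return x == y;
}
```

Here `same_symbol_set({"S", "A"}, {"A", "S"})` is true, while `same_rhs({"a", "A"}, {"A", "a"})` is false, which is the reason a list (for example `std::vector`) fits the right-hand side and a set fits the symbol collections.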
They're terminals, non-terminals? Yeah. Could you just have a linked list with the starting point of every linked list being the non-terminals? I don't know, what? Linked list to what? I mean, you have your non-terminals starting every linked list. So if you have like s, a, b, b, on the left-hand side, those are all linked lists in an array of linked lists. So those are like all separate rules, like s goes to a. Oh, sorry, this is my link. My link. Oh, okay. We'll go. So that's what I meant to do, that's right here. Oh, sure. Right, some kind of like linked structure here. But even more basic, right? How do I know? So I get in an ID. How do I tell that ID matches like my non-terminals and terminals? Or? Could you check if it's in the set of non-terminals? Right, so how do I do that? Just string it. Yeah, say that again? String, or key number, something like that. Yeah, right? So one way to actually represent it is a string, right? Right, so we represent it as like a, or we represent it as s, id, in that other case, right? Can I use an enum? Yeah? How many elements do I need in my enum? Four. Four? How many elements are in your enum? Oh, I see what you're saying. Okay, got it. Four. So how am I gonna know how many, right? An enum, you have to specify at compile time, right? You have to specify all the things, right? So if I have an enum that's a decl, ID list one, ID list, right? I don't know what the input is. It has to work for any sized input. So I can't dynamically create an enum. What is an enum under the hood? Could we represent them as integers? Does that make comparison a little bit easier? A little bit more. So this is actually how I suggest trying to think about it: give each symbol a number, right? And you can have, in your tokens or symbols or however you wanna do this here, right? You can have, maybe that's the index in this array. The array is the bad thing, but. I don't like you guys.
It's not a bad thing, but you don't know how many elements are gonna be in this array, so it should be a vector, so maybe I'll do it like this. A vector or a list or a linked list, right? So let's say I have a this and I have a b, and so this is at index zero, one, two, three. So if I represent my symbols as numbers, well I can always look them up to get the, to get the string representation, right? But as far as to our computation, we don't care if they're strings or integers as long as we can tell if they're equal or not, right? And doing a lot of string comparison operators gets very clunky very quickly and is an excellent source of errors and problems in your code. But using integers is a lot easier. You can compare integers, right? You can see if they're useful or good, or not useful or good, but because you'll have to, right? So this, and so actually looking at this, right? What's there between terminals and non-terminals? They're just like one bit of information, right? I mean it's either a terminal or a non-terminal, but there's nothing really different about them. There are all symbols, right? So you could actually put this in a list of symbols, right? And have everything in that list be, kind of do it as a tuple, but you could define a struct. That is, hey, here's the string representation and here's, and this is a non-terminal, right? This you could use an enum for. So you have, so what kind of symbols do we have in grammars generally? Non-terminals? Terminals? None? It's not actually part of the grammar, right? Epsilon. Yeah, we have epsilon. And then maybe we can add end-of-file in here too, right? So these could be different types of symbols. So this is what you guys start thinking about. You gotta start planning. Okay, how am I gonna do this? 
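The "give each symbol a number" idea can be sketched as a small symbol table: a vector whose index is the symbol's number, where each entry records the string and the kind of symbol. This is a sketch under assumptions. The names `SymbolTable`, `SymbolInfo`, and `intern` are hypothetical, and the `unordered_map` is just one convenient way to find an existing index.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// The kinds of symbols discussed above; this part CAN be an enum,
// because the list of kinds is known at compile time.
enum class SymbolKind { NonTerminal, Terminal, Epsilon, EndOfFile };

struct SymbolInfo {
    std::string name;  // string representation, only needed for output
    SymbolKind kind;
};

class SymbolTable {
public:
    // Returns the existing number for `name`, or assigns the next free one.
    int intern(const std::string& name, SymbolKind kind) {
        auto it = index_.find(name);
        if (it != index_.end()) return it->second;
        int id = static_cast<int>(symbols_.size());
        symbols_.push_back({name, kind});
        index_.emplace(name, id);
        return id;
    }

    // Look a number back up to get its string and kind.
    const SymbolInfo& at(int id) const {
        assert(id >= 0 && id < static_cast<int>(symbols_.size()));
        return symbols_[id];
    }

    int size() const { return static_cast<int>(symbols_.size()); }

private:
    std::vector<SymbolInfo> symbols_;             // grows with the input
    std::unordered_map<std::string, int> index_;  // name -> number
};
```

After reading the input, all of the computation can compare plain integers, and `at(id).name` is only needed when printing results.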
How am I gonna represent this? If you represent these data structures in a way that's very clear, then performing a computation on them gets easier. You could write a function called something like, I don't know, calculate first sets, right? That takes in a context, oh, that's a weird one. A context-free grammar, C, right? And returns, yeah, like, well, it's gonna return all first sets, right? First set, something, yeah, a list of sets. So it's kind of like an abstract thing, right? So you can write a function that does this. So for any context-free grammar, you have the data structure. You know how to iterate over that data structure. You know how to get all the rules from this context-free grammar. You know how to look up every symbol in the grammar, along with the left-hand side. You know what each symbol is, whether it's a terminal, a non-terminal, or epsilon, so then you can apply the rules correctly. Yeah? So we have to create these lists, the structures, the set structures ourselves? No, no, no. So you can use... Standard library. Yes, so all the restrictions from project two are lifted. So you can use whatever data structures you want. But you should, you know, when you use something like a vector class or something, right, you should know how it does equality. Or a set class, right? How does it test equality? Especially if you're using strings as a representation, right, it actually defaults to double equals if you're using a char star, which compares the pointers and not the characters. So you are in charge of making sure that it actually works correctly as a set. So I guess we'll stop here. So yeah, I think this is probably all we'll really go over in depth like this on project three. But we can definitely talk about questions and that kind of stuff during class. Totally open to that as we go forward. You should really start planning out how you're gonna do this. Start reading through the description multiple times. What should we be looking at to turn around time to convince her? I don't know. Okay.
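The warning about `char*` in a set can be made concrete. In this sketch (function names invented for the example), a `std::set<const char*>` orders and deduplicates by pointer value, so two buffers holding the same text count as two elements, while a `std::set<std::string>` compares the characters.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Insert both C strings into a set of raw pointers and report its size.
// The set compares addresses, not text.
std::size_t count_as_pointers(const char* a, const char* b) {
    assert(a != nullptr && b != nullptr);
    std::set<const char*> s;
    s.insert(a);
    s.insert(b);
    return s.size();
}

// Same thing with std::string elements: now the characters are compared.
std::size_t count_as_strings(const char* a, const char* b) {
    assert(a != nullptr && b != nullptr);
    std::set<std::string> s;
    s.insert(a);
    s.insert(b);
    return s.size();
}
```

Given `char x[] = "id_list";` and `char y[] = "id_list";`, the pointer version reports two elements and the string version reports one. Using `std::string` (or integer symbol numbers) avoids the problem entirely.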
Maybe next week sometime we'll open more. When are we gonna get our grades for project one? Very good. Very good. Question for you. So I actually don't know how to read that. So it's just on the website? Shoot, I don't think it's, I'm not signed in here. I know where to get it from. I just don't know how to run it. Oh, you have to. I guess you might have had an exam, but I just didn't know. I thought it would be. We'll have to find it. But yeah, the idea is you have to put it on your CentOS. CentOS is where you should be testing it, either CentOS or, now I'm running it on the Mac. The idea is you need to first make it executable, which is chmod +x; the +x means make it executable. Didn't do it correctly, yeah. And then you can just run it like that: dot slash, the name of the program, and just run it. So what it's trying to do is trying to run a.out. Okay, a.out. But there is no a.out in here. So you just put all your code and everything in here, run it, and it gives you a whole report of, hey, this is what I thought should happen. This is what didn't happen. Why is it? So it's chmod +x, and then the name of that thing makes it executable, and you just run it. I believe, wait here, I believe it's in the programming section on the class home, I believe in the project guidelines. Yes, in here there is, I think the test script is in here somewhere. Well, that's enough to go on, it's all right. Yeah, but yeah, and this is the kind of stuff like you can totally talk as much as you want on the mailing list about how to do this and how to set it up and all that stuff. Okay, thank you, cool. All right, so I wanted to get a sort of overview of how my logic was working and how I was thinking about doing some of the project. So this is just one way, it doesn't necessarily have to be this, but say I had like an array of linked lists, and each linked list described one of these right here.
So then therefore, if I had all of these in a linked list and say like you add like four rules, or come in for the first of like the four rules that you're using to apply on these grammar descriptions. Is that right? Is that how you describe these grammar descriptions? Rules of the context-free grammar. Rules of the context-free grammar. Okay, so therefore you could like look at this and then this is like S equivalent here. This is the starting one. You go over here. It's the starting one because it's here, not because it's necessarily there. Yes, yes. Okay, so then I'll look at ID list, follow the rules. So I had like the rules programmed in so the rules would apply on each linked list. So that makes sense? Yes, so you basically, so like looking at it at this level, right? So when you apply first and follow sets, you're gonna go through and create empty first sets for all the non-terminals in your grammar, right? Then you loop over the rules and say, okay, for all of the rules in my context-free grammar, I can apply my first set rules to them to calculate and to update these first sets. So I'll have these sets for each of these. For each of the non-terminals. Sets for each of the non-terminals, and I'll apply each of the four rules on each of these rules of the grammar. Yes. I would advise you against, it's tempting to think of a rule as just like a list of, I don't know, symbols or something, right? Okay. It's really easy to do that, but I think it's a good programming practice to prefer explicit over implicit, right? So here you're implicitly saying that, let's call this rule R. You're implicitly saying that R zero is always the left-hand side. Yes, all right. And R one through N, right, is the right-hand side. Yeah, essentially. I totally think you're gonna be better off if you make it explicit and make a struct that has a left-hand side and a right-hand side that is a list, right?
Just if you do that little bit of separation, then when you're reading your code, you don't have to look at it and be like, okay, I have this rule and I take rule zero and I get the first set of, look up first set of rule zero and then I'm gonna calculate like the first one of R one because like it gets really confusing to think about all those things. So how are you saying you correlate the link, the list to what, like the struct you said? Like how are you correlating these two like together? So I would think of a rule as like a left-hand, sorry, a little cursive, left-hand side, like let's call it a symbol. We use like an abstract type, right? And then a list, so a struct rule, where it has a symbol like a character followed by a list in it. Not necessarily a character, but yeah, because you have to define what these symbol is. But yeah, so I would do, honestly I would do them integers in a rule and then each of these integers would index to another data structure that tells you exactly what the representation and what the type is and everything like that. Yeah, so if you do something like this, right? Now your rule is, so in your CFG, you have essentially a list of rules, like the rule structures and these are your rules, right? So this, you can iterate over all of them and you know all the left-hand side ones. It just makes it so much more clear to break it up like this and to, it's kind of like commenting your code but commenting your data structures. Because by doing it like this, right, you're commenting that I will always have something called the left-hand side and I will have a list of things on the right-hand side. Okay, we're saying rule here, it's just in reference to the context-free grammar rules, not the rules that we're applying. Not first set of process, yes, yes, this is just data structure, context-free grammar rules, yeah, that's the point. 
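The loop described above (start every non-terminal with an empty first set, then keep applying the first-set rules to every grammar rule until nothing changes) can be sketched with the explicit left-hand-side/right-hand-side struct. This is a sketch under stated assumptions, not the course's reference solution. Symbols are integers, ids `0..num_non_terminals-1` are non-terminals, every larger id is a terminal, `EPSILON` is a reserved id, and an empty right-hand side stands for a rule that derives epsilon. The function name `compute_first_sets` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

const int EPSILON = -1;  // reserved id for epsilon in this sketch

struct Rule {
    int lhs;               // always a non-terminal id
    std::vector<int> rhs;  // ordered symbol ids; empty means lhs -> epsilon
};

std::vector<std::set<int>> compute_first_sets(const std::vector<Rule>& rules,
                                              int num_non_terminals) {
    std::vector<std::set<int>> first(num_non_terminals);  // all start empty
    bool changed = true;
    while (changed) {  // repeat until no set grows
        changed = false;
        for (const Rule& r : rules) {
            assert(r.lhs >= 0 && r.lhs < num_non_terminals);
            std::set<int>& target = first[r.lhs];
            std::size_t before = target.size();
            bool all_nullable = true;  // every symbol so far can derive epsilon
            for (int sym : r.rhs) {
                if (sym >= num_non_terminals) {  // terminal: FIRST is itself
                    target.insert(sym);
                    all_nullable = false;
                    break;
                }
                // Non-terminal: add FIRST(sym) minus epsilon.
                // Copy first, because sym may be the same as r.lhs.
                const std::set<int> source = first[sym];
                for (int t : source)
                    if (t != EPSILON) target.insert(t);
                if (source.count(EPSILON) == 0) {  // cannot skip past sym
                    all_nullable = false;
                    break;
                }
            }
            if (all_nullable) target.insert(EPSILON);
            if (target.size() != before) changed = true;
        }
    }
    return first;
}
```

For a grammar such as S → A b, A → a A, A → ε, encoded with S = 0, A = 1, b = 2, a = 3, the rules become `{0, {1, 2}}`, `{1, {3, 1}}`, `{1, {}}`. Because the struct names `lhs` and `rhs` explicitly, the loop body never has to remember that "element zero is special".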
So once you have this data structure in place, then you could apply the rules of the first set to each of the list values. Exactly. But my question is then, so if you're applying these rules of the first sets to your list of rules for the context-free grammar, you can do all that how I see without ever using this. How does this come into play? You have to read in the grammar. So yeah, you can read in the grammar from the input but it's just, you can just do like a while loop that it was like, until you see a double hash and based on the double hash and the arrow, it just puts all that in. It just seems like really complicated to look at that when it's really easy. We're learning about how to do grammars, right? How to parse and read input. Because it's really easiest to look at this but like, oh, I can just parse that. Which is fine. So this is just saying how to do, this is an example of how to do this, essentially. It's guaranteeing you that the input to your program is only going to look like this and never look like anything different. Okay, but as I say, it seems really easy to parse that. And you can do that by hand, it's very simple. Yeah, I was confused like what this was applying. This is just telling you how it's done, though. And it'll, it's the nice thing is that, yeah, I mean it's the grammars that we've been talking about, so we're saying, okay, the input to your program is from a grammar and here's the grammar if you're the tokens in that grammar. When, yeah, we could just easily give you the input and say it's going to look like this. But like, so that you can start making that link between the two and saying like, oh yeah, there is like a formal grammar for this that you can calculate on and do stuff with and write a part, write a parser before. You can also easily do it by hand. So this is saying, oh, I understand it right, that you're going to be giving like a starting symbol S or starting non-terminal S. Is that what that's? Mm-hmm. 
Think about it at a high level. So all of your input's going to be derived from S. Okay. That's what this is. So saying all the inputs, all the non-terminals are going to be derived from, how do you say that exactly? So it's all of your input is going to be strings derived from S. It has nothing to do with terminals, non-terminals, or anything at this point. This just says what the input is going to look like in terms of tokens. Okay. So this is saying, okay, you're going to have always a non-terminal list followed by a rule list followed by a double hash. That's the outline for our inputs, which is given there. Exactly. And then this says, okay, so what is a non-terminal list? Well, a non-terminal list is an ID list followed by a hash. ID list. What's an ID list? An ID list is an ID followed by an ID list or an ID, which means it's at least one ID and there could be any number of IDs. Okay, gotcha. So that's like, it's contacting it. It's like a dot there, essentially. It's an ID list followed by a rule list. No dots, because it's a context-free grammar. So how is it, what's this saying right here? ID followed by ID list, so it's like another ID followed by a recursive call to it. Okay. Okay. And in this case, IDs represent as a letter, a letter, or a digit, but that doesn't really get, how's that, I get that it's the definition of what's going on here, but how's that being applied? Is it just, I don't really get, what's this information telling us? How to read in the input. How to read in the input. Yeah. Okay. How do you know if an ID is correctly an ID? How, like, so this is telling you what it looks, like what it looks like, and this tells you precisely what is the input made of. Okay. I was trying to see how that applied to like a formal definition, an example like that is essentially like, how am I applying? That's how you can tell and look at this and say, okay, I know that that's an ID, that's an arrow, that's a hash, that's a double hash. 
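The way the input grammar drives the reading code can be sketched for the part discussed above: an ID list is an ID followed by an ID list, or just an ID, and a non-terminal list is an ID list followed by a hash. This sketch assumes the input has already been split into string tokens, that an ID is a letter followed by letters or digits, and that the hash token is spelled `#`. The function names are hypothetical, and the exact token definitions should be checked against the project description.

```cpp
#include <cassert>  // for the checks shown in the note below
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// ID: a letter followed by any number of letters or digits (assumed here).
bool is_id(const std::string& tok) {
    if (tok.empty() || !std::isalpha(static_cast<unsigned char>(tok[0])))
        return false;
    for (char c : tok)
        if (!std::isalnum(static_cast<unsigned char>(c))) return false;
    return true;
}

// id_list -> ID id_list | ID
// Consumes IDs starting at pos; fails if there is not at least one ID.
bool parse_id_list(const std::vector<std::string>& toks, std::size_t& pos,
                   std::vector<std::string>& out) {
    if (pos >= toks.size() || !is_id(toks[pos])) return false;
    out.push_back(toks[pos++]);
    if (pos < toks.size() && is_id(toks[pos]))  // ID id_list (recursive case)
        return parse_id_list(toks, pos, out);
    return true;                                // just ID
}

// non_terminal_list -> id_list HASH
bool parse_non_terminal_list(const std::vector<std::string>& toks,
                             std::size_t& pos,
                             std::vector<std::string>& out) {
    if (!parse_id_list(toks, pos, out)) return false;
    if (pos >= toks.size() || toks[pos] != "#") return false;
    ++pos;  // consume the hash
    return true;
}
```

On the tokens `S A1 B #`, the parser collects three IDs and stops just past the hash, while a token such as `1A` is rejected by `is_id` because it starts with a digit. Each function mirrors one rule of the input grammar, which is the link between "the grammar on the page" and "the code that reads the input".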
You know that because they're defined here. Okay. Because we get to find those symbols differently, this grammar would still be the same, but the actual strings would look different. Yeah. I mean, it made sense for these three. I was just confused at how that ID was really being used. I get like an ID, therefore it's a terminal. But like you can see none of them starts with a number, right? Or a digit. Yeah. Right? So that's how you know they're actually an ID and they conform to this frame. Okay. All right. I think you can answer all my questions really well. Sorry for piling them on here. I was gonna ask you a question about projects too, but I'm not sure if I could think of a right way to ask it, but I'm just gonna tell you, my request was denied temporarily. I think they think I'm not, like I haven't showed up to class for the past month. So I'm gonna have to like, can I come back to your office sometimes today and just have you like write an explanation? What time's good for you? Any time after two. Okay. Yeah. I think if you can just email me and then I can write like an email that you should be able to forward to advising. Yes. That should be fine. Did this just want to make like write up like my, because of why I had to add to class so late and then like a plan to catch up, but I don't think that you're not like 100%. Yeah, exactly. You're not, you're not all behind. So that should be fine. Okay. Cool. Thank you. Question? Yeah, I get confused on this part. Okay. How does that work? About what? You have to be more specific. I don't remember. I'll look at it again. When you were explaining it, I didn't understand how the ID token worked. It's fine. It's fine. It's fine. It's cool. We can go over it. I just, I need just a little bit more to go on. Yeah. Otherwise I'll talk forever. Let me look at this. Grammar used to read the grammar. And when you mentioned that, I got confused how you're using the grammar to read the grammar. 
So we're defining this grammar description, right? I guess this one. This is a context-free grammar that defines the input to our program. Does a context-free grammar, just like we've been studying, just like we've been looking at? Yeah, I guess this word context-free is confusing me. I don't know why. I don't understand it. I hear it, but it's not registering anything for me. Okay. Maybe the context-free... I mean, this is what we've been studying. I understand. I understand that when I'm looking at it, maybe it's just the terminology. It's just the name. That's all it is. It's the name. So there are other types of grammars. We are specifically studying and looking at context-free grammars. Context-free means no meaning? No, no, no. It means the specifics of what it means is that these derivations are... These define definitions for the grammar? No, no, just a second. These rules, right? These rules say whenever you have an S or whenever you have an ID list, you can always replace it by either one of these two. It means that... I get that. No, no, I'm explaining. So, right, this is what these rule mean. You know exactly what they mean. The context-free means it doesn't matter where ID list appears, right? This doesn't say, hey, if you see an ID list, if you see... Like if you go... In the order? No, no, it's not the order. So the idea is if I have S goes to A or B, right? I know wherever I see an S, I can always either turn it into an A or I can turn it into a B, right? So there's how I decide which one or it doesn't depend on the context of S is the basic idea. So the idea would be, let's say... But context of S, I don't understand because S is just an S. I'm explaining to you, like a CD or something like that, right? So this would be a different way to write that. We've never looked at any kind of grammar like this, right? But this would say, hey, if you have an A before big A, then you have to choose this rule. 
Otherwise, if you have a B before big A, you have to choose this rule. So this would be a, I don't know if it's like a... Non-context-free. Exactly, yes. Like maybe sensitive. Non-context. A context-sensitive grammar, yeah. So it depends on the context of A, where it's being used in the string and what came before it. Yes. Yeah, which we don't go into, we don't touch it at all in this class because we're only focusing on context-free grammars. And it's just because it makes it more complicated. So that's why we don't, I mean, that's not why we don't do it. Now I get the gist of what that's about. You shouldn't worry that you don't know exactly what that term means, because we haven't really defined it. But now I get an idea, okay. I don't, for some reason, it wasn't clicking. We covered it very briefly. So that's totally fine. So the idea here is this is our input grammar to our program. This describes all the strings that our program is going to accept, right? And it's written as a context-free grammar. So there's obviously a little bit of difference here from what we've normally been doing, right? So normally uppercase letters were non-terminals and lowercase letters were terminals. Here, all uppercase are terminals and everything else is a non-terminal. Okay. But this describes all the strings that this language could possibly generate. So this just means, hey, if you look at it in just a mechanical way, this describes how your program should be reading in input, right? But this, excuse me. I thought that describes what the inputs actually mean. No. This grammar description only describes how they look. How they look, exactly. The semantics describe what it means, right? What we're actually reading in is itself a context-free grammar. So this context-free grammar describes how to describe context-free grammars to our program. Yeah, this is like the map of it, the general.
And then the semantics give the explanations for each one. Yes, they describe, okay, what does it mean for an ID to be in a non-terminal list? And what does it mean for a rule list with an ID in the right hand side? Okay, so that's just more definitions. Yeah, like in stuff you need to know how do I convert this into my context free grammar that I can actually compute on? That's kind of cool, because it's more stuff bundled, but yet simplified. Yeah, it's, this is like we're reading, like what we're reading in is described in the context free grammar. And the thing that we're reading in are context free grammars. So you could actually input this grammar into our program. It would obviously have to write it differently. We could input that grammar into our program to do first and follow sets of this, which would be a cool test. Okay, yeah. Like you said, I have to read this over and over because for some reason I get what we did in class, but I'm not connecting to this that well. Cool, yeah, I mean, you just have to, so basically you're up until now we've been doing first and follow sets by hand. So you're gonna write a program to automate first and follow sets. Yeah, and I thought that that, I understood it by hand, I was like, oh, it shouldn't be too hard to write a program. But now it looks like I'm getting kind of. It's one of those things, it's not intrinsically hard, like fundamentally like really hard, but you have to do everything from scratch, right? So you have to make sure you're reading in the input properly. You have to make sure that you're reading in the context free grammar correctly. You have to make sure you're representing that grammar correctly in a way that you can calculate first and follow sets on it. There's a lot of room for places to mess up here, right? You can mess up at the very tiny, small detail level and you can mess up at a high level like design structure level. So how do we prevent that from happening? 
You have to start early and design it. Well you were giving him some advice about how you should write it that it, I don't know you were just describing it a minute ago that it. Yeah, I mean we talked about how to do, especially following up on things we talked about in class, like how to define the rules at a very low level, right? Like how do you represent a rule in a context free grammar? So we talked about well maybe you could do it with like a left hand side, like we know that every rule has a left hand side which is a grammar symbol and it has a right hand side which is a list of symbols. Okay. I guess the other thing that would be nice to talk about throughout the week would be how to write it or have the, an idea of how the program should be written out so we don't get convoluted and mess things up as you go along. Because you don't wanna get like 50 lines or 100 or 150 lines in and then it's written so poorly that you can't tell what you're doing anymore. Everything's getting mixed up. That's true, that's what you're gonna have to do. That is kind of what I'm a little worried about. Yeah, and I don't know if we'll take, I don't know if we'll take a lot of class time anymore to do this, but that's what- Maybe I'll check in with you for- Yeah, no, no, exactly, that's what I mean. So these are part of what office hours are for. So you can come in, talk to me in the TA about, hey, this the way I'm designing it, do you see any problems, that kind of stuff. And that's why starting early is so important. So that if it happens where you're like, oh no, we've- Oh yeah, if you don't have enough time. Yeah, exactly. Thank you. Okay. Any other homework questions? I'm gonna run my class. I'll be utilizing those topics.