Morning, thanks for coming today, this early Friday morning. So, some housekeeping. Next Wednesday, I'm not going to be here. Oh, I know it's sad. Everybody don't cry at once. What I will do is record a lecture and post it online for you all to watch. You're going to be responsible for that material. That way we won't miss any time, and we'll get caught up on Friday. The TA will use next Wednesday to, A, pass back the midterms, and B, hold a discussion section, so feel free to ask him midterm questions, midterm clarifications, any other kind of things. Project 3 too; it's kind of an open discussion forum for whatever you want to do. I will remind you before Wednesday because I know that's probably necessary. So it should be a good opportunity for you to take advantage of having the TA there. Any other questions? Will the lecture be played in class as well, or does it have to be watched outside of class? No, it's going to be 15 minutes. It's going to be like a normal lecture. So you'll be responsible for that content. I thought you said that on Wednesday the TA was going to be passing back the midterms. Yes. At the same class time, right? Yes. So will the lecture be played in the same class time? No. No, no. The physical class period is to hand back the midterm and to give you a discussion section with the TA so you can cover anything. Anything you have questions about, midterm content, Project 3, especially Project 3, that's a good time to ask him. But so that we don't miss any content, there'll be videos to watch and content to keep up to date with. Cool. Any other questions before we get started? Anyone want to ask how the midterms went? They're halfway graded. They were good.
I was very impressed with what I saw. The DNS one was hard to grade. There are a lot of different ways to write it, I have found. But that was good. I think it's a good learning experience. So yeah, all right. OK, so let's go back. We have been talking about how to write a recursive descent parser, a predictive recursive descent parser, based on the first and follow sets. So we want this parser to be predictive. What does it mean for it to be predictive? It determines how to parse based upon the first and follow sets. Almost. Yes, it uses first and follow sets. But what about the input does it use to determine which rule? Just one character. Just one token or character. Yes, exactly. So the predictive part means that, by looking one token ahead, it can decide which rule to apply by using the first and follow sets. We saw how to prove that, hey, this grammar does support a predictive recursive descent parser. We've been looking at email addresses, and we've seen, OK, this is how we can define an email address. We already have the first and follow sets. We've calculated that. You can go through this to see how to do it. And so we can say, OK, then we can create a function to parse an address, this non-terminal. To parse an address, we know that an address is composed of either a name address or an address specification. So how are we going to tell which one it is, a name address or an address specification? How do we decide? The first and follow sets are how we can prove that we can decide, but how do we actually make that decision? Yeah, the first thing you've got to do is get the token, right? We have to read something from the input, otherwise we have nothing to compare against. And then we check the first set of one of the rules and say, OK, is this token in that first set? So we hard-code this. We say, OK, if it's.
So here we can see the first set of name address is a less-than symbol, an atom, or a quoted string. And so we're checking, hey, if this t_type is a less-than symbol or an atom or a quoted string, then we know it's got to be this rule, address goes to name address. Why do we do this unget token here? Because the input is consumed. Why is that bad? Because if you have to look at that same token for one of the non-terminals that it goes to, then you still need that token. Right, so think about the tree. Once we see this, we know, OK, we have a tree where we start at address, right? We're trying to parse address. We have an address node. And we've decided that address has a child name address because of this rule. I know this rule applies. But name address is not a terminal, right? Name address itself is going to have its own subtree structure and is going to generate terminals. So we had to look one ahead to decide which rule to choose. But by looking one ahead, we moved the input tokens forward, right? And none of address's children are terminals. So we know that terminal has to come from name address or from one of name address's children. We don't know which. All we know is that address does not have it. And then we're going to call parse name address, right? So we have this recursive call that's going to call some other function that knows how to parse a name address. And then I'm going to say, OK, I chose the rule here, address goes to name address. What do I know if this parse name address call returns? Yeah, it's parsed everything. It's parsed it correctly, right? There were no syntax errors. Syntax errors stop the parsing and say, hey, there was an error. So if we've gotten here, we know that, OK, great, we parsed a name address successfully. And now we can do whatever we want.
We can build this tree with this address node and this name address node. We could do some kind of parsing, whatever we want to do. Is it bad practice to pass t_type into parse name address, instead of calling unget token and then having parse name address call get token? Bad practice? I'd say yes, because you're changing the semantics of these parse functions, right? We're looking at these parse underscore non-terminal functions as taking no input, but using the lexer to get tokens. So if you then have to pass some input for some of them and not for others, you're muddling the abstraction, right? They're not all the same. But I mean, it's a way you could do it, yeah. OK, now we need to check the first set of address specification. So what are we going to check? It's the same process as before. We just use the first set and check against that. Yeah, so we use the first set of the entire right-hand side, DAA, QSA, right? And then what do I do? Do I want to keep this token or not? Nope, right? We want to unget this token. And then what function am I going to call? Let's go to somebody else. Yeah, address spec. Yeah, parse address specification, right? We know that's the rule. We know we have to parse it. If that returns, we know everything went successfully, so we've properly parsed it. So what happens if it's neither of these two cases? I mean, there's an error. And this is what the first set tells us, right? The first set of address says, hey, when we are trying to parse an address, if we read a token that's not in the first set of address, then we have a huge problem, right? Or not a huge problem, we have a syntax error. This is not a valid string that came from this grammar, because address can only start with a dotted-atom-at, a quoted-string-at, a less-than symbol, an atom, or a quoted string. Do we have to account for that for any of the test cases? Like syntax errors?
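As a rough sketch of the parse address function just described, here is one way it might look in Python. The `Lexer` class, the `ParseError` exception, the `trace` list, and the token names (`LANGLE`, `ATOM`, `QS`, `DAA`, `QSA`) are assumptions made for illustration, and the two child parsers are stand-ins that only record the token they start on, so the effect of unget token is visible.

```python
class ParseError(Exception):
    """Raised on a syntax error (hypothetical helper)."""

class Lexer:
    """Toy lexer over a list of token types that is already tokenized (an assumption)."""
    def __init__(self, token_types):
        self.tokens = list(token_types)
        self.pos = 0

    def get_token(self):
        tok = self.tokens[self.pos] if self.pos < len(self.tokens) else "EOF"
        self.pos += 1
        return tok

    def unget_token(self):
        self.pos -= 1  # push the last token back onto the input

FIRST_NAME_ADDR = {"LANGLE", "ATOM", "QS"}  # first set of name_addr
FIRST_ADDR_SPEC = {"DAA", "QSA"}            # first set of addr_spec

def parse_name_addr(lexer, trace):
    # Stand-in child parser: records the token it starts on.
    trace.append("name_addr starts at " + lexer.get_token())

def parse_addr_spec(lexer, trace):
    # Stand-in child parser: records the token it starts on.
    trace.append("addr_spec starts at " + lexer.get_token())

def parse_address(lexer, trace):
    t_type = lexer.get_token()          # always look one token ahead first
    if t_type in FIRST_NAME_ADDR:
        lexer.unget_token()             # the token belongs to a child, not to address
        parse_name_addr(lexer, trace)
        trace.append("address -> name_addr")
    elif t_type in FIRST_ADDR_SPEC:
        lexer.unget_token()
        parse_addr_spec(lexer, trace)
        trace.append("address -> addr_spec")
    else:
        raise ParseError("syntax error: unexpected " + t_type)

def run(token_types):
    trace = []
    parse_address(Lexer(token_types), trace)
    return trace
```

Because address has no terminal children of its own, both branches push the lookahead token back before calling the child, which is why the stand-in child sees the same token that address used to pick the rule.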
Or are we going to use it? In this project, no. In project three, you can assume that it's well typed, but in other projects, that's not necessarily going to be the case. So if we're receiving syntax errors, it's because the program that we have written is reading incorrectly? Say that again? If we are getting syntax errors, it means that we have written something incorrectly in our project, rather than the test case being incorrect. Yes, unless you wrote your own test case and your test case is wrong. The test cases that are given. Yes, yes. Okay, so we can do this exact same process with name address. We have the first and follow sets here. This follows the same process. So what are we going to check first? Display name. Yeah, it's the first set of display name angle address, which is atom, quoted string. So we're first going to get a token, and we check that. So why am I writing display name angle address here, not just display name? Yeah, right, I have to check the first set of this entire rule. Because, exactly, if the first set of display name has an epsilon in it, then we also have to take the first set of angle address, right? And we know how to take the first set of a sequence of symbols, because that's what we do all the time. But here there is no epsilon in display name, right? It's just atom, quoted string. So we check if t_type is atom or t_type is quoted string. Are we going to call unget token or not? Yeah, we want to unget that token. We want to call parse display name, and then are we done? Is this rule properly done? I see a hesitant no. Why? We also have to parse the angle address. We have to handle that. Right, so we've only created one child. We have our name address, and we just parsed the display name, which is great, but we know that after display name comes an angle address, right?
And this is because of this rule here, name address goes to display name angle address. And now are we done? Do we know that this rule is good if this thing returns? Yeah, right, if this returns, that means that it was able to parse its own tree, and everything's syntactically valid there. So we can say, okay, I chose this rule. So then what's the first set we're going to have to check next? If it's not an atom or a quoted string? Check if it's a left angle bracket. Yeah, check if it's a less-than symbol, a left angle bracket, right? So if it's in the first set of angle address, we're going to call unget token again, we're going to call parse angle address, and we're going to say, hey, this was name address goes to angle address. So even though I have parse angle address in both of these two if clauses, what's the difference here? The top one has a display name before the angle address, and I can tell which one it is just by looking at that one token, right? I know if it's a less-than symbol, then it can only possibly have come from this rule. Whereas if it's a quoted string or an atom, then I know it had to have come from this rule. It has to be a display name first, followed by an angle address. What do I do if it's neither of these? Syntax error, our old friend. Okay, we're going to go through this next one quickly because it's very much the same. We do the same process. This has only one rule, so we check the first set, and we make sure the token is in that first set, otherwise it's a syntax error.
Ah, okay, so here we have an epsilon. Here's where it becomes interesting, right? Display name list is either a word followed by a display name list, or epsilon. All right, so we want to do this parsing. What's the very first thing we do? You got it? Automatic? Yes, get token, right? We always get the token. All right, then what are we checking the first set of? Word display name list, right?
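The two rules for name address discussed above can be sketched as follows. This is a hedged illustration only: the `Lexer`, `ParseError`, `trace`, and token names are assumptions, and `parse_display_name` and `parse_angle_addr` are simplified stand-ins that just consume the tokens those non-terminals would cover.

```python
class ParseError(Exception):
    """Raised on a syntax error (hypothetical helper)."""

class Lexer:
    """Toy lexer over a list of token types (an assumption for this sketch)."""
    def __init__(self, token_types):
        self.tokens = list(token_types)
        self.pos = 0

    def get_token(self):
        tok = self.tokens[self.pos] if self.pos < len(self.tokens) else "EOF"
        self.pos += 1
        return tok

    def unget_token(self):
        self.pos -= 1

WORD_TOKENS = {"ATOM", "QS"}

def parse_display_name(lexer, trace):
    # Stand-in: consume one or more word tokens, then push back the non-word.
    t_type = lexer.get_token()
    if t_type not in WORD_TOKENS:
        raise ParseError("syntax error in display_name")
    while t_type in WORD_TOKENS:
        t_type = lexer.get_token()
    lexer.unget_token()
    trace.append("display_name")

def parse_angle_addr(lexer, trace):
    # Stand-in: consume everything from '<' through the closing '>'.
    if lexer.get_token() != "LANGLE":
        raise ParseError("syntax error in angle_addr")
    t_type = lexer.get_token()
    while t_type != "RANGLE":
        if t_type == "EOF":
            raise ParseError("missing right angle bracket")
        t_type = lexer.get_token()
    trace.append("angle_addr")

def parse_name_addr(lexer, trace):
    t_type = lexer.get_token()
    if t_type in {"ATOM", "QS"}:        # first set of "display_name angle_addr"
        lexer.unget_token()
        parse_display_name(lexer, trace)
        parse_angle_addr(lexer, trace)  # the rule has two children, parse both
        trace.append("name_addr -> display_name angle_addr")
    elif t_type == "LANGLE":            # first set of "angle_addr"
        lexer.unget_token()
        parse_angle_addr(lexer, trace)
        trace.append("name_addr -> angle_addr")
    else:
        raise ParseError("syntax error: unexpected " + t_type)

def run(token_types):
    trace = []
    parse_name_addr(Lexer(token_types), trace)
    return trace
```

Both branches end by parsing an angle address; the single lookahead token is what tells the parser whether a display name must come first.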
So we want to make sure we always check the entire rule, and then we see, okay, there's no epsilon in word, so it's atom, quoted string. All right, so we check. If it's an atom or it's a quoted string, then we call unget token, right? And what do we have to parse first? Word first. A word, and then display name list, which is the same function, right? It's going to happen again. And then we're going to print out, hey, display name list goes to word display name list. So what do we do if it's not this? Do we just say it's a syntax error? So then what do we check? What rule are we trying to check, right? These comments here, right above this check, are trying to convey what we're attempting to check. The next condition we check is whether this token that we just got is in the follow set. Because of the epsilon. Why the follow set of display name list? Yeah, because that's us, right? We are parsing display name list. If it goes to epsilon, then that token must be one of the tokens in our follow set, right? So it must be either an atom, a quoted string, or a left angle bracket. We can't check for epsilon. When we call get token, we are never going to get an epsilon. Don't you mean just an angle bracket? Because we're looking at just the follow set for display name list. Oh, yes, sorry. I was looking at the follow set of word. Yeah. I didn't see that. Okay, yes, yes. For display name list, it's the left angle bracket. Yeah, so we check, is the t_type, the token that we just got, a left angle bracket? If it is, then what do we do? If we just return, what would happen to somebody who- Unget token, then return. Yeah, right? So we want to unget token, right?
Because we just read that token and we just proved it actually came from after us, not from display name list, right? So we need to make sure we push that token back so that we're not consuming that input. Okay, then we print out, we say, okay, display name list went to epsilon here. Right? So you can see this is a recursive call that's going to keep going. As long as it's reading in words, which are atoms or quoted strings, it's going to keep calling this until it reads a less-than symbol. And then it's going to stop. This parse display name list, the lowest one, is going to return. And then we will actually have parsed that whole list and created that tree structure. And if it's not one of these? Yeah, syntax error, right? We know the token has to be either in one of these first sets or, if there's an epsilon, in our follow set. Now let's look at, I believe, yeah, there we go, that's a good one. So angle address is a left bracket, an address specification, and a right bracket. Here are the first and the follow sets. If we want to parse this, what's the first thing we do? Get token. Get token, they said, all together. All right. So what do we want to check the first set of? Yeah? You just make sure that it's a... What do we want to check the first set of if we don't want to think about it? Yes, true. But our process is that we check each of the rules, and we check the first set of the entire rule, right? So we want to check the first set of left bracket, address specification, right bracket, to know that it's this rule. If there's only one case, yes, that first set is going to be the same thing as the first set of angle address. But this is just to show that it isn't a special case of the other process. Okay, so we check this. What's the first set of this sequence? Left angle bracket? We say, okay, if the t_type is left angle bracket, do we call unget token? No. Do we not call it?
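The display name list function walked through above, including the epsilon case checked against the follow set, might be sketched like this. The `Lexer`, `ParseError`, `trace`, and token names are assumptions for illustration; the follow set containing only the left angle bracket is taken from the correction made in the discussion.

```python
class ParseError(Exception):
    """Raised on a syntax error (hypothetical helper)."""

class Lexer:
    """Toy lexer over a list of token types (an assumption for this sketch)."""
    def __init__(self, token_types):
        self.tokens = list(token_types)
        self.pos = 0

    def get_token(self):
        tok = self.tokens[self.pos] if self.pos < len(self.tokens) else "EOF"
        self.pos += 1
        return tok

    def unget_token(self):
        self.pos -= 1

FIRST_WORD = {"ATOM", "QS"}            # first set of "word display_name_list"
FOLLOW_DISPLAY_NAME_LIST = {"LANGLE"}  # follow set of display_name_list

def parse_word(lexer, trace):
    t_type = lexer.get_token()
    if t_type == "ATOM":
        trace.append("word -> ATOM")   # terminal: consumed, no unget
    elif t_type == "QS":
        trace.append("word -> QS")
    else:
        raise ParseError("syntax error: expected a word")

def parse_display_name_list(lexer, trace):
    t_type = lexer.get_token()
    if t_type in FIRST_WORD:
        lexer.unget_token()            # the token belongs to word
        parse_word(lexer, trace)
        parse_display_name_list(lexer, trace)
        trace.append("display_name_list -> word display_name_list")
    elif t_type in FOLLOW_DISPLAY_NAME_LIST:
        lexer.unget_token()            # the token came from after us, push it back
        trace.append("display_name_list -> epsilon")
    else:
        raise ParseError("syntax error: unexpected " + t_type)
```

A short usage example: with the input `["ATOM", "QS", "LANGLE"]`, the function recurses twice, chooses epsilon on the third call, and leaves `LANGLE` unread for whoever parses the angle address.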
Do we want to call it? It's a terminal. Because it's a terminal? What does that really say? Call it. You should call it. What am I trying to parse here? What am I trying to make a tree out of? The address. So what's the root of this tree? Angle address, right? And so what's the leftmost child going to be? The left angle bracket. Is that a terminal or a non-terminal? Is it going to have any children? It's a terminal. No children, it's a terminal. And then what's the next child going to be? The address spec. The address specification. And the next child? The right angle bracket. The right angle bracket. So then do I want to call unget token? No. No, because address specification doesn't care about that left angle bracket. That left angle bracket came from the angle address. It's one of our children. So when we call parse address specification, we need to make sure that we've consumed that input and said, yes, okay, we've matched up correctly. We know that we generated a left angle bracket, which means that parse address specification should start parsing after that left angle bracket. So now that I call this, am I done? Do I know that everything's all good? No, no. What do I need to do? Once that's done, you need to check that there is a right angle bracket afterwards. How do I check if there's a right angle bracket after it? Get token. Yeah, so why do I want to get another token? We haven't done this before. Because after you've parsed the address specification, your lexer is looking right after that, but it hasn't gone on to the next token. Yeah, so we want to know, right? We're trying to build these children and say, okay, could this input string have possibly come from an angle address? So the first thing is that it has to start with a left bracket, right? And then we have to parse an address specification, which does something else that we don't care about.
And then we have to make sure that we have a right angle bracket, otherwise it doesn't conform to this rule, right? So what if the type there is not a right angle bracket? Syntax error. Syntax error, right? It's a problem. And now do I know that I parsed this correctly? Did I parse this whole rule? Yep. Yeah, so I can say, okay, this was my rule: left bracket, address specification, right bracket. What if the token isn't in any of these? Syntax error, yes. Okay, so we can see a similar thing here. We'll go over it fairly quickly, because there are other things we want to get to. We follow the same process. We're going to get the token. We're going to check if it's in the first set of dotted-atom-at domain. And do we call unget token here? I'm hearing a mumble of yeses and nos. Anyone want to venture a thought? Yes. I'm going with no. You're going with no. Who's going with yes? DAA is a terminal. Oh, it's a terminal. Right, yeah, this is a terminal. All right, so they're all tokens, but this is a terminal, so we need to say that yes, this rule generated a terminal. We don't want to push that token back, right? We want to move the lexing forward and the parsing forward. And then we're going to call parse domain, and then we're good. And then we check the first set of quoted-string-at domain. Are we going to call unget token? No, right? For the same reasons. And then we're going to handle that. Otherwise we'll just say syntax error. Let's see, okay, this is a pretty simple one. We just have domain goes to DA, and dotted atom is what the DA stands for. Otherwise it gets too complicated with the long names here. Okay, get the token, get the t_type. What are we checking? Yeah, if it's in the first set of DA, so what's that checking for? Yeah, t_type equals DA, right? And then do we unget token? Do we call any other recursive parsing functions? Exactly, we're at a leaf now, right?
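The handling of terminals discussed above, where a matched terminal is consumed rather than pushed back, can be sketched for angle address, address specification, and domain. As before, the `Lexer`, `ParseError`, `trace`, and the token names `LANGLE`, `RANGLE`, `DAA`, `QSA`, and `DA` are assumptions for illustration.

```python
class ParseError(Exception):
    """Raised on a syntax error (hypothetical helper)."""

class Lexer:
    """Toy lexer over a list of token types (an assumption for this sketch)."""
    def __init__(self, token_types):
        self.tokens = list(token_types)
        self.pos = 0

    def get_token(self):
        tok = self.tokens[self.pos] if self.pos < len(self.tokens) else "EOF"
        self.pos += 1
        return tok

    def unget_token(self):
        self.pos -= 1

def parse_domain(lexer, trace):
    # domain -> DA: a leaf, so the token is consumed and nothing else is called.
    if lexer.get_token() != "DA":
        raise ParseError("syntax error: expected a dotted atom")
    trace.append("domain -> DA")

def parse_addr_spec(lexer, trace):
    t_type = lexer.get_token()
    if t_type == "DAA":                 # terminal matched: no unget
        parse_domain(lexer, trace)
        trace.append("addr_spec -> DAA domain")
    elif t_type == "QSA":               # terminal matched: no unget
        parse_domain(lexer, trace)
        trace.append("addr_spec -> QSA domain")
    else:
        raise ParseError("syntax error: unexpected " + t_type)

def parse_angle_addr(lexer, trace):
    t_type = lexer.get_token()
    if t_type == "LANGLE":              # our own leftmost child, so keep it consumed
        parse_addr_spec(lexer, trace)
        if lexer.get_token() != "RANGLE":   # the rule must end with '>'
            raise ParseError("syntax error: expected a right angle bracket")
        trace.append("angle_addr -> LANGLE addr_spec RANGLE")
    else:
        raise ParseError("syntax error: unexpected " + t_type)

def run(token_types):
    trace = []
    parse_angle_addr(Lexer(token_types), trace)
    return trace
```

The contrast with the earlier functions is the point of the sketch: unget token appears only when the lookahead token belongs to a non-terminal child, never when the rule itself generates that terminal.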
We're at a leaf and we know we've read this dotted atom, right? So we say, okay, this is that rule, great. We know there's nothing more we can do here. Whenever we get to domain, we're going to stop, and our parsing is going to continue elsewhere. But we've consumed a token from the input string, from our lexer, right? If it's not a dotted atom, it's a syntax error. A word is an atom or a quoted string, so this should follow pretty straightforwardly. We get a token, and we first check if it's an atom; then I say great, word goes to atom. Otherwise, if it's a quoted string, I say great, word goes to quoted string. Otherwise it's a syntax error.
Questions on parsing, on how to write this parser? So because there are no questions, you'd be totally good if I put this on a midterm: here's a grammar, here are the first and follow sets, write a recursive descent parser for this grammar. So let me encourage people to ask questions. There will be questions like that on the midterms and finals; this is important. There'll also be homework questions. Cool, all right, good. So we can actually turn this into rules about how to do this. I didn't roll this out. We can say, okay, there's actually a pretty straightforward algorithm that we've been following while doing this. We've been trying to reason about it and think about why it is correct. But basically, for every non-terminal in our grammar, we're going to create a function called parse underscore the name of that non-terminal. Why don't we create parse functions for terminals? I mean, non-terminals are going to generate terminals. So we don't have to worry about that there.
So when we're writing our parse A function, if we have a rule A goes to alpha, where, remember, we've been using alpha as a sequence of terminals and non-terminals, we check whether get token is in the first set of alpha; if so, we choose the production rule A goes to alpha, right? So that's what all of those if checks we were doing were. We were checking the first set of that entire rule. That's why I tried to make a point about not just checking the first set of the left bracket; you have to check the first set of that entire rule. Then, for every terminal and non-terminal in alpha, we iterate through all of those symbols. If the symbol is a non-terminal, we call its parse function; if it is a terminal, we check that get token is equal to the thing that we're expecting. And then we have this rule, which we have devised, for when epsilon is in the first set of alpha: then we know we have to check the follow set of A. And this last clause just says, hey, if none of these cases has happened, if none of the rules match, then we have a syntax error. Questions, confusions, general questions? I'm wondering why we've checked the type of the token and not the actual value of the token the whole time. What's the difference between the type of the token and the value of the token? So let's take it back to what tokens are, right? When we defined our ID token, did that just match the string ID? Yeah, yes, it could. Does it strictly match the string ID, if and only if? No, right? The whole point of tokenization is that we can abstract all these strings and say, hey, I know there's an infinite number of identifiers you could write, but if they look this way, if they match this regular expression, I will treat all of them as the abstract token ID, right?
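To illustrate the difference between a token's type and its value, here is a small hedged sketch of a tokenizer. The token names (`ID`, `PLUS`, `SEMICOLON`) and the regular expressions are assumptions chosen for illustration, not the course's lexer.

```python
import re

# Hypothetical token specification: (token type, regular expression).
TOKEN_SPEC = [
    ("ID", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("PLUS", r"\+"),
    ("SEMICOLON", r";"),
    ("SKIP", r"\s+"),
]

def tokenize(text):
    """Return a list of (type, value) pairs; whitespace is dropped."""
    pattern = "|".join("(?P<%s>%s)" % (name, regex) for name, regex in TOKEN_SPEC)
    tokens = []
    # Note: characters that match no pattern are silently skipped in this sketch.
    for match in re.finditer(pattern, text):
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
    return tokens

def token_types(text):
    """What the parser looks at: the types only, not the values."""
    return [t_type for t_type, _value in tokenize(text)]
```

For example, `token_types("x + y")` and `token_types("total + count")` both give `["ID", "PLUS", "ID"]`, so a parser that checks only types treats them as the same sentence, while the values stay available for later stages.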
And that way, for syntax purposes, I can make sure that when you're doing addition, you have an ID, a plus operator, and an ID, right? That you never have an ID, plus, semicolon, because that's invalid syntax. So that's why here we're dealing with tokens. The next thing we're going to cover is, okay, we can build this tree using this parser, but what does that tree mean? For that you have to go back to what those IDs are, what the actual values of those IDs are. But at this stage of the game, we only want to talk about abstract tokens.
Now let's talk about some tricky cases that can arise when we want to write a parser. Not my day today. Third time's a charm. The buttons work. Success. Okay, let's think about something like this, right? So can we do the first sets for pretty much everything here? Let's start: what's the first set of A? a, epsilon. And B? b, epsilon, right? First set of D? d, epsilon. First set of E? e, epsilon. So the first set of S? a, b, c, d, e, f. So why is c in there? Because both A and B have epsilons in their first sets, which means we add the next one, which is c, to the first set of S. And then we add d, e, and f, right? And no epsilon, because there is no rule here where every symbol has epsilon in its first set, right? Okay, let's focus on writing parse S. What's the first thing I do? Get the token. Right, we call the lexer, we get a token. So do I check if it's an a, and then I'll know that it's this rule? Yeah, so what am I checking? What do my rules say? If it's in the first set of alpha. Right, this entire thing, right? This is alpha. We have a rule S goes to alpha. So we want to say, is it in the first set of A B c? Right, so what's the first set of A B c? Is there an epsilon in there? No.
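The first set of a whole right-hand side, as computed above for A B c, can be sketched as a small function. The first sets for A and B are the ones stated in the discussion; the `"epsilon"` marker string and the function name are assumptions for illustration.

```python
EPSILON = "epsilon"

# First sets of the non-terminals, as given in the discussion.
FIRST = {
    "A": {"a", EPSILON},
    "B": {"b", EPSILON},
}

def first_of_sequence(symbols, first_sets):
    """First set of a sequence of terminals and non-terminals."""
    result = set()
    for symbol in symbols:
        # A terminal is not in the table; its first set is just itself.
        symbol_first = first_sets.get(symbol, {symbol})
        result |= symbol_first - {EPSILON}
        if EPSILON not in symbol_first:
            return result          # this symbol cannot vanish, so stop here
    result.add(EPSILON)            # every symbol could go to epsilon
    return result
```

With these sets, `first_of_sequence(["A", "B", "c"], FIRST)` is `{"a", "b", "c"}`: A and B can both disappear, so c is included, and the terminal c stops the scan, so epsilon is not.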
Because of the c here, right? So now I say, okay, if t_type is equal to a, or t_type is equal to b, or t_type is equal to c. Then what am I going to do? Parse A, parse B. You don't know if you call unget. Yeah, you call unget first, then parse A, parse B. So the first question is, do I call unget token or not? Yes. Yes. Yes, why? Because the first thing in there is a non-terminal. Right, the first thing in here is a non-terminal. So at this point, I don't know if this a came from this A, or this b came from this B, or this c came from this little c, right? I don't really care, but I know that the first thing is a big A, and I know that that's a non-terminal. So I didn't generate that a or that b or that c at this point, right? So I call parse A. I don't understand that. Which part of that? How do you know it's a big A? It has to be, because of this rule. I'm using this first set to say, okay, t_type must be either a, b, or c. Looking one character ahead, that's how I can choose S goes to A B c over S goes to D E F, right? If it was in the first set of D E F, then I'd have to choose this other rule. So matching on the first set of the rule says, okay, I know it has to be this rule. But I don't know yet that the input actually came from this rule, right? I've only looked one character ahead. This rule could define a string that's 100,000 characters long, or a thousand tokens long. But if I'm drawing my tree, I'm S, and I know what my children should be. So now I need to say, okay, the leftmost child, could it be an A? Right, and that's why I call parse A. So parse A is going to recursively build its own tree. It's going to return. If it returns, then we know we parsed an A successfully.
So then what do we parse? Parse B, which would be trying to build this. And if this happens correctly, it's going to build its own tree. It's going to return. And then what do I want the next child to be? Little c. Little c, so do I call parse little c? Right, so then what am I checking? So now if I get here, I know I've created this part of the tree. I've successfully parsed an A, I've successfully parsed a B, and I've successfully parsed a c. So now I know that this string did come from this grammar, right? S produced this, whatever my input string is. And so part of the point here is that we don't have to special-case this issue of, okay, we have a c here, but there are epsilons in A and B, so I could check if it's a c, and that means it must have gone to here. We don't have to worry about that, because A and B will deal with the case that they went to epsilon, right? All we know is that after A and B return, we'd better still have a c to parse, otherwise something went wrong. Questions on this example? So what if I had changed it like this? How would that change things right here? It would just go A, c. Yeah, so the first thing that would actually change, but that's fine because we didn't pre-calculate it, right, is that the first set here would change. Yeah. Right, the first set of this is only a, c. There's no b, because there's no epsilon in the first set of little c. But how does this code change? Get rid of parse B. Get rid of parse B. Move it, actually. I'm not going to get rid of it, but I will rewrite it somewhere else. Right. Where should I rewrite it? So you keep all that stuff, and then after that you do another get token. And then if- After this if clause here? Here? Yeah. Or no, get rid of that, right above it. Okay, cool. All right. Why did she need to get token first?
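The parse S function for the rule S goes to A B c, as walked through above, might look like the following sketch. The source states only the first sets of A and B, so the rules A goes to a or epsilon and B goes to b or epsilon are the simplest grammar consistent with them and are an assumption here, as are the `Lexer`, `ParseError`, and `trace` helpers. The other rule for S (D E F) is left out to keep the sketch short.

```python
class ParseError(Exception):
    """Raised on a syntax error (hypothetical helper)."""

class Lexer:
    """Toy lexer over a list of token types (an assumption for this sketch)."""
    def __init__(self, token_types):
        self.tokens = list(token_types)
        self.pos = 0

    def get_token(self):
        tok = self.tokens[self.pos] if self.pos < len(self.tokens) else "EOF"
        self.pos += 1
        return tok

    def unget_token(self):
        self.pos -= 1

def parse_A(lexer, trace):
    # Assumed rules: A -> a | epsilon, with follow(A) = {b, c} under S -> A B c.
    t_type = lexer.get_token()
    if t_type == "a":
        trace.append("A -> a")
    elif t_type in {"b", "c"}:
        lexer.unget_token()
        trace.append("A -> epsilon")
    else:
        raise ParseError("syntax error in A")

def parse_B(lexer, trace):
    # Assumed rules: B -> b | epsilon, with follow(B) = {c}.
    t_type = lexer.get_token()
    if t_type == "b":
        trace.append("B -> b")
    elif t_type == "c":
        lexer.unget_token()
        trace.append("B -> epsilon")
    else:
        raise ParseError("syntax error in B")

def parse_S(lexer, trace):
    t_type = lexer.get_token()
    if t_type in {"a", "b", "c"}:       # first set of "A B c"
        lexer.unget_token()             # leftmost child is the non-terminal A
        parse_A(lexer, trace)
        parse_B(lexer, trace)
        if lexer.get_token() != "c":    # after A and B, a c must remain
            raise ParseError("syntax error: expected c")
        trace.append("S -> A B c")
    else:
        raise ParseError("syntax error: unexpected " + t_type)

def run(token_types):
    trace = []
    parse_S(Lexer(token_types), trace)
    return trace
```

Note that parse S never special-cases a leading c: on the input `["c"]`, A and B each see the c, recognize it as coming from their follow sets, and go to epsilon on their own.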
Because you got your token, and that was a c. So you know that you then need to get the next token to see if it's a b. Do I have to do that, though? Do I have any choice here about what this token is going to be? I guess that could be handled by parse B. Yeah, right, parse B is going to make sure that its token is correct and that it's properly parsing a B. So in this case, the tree we're generating is S; we've had S goes to A, and A generates something. Then we check, hey, we'd better make sure that the next thing is a c, right? Which is what this check does. Then we say, okay, whatever's after that is a B. We have absolutely no choice about what this next symbol is. And then if it successfully returns, we know we've successfully parsed that string. What might I also want to check in S, at the end here? Let's say I have everything properly, I've checked it, there's no syntax error in any of those other pieces. What then do I need to check? Does that just mean that it's a good string? Yeah. You want to check for end of file? Want to check for end of file, why? Because if there's extra stuff when you're expecting the input to end, then you also have a problem. Right, so look at this grammar. We'll keep it as it is here. Say I have the string a, c, b, a, b, c. We can have S goes to A, which goes to little a, which matches here. We have the c, which matches here. We have big B, which goes to little b here. But there's still extra stuff, and I've parsed all of S. There's no possible way I could generate any more, right? So this string is not in this grammar. So we want to make sure we check. Okay, we'll do one more check at the end here. So do we have to do this in every function? Just the starting one. Right, just the starting non-terminal. Which is the starting non-terminal? Traditionally, S, I mean, that's the way we've been doing it.
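The modified rule S goes to A c B, together with the end-of-file check on the starting non-terminal, can be sketched as follows. As in the earlier sketch, the rules A goes to a or epsilon and B goes to b or epsilon are an assumption consistent with the stated first sets, and the `Lexer`, `ParseError`, `trace`, and `"EOF"` token are hypothetical helpers.

```python
class ParseError(Exception):
    """Raised on a syntax error (hypothetical helper)."""

class Lexer:
    """Toy lexer that returns "EOF" once the input is exhausted (an assumption)."""
    def __init__(self, token_types):
        self.tokens = list(token_types)
        self.pos = 0

    def get_token(self):
        tok = self.tokens[self.pos] if self.pos < len(self.tokens) else "EOF"
        self.pos += 1
        return tok

    def unget_token(self):
        self.pos -= 1

def parse_A(lexer, trace):
    # Assumed rules: A -> a | epsilon, with follow(A) = {c} under S -> A c B.
    t_type = lexer.get_token()
    if t_type == "a":
        trace.append("A -> a")
    elif t_type == "c":
        lexer.unget_token()
        trace.append("A -> epsilon")
    else:
        raise ParseError("syntax error in A")

def parse_B(lexer, trace):
    # Assumed rules: B -> b | epsilon, with follow(B) = {EOF}.
    t_type = lexer.get_token()
    if t_type == "b":
        trace.append("B -> b")
    elif t_type == "EOF":
        lexer.unget_token()
        trace.append("B -> epsilon")
    else:
        raise ParseError("syntax error in B")

def parse_S(lexer, trace):
    t_type = lexer.get_token()
    if t_type in {"a", "c"}:            # first set of "A c B"; no b any more
        lexer.unget_token()
        parse_A(lexer, trace)
        if lexer.get_token() != "c":    # terminal in the middle of the rule
            raise ParseError("syntax error: expected c")
        parse_B(lexer, trace)           # no choice here: B must come next
        trace.append("S -> A c B")
    else:
        raise ParseError("syntax error: unexpected " + t_type)
    # Only the starting non-terminal checks that nothing is left over.
    if lexer.get_token() != "EOF":
        raise ParseError("syntax error: extra input after S")

def run(token_types):
    trace = []
    parse_S(Lexer(token_types), trace)
    return trace
```

On the input a, c, b, a, b, c from the discussion, every child parses cleanly and it is the final end-of-file check that rejects the string.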
Right, so S has been the starting non-terminal. If it's not specified, S is the starting non-terminal. Could the next token be a different token? It could be a different token. Right, exactly. But if the input is to be correct, it has to be an end-of-file after S. We've parsed all of S. This makes sure that we actually have a tree that looks like this structure and that there's nothing after the input string here. Questions on this? So, again, when do you do this check for end-of-file? Do you just run this after the last token that you're parsing? You would do this after checking all of the rules for S. So you'd say, okay, check the first set of this rule. If the token is in it, then you parse all of those symbols. Otherwise, if it's in the first set of this other rule, then you parse that rule. Otherwise, it's a syntax error. And then after all of that, if you've successfully made it there, you've correctly parsed the whole tree, and then you want to make sure there's nothing more after it.
Semantics. Now we get on to semantics. We leave syntax analysis in the dust. So what is semantics? What have we done so far? What is lexical analysis? What are lexers doing for us? Yeah, they're turning bytes into tokens, right? Because like we just talked about today, we don't care that you can specify an infinite number of things for an ID. All we care about is something at the abstract level of an ID. So lexical analysis turns these bytes into a sequence of tokens. And what is syntax analysis helpful for? Not quite whether it's a valid program yet, but whether this is a valid sequence of tokens, right? Is it possible that this sequence of tokens came from this grammar, right?
Does the grammar that we have generate this sequence of tokens? What is the data structure we want to think about that we're creating from syntax analysis? The tree. The tree, the parse tree, right? That's how we want to think about these things, in terms of the parse tree. So think about a language like, I didn't bring the book. Think about a language like C, right? There's a context-free grammar that specifies what a C program looks like. So is every possible sequence of tokens that conforms to that grammar a valid C program? Can you run every program that the grammar generates? No. It's perfectly possible to produce a program that's syntactically correct that won't work. If you've ever gotten a segmentation fault, then you've done exactly this. Has nobody done that before? Right? You compile a program. It compiles fine, and by compile I mean it passes the syntax checks, right? So it doesn't say you're missing a semicolon, it doesn't say you have mismatched parentheses, right? It looks syntactically valid, but it's meaningless to actually try to execute, right? Because you're accessing memory you shouldn't. The compiler's also really good about saving you from a lot of other errors, right? And that's not just syntax, right? Like what happens if you try to use a variable that's never been declared in C or C++? That's an error, right? It actually won't even compile, but it's syntactically correct, right? There's nothing wrong with your syntax. And so that's where it gets into the semantics. Semantics is about what that parse tree means. And how do we know which parse trees are valid or make sense in our language, right? So up until now, we've been doing something kind of mechanical, in some sense: turning bytes into a sequence of tokens, then turning that sequence of tokens into a tree.
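The gap between syntactically valid and meaningful can be seen in a short Python sketch. This swaps Python in for the C and C++ examples mentioned above: Python reports an undeclared variable when the code runs rather than at compile time, but the split between a syntax check and a meaning check is the same idea. The helper names `is_syntactically_valid` and `runs_without_error` are hypothetical.

```python
import ast


def is_syntactically_valid(source):
    """True if the source conforms to Python's grammar (a parse tree exists)."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


def runs_without_error(source):
    """True if the source both parses and executes without raising."""
    try:
        exec(compile(source, "<example>", "exec"), {})
        return True
    except Exception:
        return False


# Parses fine, but `count` was never defined, so it has no meaning to run.
undeclared = "total = count + 1"

# Not even a valid sequence of tokens for the grammar.
broken_syntax = "total = 1 +"

# Valid syntax and valid meaning.
well_formed = "count = 1\ntotal = count + 1"
```

Here `undeclared` passes `is_syntactically_valid` but fails `runs_without_error`, `broken_syntax` fails both, and `well_formed` passes both.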
Now we get into something deeper: how do we design languages, and how do we talk about the design of a language? Like, can I just give you a context-free grammar and you can program in my language? That'd be kind of awesome, right? So how did you learn how to program in Java or C or C++? Examples. What was that? Trial and error. Trial and error? Guided trial and error though, right? You weren't just randomly typing things in to see. Right, you had this search space, but you kind of knew what was valid, so you tried different things to see if they would work. It's the old monkeys-typing-Shakespeare thing. Yeah. I'm not going to call you all that, but. So how do we do that? So if we want to define our language semantics, right, what are we trying to get out of language semantics? Why is this important? Who is it important for? Do we care? Ourselves, like our personal selves, like self-growth, then. Maybe. Why do you care about the semantics of Java or C or C++? If you want it to make sense, why? So it works. So it works. What works? Yes, because you want to program in that language, right? You want to write a program in that language, and so when you say it works, what does that mean? Well, it means that when you type it in, it gives you the expected output. Yes, right? So that the machine is going to do what you expect it to do, right? That's basically the basis of programming, right? We're trying to convince this stupid, incredibly fast machine to do something for us, right? But all we have is our silly little English-style text, that's the way we want to write it, but it wants ones and zeros, right? And so, how can we make sure that what we envision, how we want our program to work, is the same thing the computer understands, right? Because at the end of the day, if it doesn't do it, you're wrong, right?
It doesn't matter how much you want it: you can't just will the computer to do what you want it to do, as much as we all wish we could. Maybe someday, I know, I think you could, like, hook your brain up to the computer. I'd be worried about going the other way, though. Right. Right. Programming. Exactly. So when you're trying to study, when you're trying to understand a programming language, what do you want from the semantics? It needs to be very clear for the programmer to understand what the language will support, but also for the computer to make sense of what the program is trying to tell it. Yeah, so there are two different audiences. It's not only you developing the program, it's also the compiler writer, right? They want to make sure that when you, the programmer, write something, if it's in the semantics, it had better do that thing for you, right? So it needs to be precise, right? It needs to be precise. Okay, this is what happens. What other features are useful? What other things? Human readable? Yes, right? You want to be able to understand it as a person, right? You know, most of the time machines aren't reading semantic documentation, right? It's humans. What about predictability? Is that important? Yeah. Yeah? You want to make sure that the program you wrote today is going to work a week from now, and not work on Tuesdays and Wednesdays but not on Fridays, right? How difficult would it be to write a program like that? There's a language called INTERCAL. It's a joke language, but you have to say please enough times. It'll complain that you didn't say please enough. It's actually really cool. Oh, spelling is important. So what if the semantics, I don't know, what if they're incomplete? Bad spelling. What if they're incomplete? Is that useful? If it doesn't specify all possible cases, right? If it doesn't say how something should happen and you write a program that takes advantage of that, what happens? What does the machine do? Exactly.
Yeah, we can't predict it. It's about predictability. It's about preciseness. It's about conveying our thoughts and our algorithms to the machine. Okay, so when we get back on Monday, I want you to think about how we can specify language semantics. So think about the ways you've learned programming languages and how you learned what it means when I call a function, when I add two numbers together, when I concatenate strings. How did I learn that, and what are the different ways we could specify it? Say a number? And then the unget is broken. See? What's the kind of logic?