 Any questions before we get started today? Just wondering on the homework, like on the phone blank part, should we still try to show our turn of thoughts? Yeah, you could do that. That would be good. So you can show how it matches to the regular question. You said that I was reading this from the group post. Is there something about the post of the wrong homework? No, no, just the link. The link was wrong to the wrong blackboard. So as long as you should all be able to see the homework on blackboard. So yeah, that's all good. For the last problem, is there a sort of link that the hex has to be? It could be infinite, right? That's the nice thing about regular expressions. We don't have to worry about the length. Alright, let's talk about rockets for a little bit. They're cool, right? So is anybody in here a rocket scientist? You're all just lame computer scientists? Cool. I have two, so I'm there with you. So, anybody know the story behind Mariner 1? Didn't it crash like a missing, like a bracket mismatch or something? Yeah, almost. So it crashed. That's kind of the important thing, I guess. But the reason why we want to talk about it is it crashed for very interesting reasons. That actually have to do with lexical analysis. Which is the stuff we've just been learning about. So in some programming languages, white space is not significant at all. So in different programming languages, when you have spaces in between tokens, those could maybe specify white space. So that's how you differentiate between two tokens, like between a number and an identifier, like normal programming languages. What about, like, is our new lines significant in a language like C or Java? No. No. You can write your program all on one line. You don't have to use any returns. Those are just better for you, the programmer, and for future you, to try and maintain that program. But what about a language like Python? Oh yeah, I heard. Oh yeah, yes. 
In Python, not only are newlines important, but tabs and spaces are too: the indentation is significant. So in most programming languages, whitespace is sometimes significant, but also sometimes not, right? So you can write a program that's left parenthesis, space, five, plus, I'm not going to say it all out loud, the one on the left, versus the one on the right, with absolutely no spaces. And you can write that, right, in every programming language. And is this going to lex into the same tokens? Yeah, right. Left parenthesis token, a number token, the plus operator token, a number token, and a right parenthesis token. Well, it turns out in FORTRAN... any awesome FORTRAN programmers in here? I've also never programmed in FORTRAN. Apparently, with some of these old languages, if you specialize in them, you can actually make a very good living as a consultant, because people will pay you lots of money to fix problems in their FORTRAN code, because nobody knows how to read it and fix it. So what happened was, in the Mariner 1 code, there was a line like this. Now, in FORTRAN, you can do a loop like this. You can do a DO loop. So this is DO 15 I = 1,100. You're doing a loop over a variable, right? So, a for loop. This is what they wanted to write, and I believe it means I goes from 1 to 100. I'm honestly not a FORTRAN expert, so I don't know 100% exactly how to parse that. But what they ended up writing was one character different. They ended up replacing that comma with a period. So usually, if you were to make a one-character error in your program, you know, what's the best case scenario and what's the worst case scenario? Someone in the back? No, that's head scratching. Best case is it still works as intended. Is that the best case, though? For you, the programmer. Oh, maybe, okay. Still works as intended, yes.
I mean, as in it causes no compilation errors, it makes no significant difference. Let's say, yes, if your one character difference would you added an extra space here, right? I guess that would be the best case scenario. What about second best? Your compiler yells at you until you fix it? Yeah, why is that the second best case? So you know where the problem is. You know where the problem is, but not only do you know where it is, right? You know any consequences on that one? Yeah, no one said you didn't release the code with a bug in it, right? You caught it before it never got deployed because your compiler told you there was a problem. So yeah, the worst, I mean, the best way would be if you, probably if you got rid of this plus here, I think the compiler would complain. Or you didn't close the parentheses correctly, right? The compiler would complain. Or if you changed due to something like 2TO, right? I think the compiler would complain in that case. What's the, so let's go down the levels. Yeah, let's blow that. I was going to say the worst days. Your rocket ship blows up. Your rocket ship blows up. Yeah, in this case there's no people on the rocket ship, that's not even, I guess, the absolute worst case, but still a lot of money goes into rockets, right? So still not the best case. But let's talk about it. Regardless of where it goes out into production, right? What are some of the other levels below? Compiler throws an error. Yeah. It creates an error that you don't realize until you've distributed the products, and it's bad enough you have to make a recall. Yeah, so that would be, yeah, so I guess while below that would be, basically, generally, if it escapes, it still compiles, right? But it has a subtle error that only occurs either in production and doesn't occur during testing. So this line is a for loop. What do you think this line is? The assignment. An assignment, yeah, is that crazy? 
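A rough sketch of why the period version reads as an assignment. This is a toy classifier written for illustration, not a real FORTRAN lexer: the function name `classify` and the two patterns are assumptions, and the only FORTRAN behavior it models is that blanks are dropped before the statement is examined.

```python
import re

def classify(statement: str) -> str:
    """Toy classifier: drop blanks first (they are not significant), then decide."""
    squeezed = statement.replace(" ", "")
    # DO <label> <var> = <start>,<end>   the comma is what makes it a loop
    loop = re.fullmatch(r"DO(\d+)([A-Z][A-Z0-9]*)=(\d+),(\d+)", squeezed)
    if loop:
        label, var, start, end = loop.groups()
        return f"loop to label {label}: {var} from {start} to {end}"
    # <name> = <real constant>   with a period, the whole left side is one name
    assign = re.fullmatch(r"([A-Z][A-Z0-9]*)=(\d+\.\d+)", squeezed)
    if assign:
        name, value = assign.groups()
        return f"assignment: {name} = {value}"
    return "unrecognized"

print(classify("DO 15 I = 1,100"))
print(classify("DO 15 I = 1.100"))
```

With the comma, the squeezed text `DO15I=1,100` fits the loop shape. With the period, the same characters only fit the assignment shape, with `DO15I` as the variable name.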
So because in FORTRAN spaces aren't significant, what the compiler sees in the second one is a variable called DO15I being assigned the decimal 1.100. So they did a variable assignment instead of a loop. So think about any of the programs you've written. When you wanted a loop, if instead you got a variable assignment, that would probably be pretty bad, right? And unfortunately, this happened in such a way that the bug wasn't triggered until the rocket was actually in flight, and it caused the rocket to blow up. Anyways, this goes to show that lexical analysis is very important for understanding how the language actually derives tokens from your input. It also kind of says that, yeah, in some cases you do want whitespace to be significant. Did they say what that was doing? What it was doing, yeah, like what it was assigning, I don't remember. You can look it up. There are a couple of good postings on it. It also may not be 100% true. I don't know. This is definitely a problem in FORTRAN that people have identified. There's some debate: some people say that, yes, this was definitely the problem, and some people say it wasn't and it was something else. I don't know. It's a cool example, right? It helps you think about your own code. Also, be very careful if you're writing software that goes on rockets. I wanted to go over another example of doing lexing and longest prefix matching. Let's see if this works. Let's go over an example. We're going to say our symbols... what symbols do we want? Let's go a, b, c. I feel like that's pretty easy, right? I need some regular expressions. Let's call them big A and big B. Let's say A is any number of a's, followed by any number of b's or c's, ending with an a, because it starts with an a, so a's are important. And we'll say B starts with an a, is followed by any number of b's, and ends with a big A. What's the big A at the end? Because that's not one of our symbols.
It's not one of our symbols. Ah, correct. What does this mean? It's using the previous definition. Yeah, it's using the previous definition of the regular expression. And we'll say B ends with a b, too, because that's important. This just means: substitute this whole thing in here for A. It's not changing it or adding to it, and we're not saying there's a symbol A. That's why in the homeworks and on the exams we'll use Greek letters for the regular expressions, so it's very clear what are regular expressions and what are the symbols in our alphabet. But here we've defined the symbols, so we've got a little bit of leeway. Okay, so can I do something like this? Can I say, okay, C is, let's say, c star followed by an a followed by a big C? Can I do that? No, no. Why? It hasn't been defined yet. Yeah, exactly, right? We can't do any recursive regular expression definitions. How would you terminate this regular expression, right? It would go on forever. So yeah, we definitely can't do that. So let's just keep it simple. Any number of c's followed by a c. Let's say two c's. These are all lowercase, yes. Can't you tell the difference with my amazing handwriting? Sure. Okay, good. Okay, so the string we're trying to parse is... well, it's going to be difficult to come up with something that exactly works and shows everything we want. Let's go with this: a, a, b. Anybody have any ideas? Oh, that'd be cool if I wrote that. All right, so one of my four columns I'm going to create right here. Yeah, the string. What's the next one? Matching. Next one? You guys are going to tell me if I spell it right. If we don't, will the rocket explode? Will the rocket explode? Let's hope not. Let's call this match. As long as it matches. Okay, great. So I've got my string; there's clearly not enough room for me to write the whole string there. Look at this.
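The three definitions on the board can be written down as a small sketch, assuming Python's `re` syntax and the convention that lowercase letters are alphabet symbols while uppercase names are regular expressions. The helper `in_language` is a name made up for this illustration. Note how B is built by pasting the whole text of A into it, which is all that "using the previous definition" means.

```python
import re

# Lowercase letters are symbols; uppercase names are regular expressions.
A = r"a*(b|c)*a"     # any a's, then any b's or c's, ending in a single a
B = rf"ab*({A})b"    # an a, any b's, then all of A substituted in, then a b
C = r"c*cc"          # any c's followed by two c's

def in_language(pattern: str, s: str) -> bool:
    """True when the entire string s is in the language described by pattern."""
    return re.fullmatch(pattern, s) is not None

print(in_language(A, "a"))    # a* and (b|c)* can both go to zero
print(in_language(B, "aab"))  # a, zero b's, A as just "a", then b
print(in_language(C, "c"))    # needs at least two c's
```

A recursive definition such as C containing C cannot be written this way at all: the f-string substitution would need C's text before C exists, which mirrors the "it hasn't been defined yet" answer.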
A, A, B, A, B, C, C, A, B, C, A, A. Okay, very important thing. First thing, right, this is, I don't know, test hints in general. When you're copying your own things from one place to another, from let's say here to here, make sure you've copied it correctly. Take a second, double check. I've seen cases where people just got the answer wrong. They straight up miscopied from one part of their own assignment to another. It's very upsetting. A, A, B, A, B, C, C, A, B, C, A, A. Cool. Okay, so we start off with the potentials all. We have nothing in matching. Right? So do any of these writer questions match right now? No. Why? It does. It does. We haven't looked at anything yet. Right? We've got the start of the string. But let's say one of them could match epsilon, right? Then it would match right at the start. But none of these, so this has to be at least length two. This has to be at least like one. This has to be at least like three. So let's first look at just the A. So, does any of these just match A? A. Big A. Big A, why big A? Sorry. It could be also big A. It has to match all of them. So to be in the matching column, it has to match this entire regular expression. It has to be a string that's in L of A. It has to be at the point where it has to be all at the end of the regular expression. We have to have matched it completely. There's a pencil to you. So it has the pencil to be A. Yeah, definitely. And also B. Also B because B starts with A. It matches A. It does match A. It does match A? How does it match A? A star goes up. Nothing but the second and just the A. Yeah, right? So the language described by A, if this is zero or more, this is zero or more, we have a bunch of other crap in there. But we also just have A by itself. So yeah, A will also be in matching for potential. So we have A is potentially matching because the first... Oh, because it could be any number. Exactly, including zero. Yep. Okay. So B is potentially matching, right? 
It's pretty straightforward. We have the A right here. Is C... Can C be potentially matching this? No. No? Right? So there's no strings in the language described by C that start with an A. Right? They all have to... I mean, they're all combinations of C, but none of them start with A. So we know A is not a matching. Sorry. C is not a potential. Okay. So then for match, we have A with length one. Okay. So we still have potential, so we're just going to move on. So now we're going to look at A, B. A, A. Sorry. A, A. Yes. Let's just go like this. Let's hold it easier. Okay. A, A. Does A match? Mm-hmm. Still match A, A? Yes. Because you get a 1A and an A again. 1A here. This goes zero or more. And one at the end. Yes. We have A matching here. Potential. Does A still have the potential to match? Yes. Yeah. Yes. Right? Potential to match would be a prefix, right? Is there a prefix in here that matches A, A? Mm-hmm. Yeah. Zero or more here. Right? So we can have those match those two A's, and we still have the rest of the stream to go, the rest of the regular expression to go. So we can continue going through that. So A still matches. What about B? Not anymore. Actually, it does. Because it starts with the A, and then the A gets all the way to the point. We got it. It can't. So B still does have the potential to match. Oh, cool. Okay. That's capital, right? This is a capital A, right? This just means this regular expression. So you substitute this entire regular expression there. Yeah, it could be. It could be potentially matching? Yeah. Some just say yes. Some just say no. Yeah. It potentially matches. Yeah. So this A, the first A in the string would match this A. The B would go to zero, and then the A star from in here would match the second A. Right? So we still have B matching, and we have the longest match so far is A of match two, of like two. Okay. So now we have AAB. So the string AAB. Does it match A, though? Not anymore. Not anymore. Why? Exactly, right? 
There's a B at the end. So this means that, right, this A right here means that every string that's in the language of A must end with a single A. The string AAB is not in the language described by A. Right? Because it has a B at the end. There's no possible way that it's in the language described by A. So A is no longer matching. What about, so is it a potential to match? Yeah. Yes. Right? AAB, the prefix AAB is still in the language described by A. Right? Two A's here, one B here. So it's still here in the potential. What about, so now let's talk about B. So does B match? Yes. Yes, how? Because the first A is the first A and then the second A is called the big A and it ends on a B. So this would be, this A matches this A. The B goes to zero. The first A in here goes to zero. B and C both go to zero. B and C both go to zero. This A matches this A here. And then we come back here. This B matches this B right here. So now we've got a match for B. Does B still have the potential to match? No, this would be the last one word, could. Is it? You always have the B end in the middle. Yeah, because we have... Yeah, it could be the B still in the same A. Yeah, let's go in the front. I would say that you can keep going because of the top one in A, you have A, B, and C that curl infinite. Right, so you have A number of A's and B's here. So this first A has to match the A that can start at B. This B star has to go to zero. But in here, the A can match that first A. But then the B's can go into any number of B's followed by an A followed by a B. So yeah, we still have the potential to match in B, a longer string. But we have... So the longest match so far is B3. So are we done? No, we're not. So now we're going to look at what... What's the string we're looking at? A, A, B, A. A, A, B, A. Okay, second one by one. So A, A had the potential to match. Does it still match? Yeah, it matches. Now it matches? Yep. That matches. Does it have the potential to match? No. 
Right, exactly. So no matter what character comes next, a, b, or c: if it's an a, is that string in the language described by A? No. Because it has to end with a single a? Because it has to end with, well, any number of b's or c's followed by a single a. Once we put a b or a c down, the next character has to be an a. Has to be an a. And that has to be the end of the string. Exactly. So that's not in there. With a b, well, that's not in there either, right? Any number of a's, the b matches there, the a matches the a, and then the b's at the end. And a c? Same deal. So A no longer has the potential to match. So now let's look at B. The first question is, does B match? B is not in matching. Why? Because it doesn't end on a b. Yeah. Because for B to match, it has to end in a b. Right? That's what this says. It's the same logic that we just talked about with A. So it doesn't match, but does it have the potential to match? Yes. Yeah. So now what we've seen so far is, sorry, as I get closer to the bottom here my writing gets frighteningly different. So now the longest match is A with length 4, and the string is a, a, b, a, b. Okay. Now, which regular expressions do we look at? Do we look at A? No. Exactly. It's not in our potential. So we only look at B. So now we say, does this match? Yes. It matches B. Does B still have the potential to match? No. No. Right? Yeah, we got to the end and we can't extend anymore, right? There's no more potential. Exactly. So now we've got to here, right? So what do we say the token is that matches here? B? Yeah, exactly. So this is what we're going to return. So if we call getToken on this input, it's going to first return B, and it's going to consume the first five symbols of the input string, right? a, a, b, a, b. These are all gone from the input string now. And once we make this decision, this decision is final. So we've gone as far as we can.
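A minimal sketch of that first getToken call, assuming Python's `re` module. The name `longest_match` is made up for this illustration. It deliberately skips the "potential" column: instead of stopping early when nothing can still match, it tries every prefix, which does more work but picks the same token.

```python
import re

# Listed in priority order: on a tie in length, the earlier rule wins.
RULES = [("A", r"a*(b|c)*a"), ("B", r"ab*(a*(b|c)*a)b"), ("C", r"c*cc")]

def longest_match(text, start=0):
    """Return (name, lexeme) for the longest prefix of text[start:] that fully matches a rule."""
    best = None
    for end in range(start + 1, len(text) + 1):
        prefix = text[start:end]
        for name, pattern in RULES:
            if re.fullmatch(pattern, prefix):
                best = (name, prefix)  # a longer prefix replaces any shorter match
                break                  # same length: earlier-listed rule is kept
    return best

print(longest_match("aababccabcaa"))
```

On the board's string this follows the table: A at lengths 1 and 2, B at length 3, A at length 4, B at length 5, and nothing longer, so the call returns B with the five symbols a, a, b, a, b.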
We don't ever go back and try to reparse and say, oh, maybe we'd get a longer second token if we had taken A here with length 4; maybe that would allow us to parse more. We only go through it once. So now these five characters are gone. So what's the next character we're going to be matching? c. c. Yeah. So I'll just, ah, I was getting all over the place. Okay, just c, right? So does A match? No. Does it have the potential to match? Yes. Yes, right? We have the c here, so any number of c's followed by an a would match A. So we know A is potentially matching. What about B? No. Because it has to start with at least one a. Right, exactly. It has to start with an a, right? So there's no possible way that the string c could match B. What about C? Yeah, yeah. Well, it has the potential to match. It has the potential to match, exactly. Okay, we don't have any match yet, so we don't have a longest match. Now the string c, c. Does c, c match A? No. No. Does it have the potential? Yeah. Yeah, because we can have any number of c's in here. What about the big C? Yeah. Does it match? Yeah. Does it have the potential to match? Yeah. Yeah. So it matches C, and we still have A and C as possibilities. So the longest match is C, length one? Would it be C, two? Two, yep. Okay. We add our next character: c, c, a. So does that match? We have A and C that we're considering. So let's think about A. Does that match A? Yes. Yeah, so this first a star goes to zero, these two c's match in here, and the last a matches here. It's a full match. Does it have the potential? No. No? Why? It ended the string. It got to the end of the regular expression. So A has no potential. Okay, C. Is this in C? It contains something that isn't a c. Right, exactly. It's c, c, a, so it can't possibly be in there. And that's that token. Yeah, exactly. Right.
So this is the next token that's going to be returned. So we'd first return B, then return big A. We did the first five characters, then the next three. So what's the remaining string I have? b, c, a, a. b, c, a, a, yeah. So let's start with b. Which one of these could match b? Is A matching? No. No. It has the potential though, right? No. Yes. What about B? Nope. Nope? Because it has to start with an a. What about C? Nope. So we just have A. Oh, A's not in matching; we have A as a potential. So now we look at b, c, and we look only at A. Does A match? No. No. Does it have the potential to match? Yes. Any combination of b's or c's. Yeah, any combination of b's or c's, right? So b, c fits in here. The a star can go to zero, and b and c both go here. We haven't reached the end of the regular expression, but b and c are both in there. So that's good. All right, b, c, a. It matches A and doesn't have the potential to match anymore. All right, that's a good answer. We're looking at A. So this a star goes to zero, the b and the c both match in here under the star, and the a at the end matches. So it's a match for A, and we've reached the end of the regular expression, so there's no more potential. We have A, with length three. So we're done parsing this token. It's funny when you make these up. I kind of thought this one would be the C. But this is why it's hard to just eyeball it. Probably a good example of why you shouldn't try. Exactly. Yes, it's a very good example, because it's incredibly easy to make mistakes here. Okay, what's our last string? a, just the a. Does a match A? Oh, no. Oh, yeah: zero, zero, a. What about B? No. What about C? No. Does it have the potential to match A? No. Yes. This a star means that we could have more a's, followed by more b's and c's. We don't know what's after this string. So we have A in match and A in potential. The longest match so far is A with length 1. But now we've reached the end of the string. There's nothing left at the end. So you just emit the token.
Now we've reached the end. There is no more potential because there's no more string to match. So we ask, what's the longest match so far? Oh, that's a terrible one. There, that looks better. So we return A as our last token. And you've completely parsed this input string once you've returned that A with length 1. So what exactly would happen for an error? Like, if we wound up having just an orphan b, a b all by itself, we don't have anything to parse that. Correct. What would be the methodology in that case? So we just had a b by itself, right? Does that match anything here? Well, it could potentially match, but take just a b by itself, the single string b. A has to end in an a. B has to start with an a and end with an A and a b. C has to have at least two c's. So in the lexer that you'll be using pretty much throughout the rest of the course, specifically in the next project, there are two kind of special tokens. One is EOF. What does EOF stand for? End of file. Yeah, end of file, right? That's when you've reached the end of the input and there's no more string to parse. So for instance, when we've gotten here, if we call getToken again, well, we've parsed all of the input string. There's no more input. So we would return end of file. For the other case, when we have something that's an invalid token, I believe we have an error token. And then, as the programmer, you have to decide how you handle that: whether you just ignore it, get rid of it, and go on to the next one, or whether you just quit and tell the programmer, hey, there's an error. That depends. So I can't say definitively, but I would say it's a safe bet that I would probably not give you something you're doing by hand where there's going to be an error on purpose. On purpose, yes.
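Putting the whole walkthrough together, here is a sketch of repeated getToken calls over the board's string, assuming Python's `re` module. The function name `get_tokens` and the token names `EOF` and `ERROR` as plain strings are assumptions for illustration, not the course lexer's actual interface. The error policy shown (emit an error token, drop one character, keep going) is only one of the options mentioned; quitting at the first error would be equally valid.

```python
import re

# Listed in priority order: on a tie in length, the earlier rule wins.
RULES = [("A", r"a*(b|c)*a"), ("B", r"ab*(a*(b|c)*a)b"), ("C", r"c*cc")]

def get_tokens(text):
    """Longest-match tokenization; a returned token is never reconsidered."""
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_len = None, 0
        for end in range(pos + 1, len(text) + 1):
            for name, pattern in RULES:
                if re.fullmatch(pattern, text[pos:end]):
                    best_name, best_len = name, end - pos
                    break
        if best_name is None:
            tokens.append(("ERROR", text[pos]))  # assumed policy: skip one character
            pos += 1
        else:
            tokens.append((best_name, text[pos:pos + best_len]))
            pos += best_len                      # consumed input is gone for good
    tokens.append(("EOF", ""))                   # nothing left to read
    return tokens

for token in get_tokens("aababccabcaa"):
    print(token)
```

On `aababccabcaa` this yields B, A, A, A and then EOF, matching the hand trace. A lone `b` yields one ERROR token followed by EOF under the assumed policy.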
So first, I would double check what you're doing to make sure that you didn't make a mistake because that's the unlikely case. The unlikely case would be I accidentally put in an error in there, which is possible, suddenly possible though. But I'm not perfect. So the whole line should token act? Yeah. Unless the instructions specifically say, you know, catch errors. Call get token on this. And if you ever get an invalid input, return an error token and delete that character and then try parsing afterwards. Yeah. So you would never try and re-parse it. You never do that, right? No, so that's the thing, right? So from the perspective of once I decide that this token is the longest match, I throw away that input, and I never change that decision going forward. More questions on this? Yeah. Just for the last thing, it could have potentially mattered B2, right? I mean, it doesn't matter, it's the last one, but for the very last one, it could have potentially mattered B2, right? Oh yeah, that's a very good point. Yeah, the B could match here, so we have some potential. Yeah, you're right, absolutely. I know you're absolutely right. Yeah, it could have matched B as well, but then when we look at the next one, we say... Yeah, we see A as the end of the final. Yeah, that's a good point. And then this is instance, if we had a particular string that where A and B both matched the whole way through, and then we reached the end of it, we would put precedence towards A rather than towards B. Exactly, yeah. So we know from the list here, right? A has more higher precedence than B or C. For instance, the string... Yeah, A, A, B, right? The string, if we had A, A, B, just this string, because that doesn't match A, string B, yeah. But we can't have a string that's both A and B with the right, we've set it up. Yeah. But if it existed. Exactly. So yeah, if you had, I don't know, well, if you had, let's say, A as a... A star A, and B for some reason is an A, A star. 
I mean, in this case they're identical, so parsing any input will always give you A. It will always return big A, right? Because they're both going to match the same strings, and because A is listed first, it has higher precedence. Let's go over one other quick example before we continue, one that we kind of went over Friday after class, I believe. So here we're going to use the regular expressions that we've already defined: NUM, DOT, and DECIMAL. We don't really need the exact definitions of them. I mean, if we were really doing this, we would. But I just want to quickly go through this so you see what I mean when I say there's no backtracking, right? I mean there's no backtracking once we decide on a longest token. That's a final decision. But for instance, here, we have the string, we have match, potential, and let's say the longest match. So we're going to first look at just the string 1. Does 1 match NUM? Yup. Does it match DOT? Does it match DECIMAL? No. Potential: does it have the potential to match NUM? Yes. Potential to match DOT? No. No, no possibility. Potential to match DECIMAL? Yes, right? So now we know the longest we've seen so far is NUM, length 1. Okay, next is 1 dot. So does it match NUM? No. No? Right, that's not a number. Does it match DECIMAL? Yeah. Just 1 dot? Remember, just to refresh, we specifically wrote it so that the dot has to be followed by a digit, right? So nothing matches. Does NUM have the possibility of matching? No. Does DECIMAL have the possibility of matching? Yeah. So what's the longest match we've seen so far? NUM, 1. Yeah, NUM with length 1, right? Okay, now 1 dot dot. Does that match DECIMAL? No. Does it have the potential to match DECIMAL? No. At this point we know that this input is not in the language defined by DECIMAL, right? It was, I believe, something like NUM DOT NUM in essence, though that's not exactly right. So nothing matches there.
So what token are we going to return here? Yeah, NUM, length 1, right? So this is the token that we return when we call getToken there. And where do we start when we call getToken again? Where are we going to start our lexical analysis? After the NUM ends. Yeah, after the NUM ends, right? Just here. So we're going to start parsing at the dot. So even though we had to look three characters ahead, and we said, well, it's possible it could have been DECIMAL, we had to look three characters ahead to figure out that, hey, DECIMAL actually doesn't match here. So then we go back and say, what's the longest match? That's why we keep track. And that's what we return. So there is some backtracking, right? We're backtracking to the last longest match that we've seen. Yeah. Does it actually backtrack like that? Or does it just go, okay, now let's go back to where the string started and go over x amount? Yeah, yeah. I mean, backtrack, I'm using it loosely here, right? So yeah, you look at these three characters, and then you say, okay, none of these match. So I would go back to the start, because you know exactly the length, right? You go back, and then you move one forward. And then you say, okay, let's start parsing at the dot now. So then you say, okay, that matches DOT, right? It doesn't match anything else. DOT, length 1, you return that, and then you do the same thing again with the next dot. You return a DOT. So calling getToken repeatedly on this would return NUM, DOT, DOT. That make sense? Does everybody understand the backtracking thing here? We do have to look ahead, right? We're looking ahead to see if any of the regular expressions are matching. So that's lexical analysis, right? Now we know about regular expressions. We know how to write regular expressions, or we're going to be learning that a little bit better in our homework.
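Returning to the 1 dot dot example, the "look ahead, then fall back to the longest match" behavior can be sketched as follows. The three patterns are assumptions, since the exact definitions are not spelled out here (DECIMAL is taken to be digits, a dot, then digits), and `tokenize` is a name made up for this illustration.

```python
import re

# Assumed definitions; the exact ones are not given in the lecture.
RULES = [
    ("NUM", r"[0-9]+"),
    ("DOT", r"\."),
    ("DECIMAL", r"[0-9]+\.[0-9]+"),
]

def tokenize(text, rules=RULES):
    tokens, pos = [], 0
    while pos < len(text):
        best = None                      # (name, length) of the longest match so far
        for end in range(pos + 1, len(text) + 1):
            for name, pattern in rules:
                if re.fullmatch(pattern, text[pos:end]):
                    best = (name, end - pos)
                    break
        if best is None:
            raise ValueError(f"no token matches at position {pos}")
        name, length = best
        tokens.append((name, text[pos:pos + length]))
        pos += length                    # resume right after the returned token
    return tokens

print(tokenize("1.."))
print(tokenize("1.5"))
```

For `1..` the scan reads past the `1` looking for a DECIMAL, finds none, and falls back to the remembered NUM of length 1. The next call starts at the first dot. For `1.5` the longer DECIMAL match replaces the earlier NUM.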
We know how to write a lexer, or at least we know how a lexer works. Our lexer takes in those input bytes and transforms them into tokens. So the idea of syntax... yeah, okay, good, sorry. So the idea of syntax analysis is to say, okay, now we just have a sequence of tokens, right? NUM DOT DOT, NUM, DECIMAL DOT ID DOT. But we want to transform that into something that's actually useful, that we can compute on, right? This is just a sequence of tokens. So we want to extract some more meaning from this, and we need to do something else, right? So what do we know? Our input is a series of tokens, right? What do we know about that? Is every sequence of tokens valid in our language? No, no. No, right? So if we have an invalid token, that's a big error, right? The lexer is going to say, hey, there's an invalid token here, I don't recognize this Unicode character, it's not in my language, whatever, I'm going to throw an error. But we want some way to specify and check whether a sequence of tokens is valid in our program, right? For instance, would this be valid? Yeah, probably. If plus is a token, right? I guess the correct answer is, well, it depends on what language we're talking about, and we haven't defined one. But intuitively, in most programming languages, a number plus a number is a valid sequence of tokens. What about, like, DECIMAL DOT NUM? No? It could be inside of a string. Well, if it was inside of a string, the token would be a string. Yeah, exactly. So no. Although, has anybody programmed in Ruby? Yeah? So what are some interesting features of Ruby that may be relevant here? It's got a cool name. That's not really relevant. To me, one of the defining things is that everything is an object. And in Ruby, when we say everything is an object, we mean everything.
Like, the number one is an object that represents the integer one. So you can actually call methods on it and treat one like an object. You can write Ruby code that's the literal one, dot, some function. You can say 1.times, you can say 1.upto, and they'll do a loop. So you can think of this as maybe a weird language where you can, I don't know, call stuff on numbers, but really it's pretty terrible, right? Because it's a decimal followed by a dot followed by a number. You can think about how that would look: 10 dot 10 dot 10, right? Kind of like maybe you're trying to do an IP address or something. Yeah, or maybe a software version number. What about ID DOT ID? Could that be valid? Yeah, that would just be like a method call on an object, right? So, an identifier, dot, an identifier. You want to call some method on, or send a message to, the first object. That could be crazy, right? But it doesn't really matter. At this point we've just defined tokens, right? So now we have this series of tokens and we want to know, okay, is this sequence actually valid? Okay, so we've learned about regular expressions, right? So let's use regular expressions to do this. I already told you, right? They're fast, they're easy. They're easy-ish to understand, or hopefully you're starting to develop some intuition about looking at a regular expression and seeing what it means. So let's use regular expressions, right? We'll say a program is made up of zero or more statements, right? More or less, yeah. And a statement can be an expression, an if statement, a while statement, any number of different kinds of statements in your program. We'll define operators as plus, minus, multiply, divide. We'll say an expression is a number, an ID, or a decimal, followed by an operator, followed by a number, ID, or decimal, right? Is that how you think of an expression? A mathematical expression in a program, right? You want to perform some computation on two numbers. So does this match?
So the five is going to be which token? NUM. NUM. The plus is going to be the plus token, the operator token, yeah. Ten is going to be another number, right? So yeah, this matches the regular expression. We turn the input into tokens, and then we check whether the tokens match. What about foo minus bar? Yeah, yeah. ID minus ID, that's definitely in there. What about this? No. Yeah, so right, if we look at this, a program is a series of statements, right? One possible statement is an expression. So does one plus two match expression? Yeah, that matches. But for this to be a program, it has to be zero or more statements, right? So the next thing after it must be another statement. Is plus three an expression? Well, no: we have one plus two, and the plus is where it stops. Exactly. So the first part, one plus two, is an expression that matches this regular expression. But a program is multiple statements. So this is a statement. Is this a statement? No. Exactly, yeah. Okay, so it turns out that regular expressions are not sufficient. They're not actually expressive enough. If you think of the entire set of languages that can be described by regular expressions, they aren't powerful enough to express the fact that, hey, an expression should be, well, an expression, an operator, and another expression, right? So you can have subexpressions and you can have nesting like this, which is how we think about it. I'll guarantee you that's the case. We're not going to go into the details in this class, but you can prove this and all kinds of cool stuff. People who've taken 355, do you do that in 355? Yes, good. Okay, so I don't want to spoil anything. So when we come back, we'll look at how to write a regular expression for matching parentheses, right? A very simple operation that we might want to do. So we'll stop here, but before everybody gets ready to go, I will say, I teach another class at 10:30.
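Going back to the attempt at describing programs with regular expressions, the definitions can be tried out on token names instead of characters. This is a sketch under assumptions: token names are joined into one space-separated string so Python's `re` can be reused, every operator is collapsed into a single `OP` token, and only the expression form of a statement is modeled, since the if and while forms are not defined. The names `EXPR`, `PROGRAM`, and `is_program` are made up for the illustration.

```python
import re

OPERAND = r"(?:NUM|ID|DECIMAL)"
EXPR = rf"{OPERAND} OP {OPERAND} "   # operand, operator, operand
PROGRAM = rf"(?:{EXPR})*"            # zero or more statements (expressions only)

def is_program(token_names):
    """Check a list of token names against the PROGRAM regular expression."""
    stream = "".join(name + " " for name in token_names)
    return re.fullmatch(PROGRAM, stream) is not None

print(is_program(["NUM", "OP", "NUM"]))               # 5 + 10
print(is_program(["ID", "OP", "ID"]))                 # foo - bar
print(is_program(["NUM", "OP", "NUM", "OP", "NUM"]))  # 1 + 2 + 3
```

The third call fails for the reason given in class: after `NUM OP NUM` is taken as one statement, the leftover `OP NUM` is not a statement. Patching EXPR to allow longer chains is possible, but arbitrarily nested expressions are exactly what this approach cannot describe.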
I notice there's nobody in this room, so if you guys want to, I don't know, I can do like an office hours thing in here for about 20 minutes or so from like 10.50 to, what is it, 9.50 to 10.10 or so. So if you want to do that and you want to do stuff, just let me know. But if you're not going to be here, I'll record it and put it online, so you don't have to, like, be there as well.