 So I have no idea how this recording is going to sound. And I also don't have my drive pad. So it'll be like my finger. So we'll see how that goes. Cool. Any questions before we start? We'll do like two minutes for questions. Homework questions, project questions. Yeah. I guess I'm a little bit. Cool. Yeah, that's a good question. So let's go back to regular expression syntax, right? So the seventh rule we have here, right? We have R star, right? So this means whatever's in here, if you think of it as an operator, the star operator is going to apply to whatever regular expression is in there. So that's also an issue of precedence that we didn't really talk about. We talked about precedence of A bar B, which is very small that you can't read, right? We talked about A bar B dot C, right? So which one? What is this? Where would the parentheses be on this example? B dot C, yeah, exactly. A or B dot C, right? So we're going to also define the problem is we didn't say, what does this mean, right? So we're going to say the star operator has the highest precedence, kind of like, if you think of it, exponentiation, right? So if I have x squared, which I can't really do here. Let's see if I can draw it. We're going to see how this goes. x squared, wow, this is going to be a fun class. It kind of looks like a 2 if you move your head in the right way, right? So we have x squared. So if I had something like y plus x to the third, right? It's clear from our years of math training that this, what's being cubed is not y plus x, what's being cubed is x. Similarly, if I had y plus x to the third here, now it's being cubed, y plus x, thank you. Get us out of that awkward silence, right? So we're going to do this exact same thing here with regular expressions, right? So using that same logic and that same intuition, what would this mean? What's being starred, if you will? Just b, exactly. So this would be different than, so here, what's the star operator applied to? 
The regular expression a bar b, exactly. Cool, does that answer your question? Awesome. Any other regular expression stuff that we talked about, any of the other operators? So I guess we can briefly mention, right? It was this way in the homework. So we're going to get rid of the dot operator in most of the things that we talk about. So we can write this. This is the exact same thing as saying a bar b dot c, right? So that's why on the second part of the homework, it says we omit the dot operator. So we're not going to write out the dot operator. And this just makes it easier to write regular expressions. So we can write the regular expression a b c. So this would be a dot b dot c. On the homework, you can think about it from our perspective, right? So you give us the wrong answer. How many points can you expect to get? Zero, right? You just wrote down something, maybe arbitrarily. I don't know how you got that. But if you show us your work, not only can we give you better feedback because we can identify exactly where you went wrong, we can give you some partial credit because it's clear that you went through and tried to do everything. And maybe we say, oh, you made a mistake here, but everything else after that was still right, even though you got the wrong answer. So it's really more about you: show your work. It's better for you, it's better for us, better for everyone involved. Like true, false, you probably would not need to unless you really want to. I mean, the homeworks are really midterm practice that you can get some points for. So that's the point of the homeworks. And these are all questions that I actually drew from previous midterms and put in the homeworks. Any other questions? Cool. All right, so we covered this. So now we go back and actually define the star operator, because we haven't formally done that yet. Okay, so I looked this up on Wikipedia. I believe his name, Kleene, is pronounced "KLAY-nee".
He was actually, so I'll probably refer to him much, much later towards the end of the semester when everybody's already forgotten what I'm saying right now, because he was the student of the guy who invented lambda calculus. So we kind of go full circle here. So Kleene star, he's the guy who defined the star operator. So we already kind of built up an intuition on Friday about what does a star operator mean? So what was that intuition? How would we describe what this operator does in English? Yeah, zero or more repetitions of what? Of r, yeah, exactly, right? So intuitively what this is when we say zero or more, we're saying that L r star starts with, not the empty set, the set containing the empty string. So what's the L r star here? What am I trying to define? A language, yes, so we're defining a set of strings, right? So we're saying what is the language that is described by the regular expression r star, right? So zero or more times, what's gonna be in that set? Zero repetitions. Epsilon, yeah, so we're gonna have epsilon and we're gonna union that with the language described by just r, so that would be one repetition. Two repetitions would be L r concatenated with L r. Union that set with L r concatenated with L r concatenated with L r and then union that with and so on and so forth, to infinity, right? So that's intuitively what we wanna do here. So what we're gonna do is define this as, we're gonna define a new kind of function on languages if you will, or a relation here. We're gonna say L zero of r is the set containing epsilon. And so we're gonna say any given L i is the one that came before it, right? L of i minus one, concatenated with L r. So does everybody agree with this? So L two would be L r concatenated with L r and then concatenated with, well, a set containing epsilon, but what's that gonna be? Yeah, empty string, right? When we concatenate with the empty string, nothing changes, so we don't have to write that.
And so we can define L r star as the union for all i greater than or equal to zero of L i r, right? So should this change the way you think about the r star operator? Pretty much shakes your thoughts to their core. The intuition is still the same. The intuition of zero or more repetitions is the same, right? But we can break it down mathematically and say exactly what we mean here. Notice that this doesn't say anything about what, so this is all possible strings made up of two strings that match r, right? That's what this L r dot L r is, right? So when we look at three strings, like L r, L r, L r, right? This is going to be all combinations of three strings that match the regular expression, concatenated together. And so it doesn't matter whatever matched in here, matched in here, matched in here, all possible combinations are valid. Okay, so some examples, right? You've seen this, actually this was our previous example, so this is a good way into precedence, right? Or operator precedence. So here we have the language described by the regular expression A bar B star, right? A or B star. And so intuitively what strings should be in this resulting set? A followed by any number of Bs? So is epsilon going to be in there? It's awesome, it's like waves of answers crashing. So somebody say yes, why? Okay, somebody want to raise their hand? Yeah, in the front. Right, so it's A or B star, right? So the other side of that or would be B star, and B zero times is epsilon, right? Zero times is epsilon. Is A going to be in there? Yes, from the other side of the or. What about A, A? No. What about A, B? No? Exactly, they're not concatenated. So describe this in English, what is this in English? Yeah. This is. Yeah, that describes it exactly. Epsilon, A, the string A, and a string containing any number of Bs. The singular string. I actually don't want to answer. Huh, say that again. All strings containing all Bs, everything, yes. Cool.
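The definition above can be sketched in Python for a small finite language. This is only an illustration under a few assumptions: the helper names `concat`, `power` and `star_up_to` are made up here, the empty Python string stands in for epsilon, and the infinite union is cut off at a bound because a program cannot list infinitely many strings.

```python
def concat(left, right):
    """Concatenate two languages: each string of left followed by each string of right."""
    return {x + y for x in left for y in right}


def power(lang, i):
    """L_i: L_0 is {epsilon}, and L_i is L_(i-1) concatenated with lang."""
    result = {""}  # L_0 = {epsilon}; "" stands for epsilon
    for _ in range(i):
        result = concat(result, lang)
    return result


def star_up_to(lang, bound):
    """Union of L_0 through L_bound. The real star is the union over all i >= 0."""
    out = set()
    for i in range(bound + 1):
        out |= power(lang, i)
    return out


# a | b* means a | (b*), because star has higher precedence than bar.
a_or_b_star = {"a"} | star_up_to({"b"}, 3)
print(sorted(a_or_b_star))
```

With the bound set to 3 this gives epsilon, a, b, bb and bbb. The strings aa and ab never show up, which matches the answers given in class.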
So we have A, epsilon, B, B, B, B, B, B, B, B, B, B, B. Any number of Bs. What about this, what's the difference here? What does this mean in English? All strings containing A and B, yeah, so there's also multiple ways to describe this, right? But we have zero or more what? As and Bs, yeah, exactly. We have any number of, what's inside there? A or B, right, so is epsilon gonna be in the set? Yes. Yeah, epsilon's gonna be in there, what about A? What about B? A, A? A, B? B, A? So how do we figure this out? A or B, A or B star. Yeah, so we want the language, yeah, so this is gonna be the union for all I, I'm not gonna draw that "for all", I think. The union of L, let's see, L, I wonder if I can do superscript. Bigger or smaller, that doesn't really help. Bold, all right, L, I, of what? What's inside here, though? Because this is what's inside, this is what's being starred, right? The regular expression is R star where R is A or B, right? And so this is going to be the language, well first it's gonna be, we know, draw epsilon, that'd be tricky. Union with L of A or B, union with L of A or B concatenated with the language described by A or B, union with so on and so forth, right? So what's gonna be in this set? A and B, yeah, the set containing A and B. What about in this set? A, A, A, B, B, A, B, B, is that all of them? Yeah, right, and this set will be all three-letter strings that are composed of As and Bs, right? So now let's go back to whatever the original question was. What was the original question, does anybody remember, it's like B, B or B, A. So is B, A in this? Yeah, questions on this? So once again, it doesn't mean any number of A's, right, it's not specifying zero or more A's, it's specifying zero or more A or B's. That's why it can be A, B, A, B, A, B, A, B, every possible string that is composed solely of A's and B's is gonna be in this set, right? So this is exactly what we did, right?
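The level-by-level breakdown of A or B, starred, can be checked with a short Python sketch. It assumes Python's built-in `re` module, whose syntax is richer than the one defined in class, so only bar, star and parentheses are used here. The helper name `strings_of_length` is made up for the illustration.

```python
import re
from itertools import product


def strings_of_length(alphabet, n):
    """All strings of exactly n symbols from alphabet (this is L_n when r is a bar of single characters)."""
    return {"".join(p) for p in product(alphabet, repeat=n)}


pattern = re.compile(r"(a|b)*")

for n in range(4):
    level = strings_of_length("ab", n)
    # Every string in L_n should be in the language described by (a|b)*.
    assert all(pattern.fullmatch(s) for s in level)
    print(n, sorted(level))

print(pattern.fullmatch("ba") is not None)   # composed only of a's and b's
print(pattern.fullmatch("abc") is not None)  # contains a c, so it is not in the set
```

Level 0 is just epsilon, level 1 is a and b, level 2 is the four two-letter strings, and level 3 has eight strings. `fullmatch` is used rather than `search` because the question is whether the whole string is in the language, not whether some piece of it matches.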
We kind of broke it down and developed an intuition of what was happening. And so now we know all the strings that are in here. So if I gave you a string and said, is it in the language described by A or B star? You could say whether it is or is not, right? So what does this mean? Yeah, so zero or more, zero or more A's. So we think about just this, what strings are in here? A, epsilon, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A. So all those strings concatenated with all number of A's, it's gonna be the same set of strings, right? Because any more A's you add on there, that set is already gonna be inside this set. Is there what, sorry, say that again? Sorry, say that again? It's like a loop, right? It's like a super simple loop that only goes from zero to infinity, right? And so if you put that loop in another loop that just goes zero to infinity, and because it's all sets, they're already, already in there. Right, does that make sense? So I can think about what's inside here. This is what's inside the language defined described by A star is gonna be a set containing epsilon, I'll grab the epsilon, A, A, A, A, A, A, A, A, A, A, A, A. Great, a string with a bunch of A's, every string that possibly has A's. So now if I said what is A star concatenated with A star? So is epsilon gonna be in there? Yes, we're gonna have epsilon. Is A gonna be in there? From what, though? Yeah, because there's gonna be an A in A star and there'll be an epsilon in A star. When you concatenate those together, you will get A. No, A concatenated with epsilon is A, right? Because you have a string, you're concatenating it with the empty string. So it's the original string again, right? And so even if we're concatenating like this and this, that string is already inside this set because this set contains every string that has every possible length of A's. So yes, right? So you can kind of informally show this, yeah. You can put a finite number of A's. So you can do A, A dot A dot A. 
So, let's see, we want A, A, A. Where, how long do you want me to go? The star means zero to infinitely many times. So this is why the one way you could do that is to have a fixed number of, you know, you could specify exactly, you want all strings of A's from zero to 10, right? You could specify that in a regular expression, but you have to specify every possible string. Or you can maybe do it a little bit more succinctly, but you can't say I want to do A star but only from zero to 10. Yeah, that's all, there's epsilons in there when they're supposed to be epsilons. It's just because I don't wanna keep drawing up. I could copy, I should probably copy the epsilon from my slides that I could use in here, but yeah, for right now, no, those are all characters. You'll see an epsilon when there's an epsilon, don't worry. You can simplify, what? Yeah, if we said A or B or C, and like, these are all starred, and this whole thing is starred, maybe, you tell me. It's part of understanding regular expressions. So can you make an incorrect simplification? No, right? Like, let's look at, you can't say, so is the, well, that's fine. Is the language described by A star or B star? So what's this? What does this describe in English? Any amount of A's or any amount of B's? So is that the same thing as the language described by A or B star? Name a string that's in one of the sets but not in the other set. A, B, right? The string A, B is in this, right? This is any number of A's or B's. This is all strings containing all A's and the strings containing all B's, but not A and B combinations, right? So this is why you gotta be careful when thinking about simplifying it. So if you simplify it and then you mess it up, well then everything's bad. Okay, so now we've developed a little bit of some intuition and some formalism around regular expressions and you're gonna continue to develop this in your homeworks. And I'm gonna see if I can, yeah. Well, that's a weird one. 
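Both simplification claims above can be spot-checked in Python. This is a bounded check, not a proof: it only compares the expressions on short strings, and the helper names `matches` and `all_strings` are made up for the sketch.

```python
import re
from itertools import product


def matches(regex, s):
    """True when the whole string s is in the language described by regex."""
    return re.fullmatch(regex, s) is not None


def all_strings(alphabet, max_len):
    """Yield every string over alphabet with length 0 through max_len."""
    for n in range(max_len + 1):
        for p in product(alphabet, repeat=n):
            yield "".join(p)


# a*a* describes the same set as a* (compared here only up to length 6).
same = all(matches("a*a*", s) == matches("a*", s) for s in all_strings("ab", 6))
print(same)

# a*|b* is not the same as (a|b)*: collect short strings that tell them apart.
witnesses = [s for s in all_strings("ab", 2)
             if matches("a*|b*", s) != matches("(a|b)*", s)]
print(witnesses)
```

The first check finds no difference, and the second finds ab and ba, which are in (a|b)* but not in a*|b*. A quick search like this is a cheap way to catch an incorrect simplification before relying on it.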
Okay, apparently I don't understand how this desk works. Is the whole thing moving up? Okay, it started tilting this way so I don't wanna ruin this desk. Okay, cool. So we've been studying regular expressions, you're gonna study them in your homework, and now we want to say, okay, we were using regular expressions to be able to define tokens, right? This was the entire purpose of looking at and studying regular expressions: in the context of lexical analysis, we want to create tokens from regular expressions. So what we're gonna do is we're gonna define the tokens using regular expressions. So let's do an exercise. So I want to define a regular expression called letter. What would be in that regular expression? What types of strings would I want to match? Yeah, like a letter. Yeah, so I need to specify exactly which characters I want, right? So A or B or C or D or E or F, G, H. I hope I don't mess this up, okay. Right, all the way there. So what strings do I not want to be in here? Epsilon, we wouldn't want an empty string to be considered a letter. Numbers, right? So this is part of when you're writing a regular expression, you want to think not only about what should match, but what shouldn't match, so you can properly validate that. So this is part of actually creating test cases like you're doing in project one, you need to think about doing test cases even for regular expressions, right? What about uppercase letters? Would you like a language that you could only write in lowercase? No, somebody can say yes. And then back it up with a fact or an opinion. More stuff to work with and deal with. So what are the pros and cons from like a language design perspective? What is it, what, say it again? Oh, caps are aggressive. Oh, I like that actually. So you don't code like this, like you don't do this when you code? Maybe in comments, why? Okay, right? Yeah, so it's a little rude using all caps. What else, special characters?
Okay, we'll get to that in a second, but focusing just on uppercase and lowcase, it seems really pedantic to focus on such a tiny thing, right, but have you ever used a language where there's functions called N there's a parse or a, and you have autocorrect doing weird things, right? So is it kind of weird that you can have a language? So are these the same functions? Are they different functions? I mean, it depends on your language, right? This is part of that design decision. So you could maybe think like only allowing lowercase characters, that could be a nice way to get rid of this problem, right? All functions are all lowercase. So if you look at the C standard library, I think every function name is, I wanna say eight characters or less and also the file names because that was the limits of the Unix system that they were developing in at the time. So we still have that baggage now when you're dealing with C functions and they're all lowercase. So even you guys, the Mac file system does not actually do uppercase and lowercase. There's no difference, all files are considered one case. So you can open a file with two different things and it'll be the same file, it's crazy. I can't believe the Dropbox people had to like deal with that when they're doing file syncing, but anyway, so this is something to think about, right? So we're defining a language. These tiny things that we think about now about defining tokens for letters, that's gonna affect everything, like how the programs look and how things feel as we're going on. Okay, so we have letters, those are cool. We like letters. What about, do we ever wanna use digits, like numbers in our programming language? Yeah, what kind, do you guys wanna use Roman numerals? Oh, that'd be a great, oh, I shouldn't have said that. That'd be a great midterm question. Okay, so what would a digit be all the way to nine? Is that good? So then, how can we define a number? So what's the problem with just digit star? 
Epsilon, right? We would match the empty string. Is the empty string a number? No. What base of number are we talking about? Says who, is this a number? It could be, yeah, in hex it is, right? So what kind of, okay, so first let's define what kind of number system do we wanna define? Base 10, thank you, okay, good. Right, so we can't do just digit star. One or more of, no, we can't use that operator. Yeah, I like the concatenation this way. Digit dot digit star, right? So this is all nonempty strings consisting of all digits. So do I want this to be a number in my programming language? You tell me, we're defining the language now. If it's not, you don't wanna use this foreman's language. So in C, are these the same things? Actually, maybe I did this wrong. Okay, what about now? Did that change things? So what kind of, what are the kinds of different notations and base numbers that you've learned about in your career? Binary, how do you do binary in C? Can you do that? Like as a literal, I think, I think if you put either the B in front of it or the B afterwards, I can't remember, but there is a way to write binary. Do you know? Zero B, oh, zero B at the start. Got it, that makes sense. So this would be a binary number, which can be really useful, right? You're dealing with crazy systems. You may need the raw binary signature. What else? What are some other base systems? Octal, what? So what's octal? Base eight, how do you do octal in C? Zero X? Leading zero. So what's zero X for? Hex, right? And so octal is a leading zero. So then are these the same number? No. So should we have regular expressions and tokens that match them as if they were the same? No. No, probably not, right? That doesn't make sense. We'd want to maybe have different tokens. So let's go back to our number. So now we need to, we already said we're doing base 10 numbers, right? So do we want leading zeros? No, so how do we do it? Remove zero, like minus zero.
Yeah, so why don't we, we can define, just like this, we'd develop the shorthand, right? We can call it, I don't know, p-digit, or we can not, there we go, right? Every digit besides zero. So then now what would our number be? p-digit star? So we want to take a crack at it because there's a lot of you, let's see, p-digit.digit star. So then what does this match? So let's think about what things match, what things don't match. Does it match this? Why not? Right, zero's not a p-digit. So this defines all strings that start with a p-digit and have any number of zero through nine after it. What was that? Or zero, why, or zero, what test cases does that fail? Only zero, should this be a number? Yes. Yeah, does this, is this inside this, the language described by this regular expression? No, because we said that it has to start with a p-digit. Right, so how can I fix this? What do you just say? Or zero, ah, so what would happen if we did that? So let's add, since there was confusion, I will, I think it got worse. Oh, it's in italics too, that's not fun, okay. And can you just go back to the way everything is? It's not super specific program. Sure, this is exactly what I wanted to do. Okay, so we just changed this and it took five minutes, but we added epsilon to p-digit. So now does this token capture what we want to capture as a number? Why not? So is all these zeros in this language? Yes, because this would be epsilon and then it'd be any number of zeros. Exactly, cool. So should we keep this in here? No, we should not. Okay, so now are we good? We don't have negatives. Okay, we'll deal with just natural numbers right now. It's gonna complicate things. Okay, okay, cool. So we did all those, we did numbers. Oh, cool, okay, now let's go back, let's do, so we did letters, right? Now we want to say what is an ID? So ID is short here for identifier. So what kind of, what are things that are identifiers in our programming languages? Variable names, what else? What was that? 
Statements, statements, use identifiers, yeah, to perform operations on them. Variable names, what else? Function names, now these are all like identifiers. You want to give a name to these things. Constructors, functions, although usually constructors are like a keyword, but yeah, right? Maybe any kind of like, what was that? Oh, structure, yeah, like a struct. Yeah, you want to name a struct, naming classes, class names, constants, enums, right? All kinds of stuff. And you need to know what is an identifier? What does an ID look like? And so what are some of the rules for identifiers in programming languages that you're used to? So raise your hand, just give us like, I don't know. The general rules maybe that you remember and what language, yeah. Can't start with a number, why? Because then how would it know if it's a digit or a number, right? It may not be able to tell. So that could be one reason why. I don't know whether that's specifically why. Yeah, a lot of languages, you can't start a variable name with a number. Yeah, so you can't use a keyword or if or while, think about it if you could define that to be a function. Yeah, go again? No spaces, why is that important? Yeah, exactly. How do you know when one ID ends and another one begins, right? That would be madness. Underscore, yeah, you may want an underscore. So underscore kind of actually gets around that fact. You can't have spaces, but you may actually want to separate words visually so you can use an underscore for that or you can do camel casing, right? That's another way of doing that. What else, any other weird rules? Yeah. Does it start with a number? Yeah, so usually you can't start with a special character either. So usually you have to start with what? A letter, right, as we've defined it. Can you start with what? You could maybe also start with an underscore. Yeah, sure, let's add this in here. I like that. So then what can follow an identifier? 
So what is it, like letters, like letters or what? Digits, underscores. What special characters though? Can I do this? So actually one of the things that maybe you've never thought about but I've actually found so some languages actually allow you to use a question mark inside of a variable name, which it seems silly, but if you have a variable called maybe is admin, right? It's actually kind of nice to read because it reads just like a question, right? It's a Boolean. Is it an admin? If it's true, the person's an admin, if it's false, it's false. It's actually super handy. Some languages do this and it makes the code a little bit, just that little bit more legible and readable. Now would you want to do like is admin and is, I don't know, on phone? Probably not, this would be super annoying, right? Okay, so letter, digit, underscore, any other special characters? What a hyphen, depends on the language. Some languages use hyphens, some don't. A lot of them don't. That actually seems kind of limiting, right? We can think about maybe Unicode characters and all that. So is this it? Am I done? I want a star where? Like this? Yeah. Cool, so does this get across what we want? So this is basically everything that we did. This is just slightly different than what we've seen before. We just didn't add the underscore, but otherwise it's exactly the same. And this is reiterating the fact that we left out the dot operator here, but we still know exactly what this means. So you're free to use this syntax on homeworks, midterms, leaving out the dot operator. It makes things a little bit more clear if you include it, but you don't have to. And so we have cool things. Like we can say, yes, this is an ID, this is not an ID. Okay, so now we've got our number down, right? Close clarify. So we already saw this was wrong, we saw this was wrong. We added p digits. So you guys already did all this stuff. I don't need to teach it, teach yourselves. Okay, we needed the zero. 
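The token definitions built up so far can be written down and tested with Python's `re` module. Treat this as a sketch under assumptions: the constant names are made up, a character class like `[a-z]` is Python shorthand for a or b or c and so on, uppercase letters are included even though the class left that as a design question, and ID uses the version with the underscore that was added in class.

```python
import re

LETTER = "[a-zA-Z]"   # a|b|...|z|A|...|Z (including uppercase is an assumption)
DIGIT = "[0-9]"       # 0|1|...|9
PDIGIT = "[1-9]"      # every digit besides zero

# NUM: a pdigit followed by any number of digits, or the single string 0.
NUM = f"({PDIGIT}{DIGIT}*|0)"

# ID: starts with a letter or underscore, followed by letters, digits, underscores.
ID = f"({LETTER}|_)({LETTER}|{DIGIT}|_)*"


def is_num(s):
    return re.fullmatch(NUM, s) is not None


def is_id(s):
    return re.fullmatch(ID, s) is not None


# Positive and negative test cases, in the spirit of the class discussion.
for case in ["0", "10", "123", "", "00", "007"]:
    print(repr(case), "NUM" if is_num(case) else "not NUM")

for case in ["x", "is_admin", "_tmp", "x1", "1x", "", "is admin"]:
    print(repr(case), "ID" if is_id(case) else "not ID")
```

The negative cases matter as much as the positive ones: the empty string, 00 and 007 should all be rejected as numbers, and 1x should be rejected as an identifier because it starts with a digit.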
So now how do we define a decimal number? So let's think about first, let's go what are some decimal valid decimals? So we want what, 10 dot 10? Is that valid? What are some other examples? 0.5? 0.5? We're getting off track. Anybody want to add an example? That's not one of these? 0.05, okay, interesting. And I just did not put them in the right place. All right, we're gonna stop. Not gonna deal with scientific notation. That would be a completely different issue. Yes. Cool. So how do we write what is a decimal? Is this correct? What does this regular expression mean based on everything we know so far? A number followed by a number. Is it two digits though? What's the one element that's in num concatenated with num that is not in num? Zero, zero, right? So this is not the same thing as num because we have zero concatenated with zero and we already said we specifically don't want the number zero, zero in our numbers. Okay. So we've hit a point where we want to use an operator. We want to use the period character but not as a regular expression character. So just like when you're writing a string in C or Python or C, Java, whatever, and using double quotes, if you wanna actually include double quotes in that string, what do you do? A backslash, yeah, you need an escape character to tell Java or C that, hey, here I'm not talking about that special character, double quote, which stops the string, I'm talking about a literal double quote character. So we're gonna use the same thing here, we're gonna use the backslash, just say backslash period matches the period character. Concatenated with, so here we'd have concatenation on both sides, right? So does this get us what we want? Yeah, you can always put things in parentheses. You have unlimited parentheses, do what you want but you can't go like this, that's not a valid regular expression. Okay, so let's self-test, right? So we first came up with some cases, but what do we do with these cases? 
Are these, what types of test cases are these? Positive, what does that mean, positive, that they're super happy? No, these are all strings that we know should be in the language described by decimal, right? We actually didn't come up with any test cases that shouldn't be in decimal, right? But let's deal with that in a second, so let's walk through. Is 10.10 in num, or in decimal? So is it in num? No, is it in decimal? Yes, we have a number followed by a period followed by a number, great. What about this? What about this? No, why not? So what do we wanna do with our language? Do we wanna accept this or do we wanna change this? Let's change this. Always change the requirements, it's way easier than changing what you already did. So we'll say all decimals in our language have to start with a zero, right? They can't be any .5. We always want it to be a zero there. It's actually more clear that way, right? It's not clear whether you meant to type in 1.5 and forgot the one. You actually are explicitly telling us you want 0.5, yeah. Yes, yes, you can do that. Okay, what about this, 0.5? Yes, already in 3.4. This is in here? What about this? Why not? Is 0.5 a number? So then what are we gonna do? Do I just change it like that? If I change it like this, is that a decimal? Do I want it to be a decimal? No, by the same logic as requiring the leading, right? I should have them explicitly type in something here. Do I also want this? Ooh, that would be a good thing. Yeah, if you're doing a scientific programming language where the significance of digits matters, maybe you do want this. Well, let's say we don't. Because the computer only has 32, 64 bits, right? So these are exactly gonna be represented the same way, or possibly, I don't know if that depends. So how can we fix this? Epsilon? Exactly. Also, we don't just want p-digit star, right? Because we want to have leading zeros, right? Because what about, but we do want this, right? So we want a number or what? 
It's kind of easier if we say, okay, the number does a lot of what we want, right? We didn't say anything that we have to write the smallest possible regular expression, right? So number or what? We could say num. Or we could even say if we want to be a little more specific, we also may want zero or zero star. All right, think about it.
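To see where the first attempt at decimal falls short, here is a small Python sketch of num, then an escaped period, then num. It is not the final answer, which is left as something to think about. The names `NUM`, `DECIMAL_FIRST_TRY` and `is_decimal` are made up for the illustration, and num is written with Python character classes as shorthand.

```python
import re

NUM = r"([1-9][0-9]*|0)"

# The backslash makes the period a literal character instead of an operator.
DECIMAL_FIRST_TRY = NUM + r"\." + NUM


def is_decimal(s):
    return re.fullmatch(DECIMAL_FIRST_TRY, s) is not None


for case in ["10.10", "0.5", ".5", "0.05"]:
    print(case, is_decimal(case))

# Without the escape, Python's period matches any character, so this
# unescaped version accepts strings that are clearly not decimals.
print(re.fullmatch(NUM + "." + NUM, "1x5") is not None)
```

Here 10.10 and 0.5 are accepted and .5 is rejected, which agrees with the decision to require a leading zero. The string 0.05 is rejected too, because 05 is not a num, and that is the case the part after the period still has to handle.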