 I've given up on my fight with the projector, still not 100% working where I want it, but we will continue going, and everybody can see everything, right? So, step one, yes. All right, I do a couple of minutes of questions, any questions that I can answer? Yes? The homework will be released later today. What does it do? Next Friday, whatever's on the website. All homework submission will be through Blackboard, so there'll be that on the website. Also, at the link, there'll be very easy content. You'll be able to download it, upload homework. All right, let's keep going with regular expressions. Worried that I'm gonna press the wrong button, so let's safely do it like this. Awesome. Okay, so on Wednesday, we were talking about the syntax of regular expressions, right? You need to somehow define specifically the syntax of what is and is not a regular expression. So why are we talking about regular expressions in the first place? So we can do lexical analysis. That's the name of the subject. Anybody more specifically? What do we want to use regular expressions to do in the goal of lexical analysis? Yeah, in the back. Syntax, can you be a little more specific? Yeah. Specified tokens? Specified tokens, yeah. So we're using this in the, so yes, we're doing lexical analysis, and then we want to do syntax. We're gonna go talk about parsing or anything like that, but we need to check and define syntax. And in order to do that, we're defining what tokens are. And we're using regular expressions to define tokens. Awesome. So I'm gonna briefly go over this, just so we're all still on the same page. So regular expression, empty set, epsilon, A, where A is an element of some alphabet. R1 bar R2, where R1 and R2 are regular expressions, R1 dot R2, parentheses around a regular expression is also a regular expression, and R star is a regular expression. So, using these rules, could you generate a bunch of regular expressions for a given alphabet? Yes. Or a given alphabet. 
For a given alphabet? Say it's the set containing A and B: could you just generate a bunch of strings that are valid regular expressions by using these rules? Yeah. Yeah? All right. This is still a little review because I want to make sure, because now we're gonna get into the semantics of what regular expressions actually are. And so to get there, I want to first go through some examples. So we're gonna play a game called regular expression or not, but I'm not gonna bother writing it. Okay. So A star dot B or C. Is this a valid regular expression? Yes. Yeah. Where's that? Who said that? Yeah. What's the alphabet? So you guys want A, B and C, or just A and B? A, B and C, please, because we did A, B last time. A little variety is the spice of life. Okay. Cool. A star dot B or C. Is this a valid regular expression? Yes. But how do you know? How do you tell? Yeah. Check the rules. Right? And one way to think about it is, can we apply each of these rules to generate this string? Right? We just talked about this: everybody agreed that we could apply every one of these rules and we could generate all possible regular expressions based on all the combinations that we use these rules in. Right? So the question is, if you were to make the C an E, right, would that still be a valid regular expression? E or epsilon? Yeah, epsilon. Is this a valid regular expression? Yes. Is epsilon in the alphabet? No. Yes. Yes. Is it a regular expression at all? Yeah. Epsilon is a special character that's defined as part of our regular expressions. Right? Just like the bar and the dot character. It does not need to be in our alphabet to be considered a valid regular expression. But isn't it the string containing no... We're not worried about strings or anything yet. We're going to get to that today, specifically. Cool. So, right, we can apply these rules. So which of these rules, so let's say we're hypothetically choosing one of these rules first to apply.
Which one can we apply to get this regular expression here? So we start with number seven. Number seven? For A star. A star? Yeah. So wait, let's go down this path first. So, I'm going to first apply rule seven, right, which is going to give me, well, I have some R star, right? So then what's this R? A. A. And A is valid by rule three, right? So we'll have A star. Can I apply any other rules? I'm going to hit a dead end, right? Because I chose rule seven first, which means now it's going to be R star. So we say R is a regular expression, right? It could be any one of these rules, right? It could be epsilon, it could be R1 bar R2, but here we chose A. So this doesn't actually get us where I want to go. So let's say we choose rule four. We choose rule four first, which says R1 bar R2, right? In this case, what would be R2? C. C by rule one? Three. What about R1? That's A star dot B. So how do we get to A star dot B? That's five. Is A star dot B one of these rules? Yeah, so we want to apply rule five, which would give us what? We'll run out of R's, so I'm going to use different ones. R3 dot R4. And then what does this mean R3 must be? Seven. Seven, yeah, now we get to rule seven, perfect. So we get to here, this means it's some R5 star, and then we do this one more time, and R5 becomes A. And so what about R4? It's just B. Yeah, by rule three, that's just going to be B. So we're kind of looking at all the characters here. We have, well, we missed that guy a bit. It's a little lonely. That's our string, right? So we generated this string by applying all these rules, and we get this really cool looking, I know you're so impressed by my graphic and drawing abilities. You're like, why didn't you become an artist? So is that like a snake or something? Yeah, just all of a sudden. Okay, cool. So now we know how to evaluate whether a string is or is not in there, right?
Is it or is it not a regular expression? By applying these rules. So this is just a yes or no: we can look at a string and say, is this a valid regular expression given an alphabet? There'll be homework on this. And this is important because now we're going to talk about what regular expressions mean. So if we can't understand what they look like and what are valid regular expressions, how do we ever give meaning to any of these strings? Okay, so we're going to talk about regular expressions, and we're going to say that they define a language. What did we say a language was? A set of strings? A set of any strings? Yeah, so a set of strings that all belong, so L, thanks for that. Oh, that's awesome. Okay, cool. I'm excited about that. Right? So we have a sigma here. We have A, B, C, right? And what do we say sigma star is? Like what would be some strings that are in sigma star? Yeah. AB? AB? AC? ABC? And so on and so forth, right? So we're going to define a language L. We already said a language L is going to be a subset of sigma star. Right? So it's some subset. So what does that tell us about the strings in L? No, we do not say whether it's finite or infinite. Say it again. Syntax. It's not necessarily a particular grammar. What is it that's special about all these strings in here? Say it again. It defines. It defines a lot. It has meaning. So sigma star doesn't actually have any meaning, right? It just contains what? Every string that's made from what? Isn't it every word made from the alphabet? Exactly. Every possible string using these three characters, right? Every possible combination of these three characters, infinitely. So then what does it tell us about the strings in L? It's going to be defined by an expression, yeah. We need to show that first. Well, they have to have some sort of... So let's think about it this way. Is this string going to be an element of some language L? No. No. No, why? It's not an element of sigma.
Exactly. It's not sigma star, therefore it can't be in L. So one thing we can tell right away, every string in the language has to be drawn from our alphabet, right? It makes sense. Every word that we can possibly say in English, every English word comes from the alphabet of English characters, right? Cool. Okay. So now we're going to define the regular expression. So we're going to say let's say we have some regular expression R. We're going to define the language defined by R is going to be the set of all strings that match that regular expression. And so this is what we mean when we say, hey, this regular expression either... So when we say does this regular expression match a string? We're saying is that string in this set L, the language described by this regular expression? So we're going to look at how to actually do this, but this is kind of the intuition here. So you can think like sigma star is this whole crazy, random universe of possible strings, right? It's all possible combinations. And L is some subset that's been there. And so you're saying all those strings that are in that subset match this regular expression R. Everything that's outside does not match R. So it's an analogy like the A, like four A's in a row concatenated is in sigma star for the alphabet of the English language, but it is not part of the English language. Exactly. Yes. So we have... In case I didn't get it recorded. Yes. So like the... You can combine any word sequence of characters you can make in English, right? Like four A's. Actually, I don't know. Maybe it'd be really good to scrabble. Is that actual word? Ah. The sound you make when you stub your toe or something. H, H, H. Anyway, so yeah. So the question is, is that in the language? Is that in English? No, right? The whole combination of characters is not a word in English. And that's what we're going to precisely define here with regular expressions. So we're going to define, right? 
We want to define what the language of a regular expression is. So how are we going to do that? What do we have as definitions of regular expressions? The rules of what a regular expression looks like. How many... So it's probably a safe bet to say we'll have seven rules that define the languages described by regular expressions, right? So what I'm trying to get you to think about is this isn't like magic stuff, like I just stood up here and declared that this is how you define languages, right? It actually follows from our description of the syntax. We're going to define, using exactly those same syntax rules, what it means. So, okay, we're going to say that the language described by the empty set is the empty set. So rule one, done. Easy, right? So the language described by epsilon is the set containing the empty string. So what's the difference between these two sets? One has something in it and one does not. Yeah, one has something in it and one does not. So you think about the cardinality, right? The cardinality of the empty set is zero; there's nothing inside of it. The cardinality of this set is one. It has something inside of it. What is that something? The empty string. The empty string, yes, exactly. And this is why, when we went over the strings, right, it's so important that I try to make the difference between, this is a string, right? It's blue and it's italic. And this is a regular expression, right? So the language described by the regular expression epsilon is the set containing the empty string. So the empty set was just zero. Yeah, there's nothing in it. Yes, just two braces. Yeah, two braces right next to each other. So removing epsilon from here would be the empty set. But it's not the same thing. Just like the set containing the empty set is actually cardinality one. Yes. So it's kind of, you want to think of it like this way, you kind of can.
Because this is essentially kind of the same stuff. Could you write that? It's like, oh, it's just too... Yeah, yeah, it's definitely like, you could say empty set, this is just shorthand for this. Cool, anything else? So I have a question. Yes. The empty string, so the empty set is a subset of all sets. Is the empty string a subset of all strings? If you were to... Strings have order. I don't think there's a separation there. Oh, that's the problem though. Yeah. If you were to match that, do you specifically search for an empty string? Hold on to that, because we're going... Now I don't know. Wait until we see all the rules. We'll see if that falls out from these rules. Because we're using this differently than a programming language kind of does. We're precisely defining what we need here. I can give you intuitions about what all these things mean, but we're going to define them precisely so you know exactly what they mean. Cool. Okay. So, what's the next rule? So it's a character of the alphabet. So what would the language defined by A be, where A is an element of the alphabet? What would you expect that to be based on what we just saw? The set containing A. Yeah, the set containing A. And remember, right, this is a regular expression, I'm going to call it. And this is a string. The string containing the character A. So, if we have a regular expression of A, what string does it match? Well, let's do B. If we have a regular expression of B, what strings does it match? Just the string B; not any string containing B, it's exactly B. That's what we're defining here. Right? The language defined by the regular expression A is only the set containing A. Right? So think about, remember that huge set sigma star? We're talking about just one string in that infinite set. Okay. So the next one is the bar. I can't cheat. I can't look at what the next thing is. Okay, cool. So we defined R1 bar R2 as the syntax, right? So what is the language defined by R1 bar R2?
So what if I told you that the bar means or, as we think about it? Then what would you say this is equal to? The union. Yeah. Right? But the union of what? Is it going to be R1 union R2? Yeah. Think about the types. This actually helps me a lot whenever I think about mathematical, formal notation. So think about the types. Right? R1 and R2, what type are they? Regular expressions. Regular expressions. Right? Have we ever unioned a regular expression? Have we defined union across regular expressions? No. No. That means we can't do this. We can never do this. I think of L here kind of as a function that takes in what? A regular expression. A regular expression, and returns what? A set of what? Strings. Strings. Right? So we can't do this. We can't do R1 union R2, but what can we do? So it's the set of all strings in R1 and R2? Or R2. Something like that. Something like that. Yeah. So we have the language of R1, right? Which is going to give us what? The set of all strings that are in the language of R1, union with the set of all strings that are in the language of R2. Formal definition. Okay. So can we do an example where we have the language defined by A or B? Yes. Okay. So we have the regular expression A or B. And what is the language of A or B? Right? So the question is, do you just look at this and just do it because you're super smart? No. You want to break this down according to these rules, right? This is why we have these rules, so we don't have to be super smart. We can apply these rules, right? Just like a computer. We have an algorithm to follow. We can just do it, right? So by this rule, what does this mean that this is equal to? The language of A. The language of what? A. The language of A union with the language of B, which is equal to what? So now it's the language of A, so which rule does it match? This one, right? Yeah. So it's just the set containing what? A. What type is that inside there? The string A, exactly. So this is the set containing A union what?
The set containing B. The set containing B; sorry, my braces are really out of practice. The summer got me rusty. Which is what? A comma B. So what strings match this regular expression A or B? Does this string match? No, why not? Because it's not in this set. Is A in there? Yes. Yes, that matches. What about epsilon, an empty string? Something like this. Is epsilon in here? Yes. It is? No. How many elements are in here? Two. What are they? A and B. Is epsilon in there? No. No, so it doesn't match. Wait, did I go backwards? Yes. Three is not in there. Three is also not in there. It's also not in our alphabet, so cool. So this is why, doesn't matter how complicated it looks, you can just break it down by applying these rules. And at the end, because this is a recursive definition, at the end you're going to get to either the language of A, or the language described by the empty set, or the language described by epsilon. Do you want to see the actual rule? I'm trying to think what's best for you. Like written out on the thing. I just don't want to feed you the rules because nobody remembers that. I'd rather build you up to think about the intuition behind what these things mean so that you can derive what you need to. Okay. So now we have, this is the next syntax, R1 dot R2. So, also not AND. I know we were going in that direction. Actually, I don't know if there's an AND. Yeah, sorry, I got lost. Yeah, so it's concatenation, so this is what this means. So this is the set of all strings where R1 is followed by R2, right? Okay, so that's cheating. Okay. So this is going to be the language defined by R1 followed by, or concatenated with, the language defined by R2. So did we just write a recursive definition here because we used this dot twice? No, why not? What is the second dot applied to? Sets of strings. Yeah. It's not commutative though, is it? L of R1 dot L of R2 is not equal to L of R2 dot L of R1. Okay, but we need to define this first, but yes. Okay.
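The "just apply the rules like a computer" idea for the first four rules can be sketched as a recursive function that takes a regular expression and returns a set of strings. This is a minimal sketch in Python, assuming a made-up tuple encoding of regular expressions; the name `language` and the tags `"empty"`, `"epsilon"`, `"char"` and `"or"` are illustrative, not from the lecture.

```python
def language(regex):
    """Language of a regex built from rules 1-4 (hypothetical tuple encoding)."""
    kind = regex[0]
    if kind == "empty":      # rule 1: L(empty set) is the empty set
        return set()
    if kind == "epsilon":    # rule 2: L(epsilon) is the set containing ""
        return {""}
    if kind == "char":       # rule 3: L(a) is the set containing the string a
        return {regex[1]}
    if kind == "or":         # rule 4: L(R1 | R2) = L(R1) union L(R2)
        return language(regex[1]) | language(regex[2])
    raise ValueError("rule not covered in this sketch: " + kind)

a_or_b = ("or", ("char", "a"), ("char", "b"))
lang = language(a_or_b)
print(lang == {"a", "b"})   # True
print("a" in lang)          # True: the string a matches
print("" in lang)           # False: the empty string does not match
print("3" in lang)          # False: 3 is not even in the alphabet
print(len(language(("empty",))), len(language(("epsilon",))))  # 0 1
```

The last line is the cardinality point from earlier: the empty set has zero elements, while the set containing the empty string has one.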
So this means, let's build up some intuition first. So let's say the language defined by A dot B is equal to the language defined by A concatenated with the language defined by B. So the language defined by A is the set containing A, concatenated with the language defined by B, which is the set containing B. So then what's the result of this intuitively? The set containing AB. Right? So it's going to be all the strings in here followed by all the strings in here. Every possible combination, as we'll get to in a second. Right? So in A dot B, is the string AB in there? Yes. Is the string A in there? No. So you can think about this intuitively, when you're writing a regular expression, as all strings where B follows A, as this regular expression follows this regular expression. Right? So now let's do something slightly more complicated. What if I have the language defined by, I have to cheat a little bit because we haven't gotten to it yet, A or B, concatenated with C? The language defined by what? A bar B. The language defined by A bar B, dot the language defined by C. And what is this one? It's A, C? The language defined by A, then what? Union the language defined by B. Perfect. Concatenated with, what's the language of C? The set containing C. We do this one more time in here. We get the set containing A union the set containing B, all that concatenated with the set containing C. No? I think so. And then we have what in here? The set containing A and B, concatenated with the set containing C. So then what's the result here? AB, AC? What's wrong? Why AB? Yeah, you're not concatenating A with B. Exactly. It's AC and BC. Should we use D instead of B? Can you have A by itself and B by itself doing that stuff? Can you? No. Let's go back to the meaning here, right? What does this mean? C has to follow A or B. A or B, exactly. So you can't have A or B on its own. Now, if I extended this with the language defined by A or B concatenated with C or epsilon, what would this mean? A or B concatenated with C or nothing.
And so what's A concatenated with the empty string? A, right? We went over that on Monday? No, we actually never... We were talking about concatenation with epsilon. It's on a slide, I promise. That I do remember. So if you concatenate with epsilon, does epsilon exist in the set that you end up with? No, you get rid of it. Because we have the rule that A concatenated with epsilon is the same thing as A, which is the same thing as epsilon concatenated with A. Yeah, we did do that last time. Cool. Let's practice if you want to. This would be what? What are the strings in here? AC. BC. A. B. So sharp. This is a set, so order doesn't matter, correct? Correct. It's a set. Order does not matter. Order matters inside the strings. Correct. Great. Okay, where do we go? Should have used D, by the way. I think I shouldn't have that. Okay, so we've actually gone over this. So I'm going to briefly go through this. You can have as many ors as you want. So is this a valid regular expression? A or B or C? Yeah. Cool. A or epsilon? Is this going to be just A? What? It's going to be epsilon and A. We can break it down. We're going to have to think about it. This is the beauty. The language defined by A, union with what? The language defined by epsilon. The language of A is the set containing A, and the language of epsilon is the set containing epsilon by our rule. And what's this? A. A, epsilon. So why didn't the epsilon disappear like in the last case? Concatenation. Concatenation, exactly. That's the difference. So as long as it's epsilon or epsilon, it's just epsilon? Sure. Yeah. I think the short answer is it never comes up. But you would treat it just like an empty set. So say for some reason we're talking about the language defined by A or the empty set. So it would be just A then, right? Yeah. So you have the language defined by A union with the language defined by the empty set.
And we know that this is the set containing A, and we know that this is the empty set. So the union is just going to be the set containing A, right? The empty set doesn't add anything. What about concatenation? I think, yes, it will cancel out. It's interesting. I think you're right. Because you can't have nothing follow you. Is that even defined? Do we have to have a special rule for that? Because... So we can look at it like this. I think it'll follow from our rules. So we have A, dot, the empty set, right? And so this... This is... I think it's the empty set. Because it's every string in here concatenated with every string in here. There are no strings in here. Therefore there would be no strings in the result. That actually does make sense. Questions? Yeah, it doesn't come up because you've just created a trivial language. Yeah, but with epsilon, you never want to match with epsilon. Sorry, you want to match with epsilon frequently, but you don't ever really need to match with the empty set. So going forward it doesn't ever come up. That's why I don't think I've ever had that question. I don't know. It won't be a test question, since I haven't gone over it until right now. Or maybe it will. So we talked about this. The set containing epsilon is not the same thing as the empty set. Right? Wait, does that question mark make it look to people like I'm asking the question? Is it? Okay. Okay, so this is the formal definition of... So we talked about... the definition of what the dot operator means in terms of sets. I'm going to give it here, but it's very similar to the intuition we just built up, right? Every string in the language defined by R1 is concatenated with every string in the language defined by R2. So hopefully that'll be exactly what's here. A dot B is the set of all x concatenated with y such that x is in A and y is in B. We're creating a set. So examples. We're going to have this. So this is going to be...
If it's A dot B, where the set A contains the strings aa and b, and the set B contains the strings a and b, what are all the strings that are going to be in here? So aaa, three a's; aab; ba; bb. Perfect. Is ab in A dot B? No, right? There's no such string in here. Okay. So now, with epsilon added to A, what are all the strings that are going to be in here? Why? Yeah, we have more, right? We have more strings at the beginning to concatenate with. So all those strings have to be in there, plus epsilon concatenated with a, which is a, and epsilon concatenated with b, which is b. Perfect. So it's going to be aaa, aab, ba, bb, a, b. Cool. Questions on this? So four A's is an acronym. Oh, it is. Does it count in Scrabble? I don't think acronyms count in Scrabble. I don't think so either. Okay, so let's go back to this example. Why did I draw these parentheses? What does that mean? I'm sorry, I just hear there's a sound over here. I just want to see who's talking. Just like the way you talk. Like in what? What do we use, what's our classic operator precedence? Please excuse my hearing. So what does this mean? What does that mean? Like what is it, what is that? Seven. Seven? Why? How come it's not six? Well, the multiplication, what? Yeah, it comes first, right? It has higher precedence. It would be nine. Why would it be six or nine? Good. It's testing it. Cool. So the question is, do we do two times three first and then add it to one? Or do we do one plus two, times three, right? It may have been a long time since you ever thought about this, because you've been using math so long and these precedence rules are ingrained in your brain. In fact, well, with numbers we usually write a multiplication symbol, but if I was writing this with A, B, and C I'd probably write it something like this, right? Because by convention I've gotten rid of that symbol and made things a little bit easier. So similarly, if we have A or B concatenated with C, what does this mean? What are the possible combinations?
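The set definition of the dot, every string of the first set followed by every string of the second, is short enough to try directly. The sketch below is in Python and assumes the board examples were A = {aa, b} and B = {a, b}, then A with epsilon added; the transcript is garbled at that point, so treat those sets as a reconstruction. The name `concat` is made up for illustration.

```python
def concat(A, B):
    """A . B: every string x in A followed by every string y in B."""
    return {x + y for x in A for y in B}

# Assumed board example: A = {aa, b}, B = {a, b}
print(concat({"aa", "b"}, {"a", "b"}) == {"aaa", "aab", "ba", "bb"})  # True
print("ab" in concat({"aa", "b"}, {"a", "b"}))                        # False

# Adding the empty string to A adds epsilon.a = a and epsilon.b = b.
with_eps = concat({"aa", "b", ""}, {"a", "b"})
print(with_eps == {"aaa", "aab", "ba", "bb", "a", "b"})               # True

# Concatenating with the empty set leaves no strings at all.
print(concat({"a"}, set()))                                           # set()

# Order matters: the dot on sets is not commutative.
print(concat({"a"}, {"b"}) == concat({"b"}, {"a"}))                   # False
```

The empty-set case falls out of the definition with no special rule: there is no y to pair with, so the result has no strings.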
We can have the parentheses here, around A or B, or we can have A or, in parentheses, B dot C. So are these actually different? Does this matter? We should do just a simple check. Well, that one is A and BC. Yeah. Everybody agree? And this one is what? AC and BC. Yeah. So it makes a difference, right? And semantically it makes a difference. What are we talking about? When we see this, we see A or B dot C, right? Are we talking about, hey, I want to define a language where every string is either an A or a B followed by a C, which would be this one, or is it, I want all strings that are either A or B dot C, right? So we're actually going to see a little bit later how this shakes out, how we can actually enforce operator precedence when we get to parsing, well, later on in syntax analysis, and that will be really cool. We can't really do that now, so we just have to by fiat say this is how we're doing it. So we're going to say that the dot operator is very similar to multiplication, so it's going to have the same precedence there. So all the dot operators will occur first, let's see. Yeah, so we did this, did this, right? So the dot has higher precedence than the bar. Okay, so everybody cool with that? So going back, we actually used these, but we never defined the parentheses, I guess. On, like, a homework or some other problem, will we be given a situation where we have to form a regular expression just from, like, a word problem? Yup. Yeah, to me that shows that you know regular expressions. So there's deciding what is a regular expression, and deciding what it means. So if I give you a regular expression and I give you a string, and I say is this string in the language defined by this regular expression? You should be able to answer yes or no. And you should be able to create a regular expression off of me saying I want a regular expression that captures, I don't know, all octal numbers. Octal, base 8. Okay, sorry.
Back to the precedence of the operators in regular expressions. We used parentheses, but we never defined what they actually mean. Right, so what's the language defined by parentheses R? It's R, it's the language defined by R. Exactly, it's the language defined by R. So we're really using parentheses just like in math, in terms of grouping things, so that's exactly how you can think about it. And that's why we can write this, we can write the language defined by parentheses A or B, parentheses, dot C. So we know that this is a regular expression, and we know that the dot is a concatenation. So this is the language defined by A or B concatenated with the language defined by C, so we have the set containing A and B concatenated with the set containing C, which is AC and BC, which I think we've done a lot. Okay, now we get to the last rule, 7, right? We actually just covered all 6 other rules. Pretty good progress for only 45 minutes, right? Okay, so I'm going to give you the intuition of what the star is first. So everything that we've done so far, without considering the star, only matches a finite number of strings, right? Well, rules 1, 2 and 3, well, except for 1, 2 and 3 each match a set containing one string, right, either epsilon or the character A, and with concatenation we can build up longer strings, right? And with or we can union two sets. But using those, can we ever describe an infinite number of strings? Can we match all strings that begin with A? Because our regular expression must be finite, right? You can't have an infinitely long regular expression, right?
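To see why the placement of the parentheses matters, the first six rules can be evaluated mechanically and the two readings of A or B dot C compared. This is a sketch in Python under an assumed tuple encoding of regular expressions; the function name `language` and the tags are hypothetical, and the star rule is deliberately left out.

```python
def language(regex):
    """Language of a regex built from rules 1-6 (hypothetical tuple encoding)."""
    kind = regex[0]
    if kind == "empty":      # rule 1
        return set()
    if kind == "epsilon":    # rule 2
        return {""}
    if kind == "char":       # rule 3
        return {regex[1]}
    if kind == "or":         # rule 4: union of the two languages
        return language(regex[1]) | language(regex[2])
    if kind == "dot":        # rule 5: every x from the left followed by every y from the right
        return {x + y for x in language(regex[1]) for y in language(regex[2])}
    if kind == "group":      # rule 6: L((R)) = L(R)
        return language(regex[1])
    raise ValueError("rule not covered in this sketch: " + kind)

a, b, c = ("char", "a"), ("char", "b"), ("char", "c")

grouped = ("dot", ("group", ("or", a, b)), c)   # (a|b).c
default = ("or", a, ("dot", b, c))              # a|b.c with the dot binding tighter

print(language(grouped) == {"ac", "bc"})   # True
print(language(default) == {"a", "bc"})    # True
```

The tree for `default` is what the "dot has higher precedence than the bar" convention picks when no parentheses are written.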
So with what we've seen so far, we can add a bunch of dots, we can say A dot B or A dot C or A dot whatever, or A dot B dot C, but we can't define all strings that start with an A. In one sense, I don't know is probably the truthful answer. I'm just kind of trying to get us to think about the limitations of what we currently have, and then to think about what this star gives us in terms of the languages that we can express. So we're going to use the star operator, and we're going to say that this means zero or more repetitions. So for instance, the language described by A star is going to be the set containing, what would zero repetitions be? Epsilon, right? And what else? What would be one repetition of A? A. And two repetitions? AA. And three? AAA. Am I ever going to finish? No. So is this a finite set? No, it's infinite though, right? So how would you describe this set? Tell me in English what this set is. No, English English, not math. It is an infinite set, but what is it about the elements in here? How would you describe this set? It's A's repeated forever. All strings that have A's in them, only A's in them. And what else? And nothing, an empty string. Exactly. What if I wanted to say all strings that have A's in them and not the empty string? So then you can do set minus. Can't do minus. We don't have that operator. A dot A star? Actually, a couple things. A dot A star would be what? This would be the language defined by A concatenated with the language defined by A star, which is the set containing A concatenated with, we'll just say, this guy, which is concatenating A with every string in here. So concatenating A with every string in here, we have what? A, AA, AAA, and so on. Right? So now is this every string that's only A's and isn't empty? Is there another way to write this? Ooh, A star dot A. A star dot A? Is that the same thing? Yes. Yeah. Right? The string A is going to be in here. When this is epsilon, we concatenate that.
It's going to be A. What else? Is there another way? We want this set. What are the other ways of writing this set? So this is going to be what? The language defined by A union with the language defined by A star, which is the set containing A... Oh, no, this doesn't work. Yeah. This is the very first line about it, so it's good that you guys went first. Bar epsilon? Yeah. No, because this will mean that it... That's just A star. I think you need to... So these are the only operators we have, right? But, as we actually just saw, there are other operators in modern regular expression languages. There's a plus operator, which operates like the star but means one or more. And it's exactly the same thing as, well, I wouldn't do this one, I wouldn't do this one. It's exactly the same thing as this, A dot A star. So it's actually just shorthand. So every regular expression you can write, not using backreferences or anything like that, none of the extended regular expressions, but any regular expression you can write in any programming language can be expressed using these operators that we've defined here. And so I'll leave this; we can formally define the star operator here using sets, but I believe you actually have most of the things you need to do the homework right now. If not, we'll cover everything else in a bit.
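Since the language of A star is infinite, code can only list it up to some bound, but that is enough to check the claims about A dot A star, A star dot A, and the plus shorthand. The sketch below is in Python; `concat` and `star_up_to` are made-up helper names, and the bound of three repetitions is an arbitrary choice for illustration. The last loop uses `re.fullmatch` from Python's standard `re` module, whose pattern syntax writes concatenation without a dot.

```python
import re

def concat(A, B):
    """A . B: every string x in A followed by every string y in B."""
    return {x + y for x in A for y in B}

def star_up_to(A, n):
    """Strings formed by at most n repetitions of strings from A.
    The real A* is this set with no bound on n."""
    result = {""}          # zero repetitions gives the empty string
    power = {""}
    for _ in range(n):
        power = concat(power, A)   # one more repetition
        result |= power
    return result

a_star = star_up_to({"a"}, 3)
print(sorted(a_star, key=len))        # ['', 'a', 'aa', 'aaa']

a_then_star = concat({"a"}, a_star)   # a.a*, up to the bound
star_then_a = concat(a_star, {"a"})   # a*.a, up to the bound
print(a_then_star == star_then_a)     # True
print("" in a_then_star)              # False: the empty string is gone

# a+ in Python's re syntax accepts the same strings as aa*
for s in ["", "a", "aaa", "ab"]:
    plus = re.fullmatch("a+", s) is not None
    expanded = re.fullmatch("aa*", s) is not None
    print(repr(s), plus, expanded)
```

By contrast, the union of the set containing a with `a_star` still contains the empty string, which is why the A or A star attempt does not give the set we wanted.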