Good morning everyone. Friday, first week of class, you all survived, so that's good, right? I think half of us did. Well, "survived" is a strong word. There are a lot of people in seats, so I'm actually very impressed. Good job, pats on the back to everyone in this room right now.
So if you didn't notice, I updated the syllabus with the dates. The dates of the midterm exams are fixed. They're not going to change. The dates of the homework and the projects are probably not going to change either. I don't like to extend project deadlines, so that's probably not going to happen.
So project one is up. If you go to the top here, there's Projects, and Projects takes you here, to project one. It's due next Friday, before midnight. This project isn't supposed to be a very complicated programming project. Those are coming later. This project is to get you set up so that your whole environment is ready to go for the actual, really difficult programming projects. Because if you're doing these difficult assignments and it turns out that you're not using the version of the operating system that we're using, or you're using a different version of GCC with different compiler options, and now your code doesn't compile on our submission system, that's on you, not on us. So this assignment is all about you getting your environment set up.
Except for the first part. The first part is really easy, the easiest five points you'll probably earn in this class: just sign up for the mailing list. Those of you that have done it have already been marked, so don't worry about it. So yeah, sign up for the mailing list. Easy five points. I misspelled. Oh, is that where we are? Yeah, you're right. So your ID, whatever. If you sign up with your ASU email, it's easy to, yeah. If we signed up but didn't include our ASU ID, are we still marked?
If we can tell from your name or your email, if it's your ASU email, that's totally fine. It's not a big deal. We can always go back and look at when you joined, right? So if that ever becomes an issue, you can say, yeah, I did, and that's totally fine. Okay. So I also misspelled "environment."
Okay, so part two. The main part of this project is to get your environment configured, and this is really for your benefit. Actually, it's for your benefit doubly: you get some easy points here and you get your environment all set up. So if we look at this fancy document that the TA from the other section, Mohsen, created, he's got an amazing document on how to install and configure your environment. You have to use this exact version of CentOS. This is the same version that's running on general, so you can use general, and you can use the lab machines too. It's the same version that's running on the lab machines, and the same version that's running on our server. So by doing this assignment and following these instructions, you'll have the exact same environment, and we'll have no problems going forward.
CentOS is based on Red Hat, for those that don't know. It's the open source version of, well, no, Red Hat is open source, but it's the community-supported version of Red Hat. So yeah, it's fine. I don't know, I use Ubuntu. It doesn't really matter. They're all very similar. But yeah, all these instructions are very good. If you have any problems or see any issues, email me. You can also email Mohsen directly, his email address is here, and he'll update the instructions and we'll send them out and let everybody know. So this isn't meant to be difficult. It's just meant to get you set up. Everybody's got to do these steps, so we might as well give you some credit for it and make sure that you actually do it. So this first section is all about making sure that your VM is set up.
I don't care what you use. I think he has the instructions in here for VirtualBox, because I know that's free on all the various operating systems. You can use VMware, right? But these instructions won't help you as much. So whatever. Use that. Use your Google powers. I have the utmost faith in you guys. Then the second part here describes different development tools, the types of things that you'll need to do, how to compile programs, all that kind of stuff. He has really good pointers in here. Every assignment that we do in this class is going to run on this operating system, so it's not wasted time. Get it all set up and you'll be ready to go. Questions on that part?
So GCC is GCC. I already have an Arch box set up at home, so I shouldn't have any problem if I downgrade GCC?
That is tricky. Maybe. I don't know. That's one of those things. I can say yes, I can say no. The truth is you could have problems, right? I don't know how well your system's going to downgrade GCC to the exact version that's used here.
Because in 310 last semester, we had to run everything on general, but we'd run it there and be fine.
Here, more difficult problems can occur. The exact version of GCC we're using is, like, a pre-C++11 version. So you have to really be careful if you're trying to use some advanced, cutting-edge features that aren't available on this operating system. So this is the fair warning: this is the operating system, this is the version of GCC. That's it. And you can install 32-bit or 64-bit. We have instructions for both.
Okay. So after you're done with that, you've set up your environment. Then you're going to go register for the course submission website. There's a link here. I've sent Mohsen all of your ASU IDs, so he's going to send you a verification code. It will happen later this afternoon. When you get the verification code, you'll be able to sign in.
And this is the website where you'll do all of your project submissions. It's got automated grading, so right after you submit you'll find out whether your code compiled, how many test cases you passed, all that nice stuff. If you have any problems, email Mohsen. He's really nice. He was my TA last semester, so I know him. He's pretty awesome. If you have problems, just let me or him know. It's probably good to tell him that you're in my section, too, so he's aware.
Okay. So, pretty easy. Once you sign in, there will be files for you to download: a secret.h and a secret .o, an object file, for 32-bit or 64-bit depending on what you installed. And then you write this little secret function with your ASU ID. This says ASU. That's good. With your ASU ID, you follow these instructions. You save it to a C file. You compile it. It's going to give you some output, and then you're going to upload that output. If you do it right, when you run it, it'll look something like this. So you'll run it, you'll redirect the output to a file, and then you'll submit that file to us. And then we'll automatically verify that everything's all good. This way we can verify that you're actually running the operating system that you should be running. Questions?
So the goal is not to be, I don't know, mean about it. But we are very strict about the operating system, because it makes sense that we all have to use the same thing, right? And I don't know, we could argue about Ubuntu or Arch or whatever, Debian, yeah, anything, right? FreeBSD. The point is that when you go work at a company, you're going to be in somebody else's environment that you probably didn't create. So they say, hey, we use whatever Ubuntu LTS version for all of our things, 14.04, that's what our whole entire infrastructure is.
So you have to make sure you know what that means, and that you can set up an environment that matches theirs, and all that kind of stuff. Because oftentimes you don't really have total control over where your code is run, and it's important for you to be able to set up your environment so it's the same. So yeah, I've got a VM here. This is my CentOS VM that I use for testing out stuff in this class.
All right, so now I wanted to demo a little bit. I didn't get time to on Wednesday, but I wanted to show a little bit of the editor stuff you can do with Emacs. So this is a demo that I actually, let's see, what's my name? 16. All right, so this is a real-life example. When I was at Santa Barbara, we had a system where people could automatically create a hacking competition, and it would generate virtual machines for them, and then they'd have to download them. Anyways, I needed data about how many people actually used this thing, and of course nobody there actually knew where that data was located. So I had to poke around the server, and I finally found a SQL database, where I dumped all of the people who had the status of ready. I just wanted to count how many teams there were, all that kind of stuff. But this happens a lot, where you have some data in some weird, random format and you want to transform it in some way.
I'll use this scratch buffer. So this is what's kind of cool about Emacs: I can quickly, easily write a macro. My goal is to get all of these email addresses and put them just one on a line, so I just want to extract all these email addresses. So let's see. So then I just wrote a super quick, off-the-cuff macro.
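For comparison, the same extraction can be scripted outside the editor. This is only a minimal Python sketch, not what was shown in the demo. It assumes a pipe-delimited dump in which the email address is the first word after the third pipe on each row; the function name `extract_emails` and the sample rows are made up for illustration, since the real dump is not shown.

```python
# Hypothetical sample of a pipe-delimited SQL dump. The real column layout is
# not given in the lecture, so these rows are invented for illustration.
SAMPLE_DUMP = """\
| 1 | ready | alice@example.com | Team Alpha |
| 2 | ready | bob@example.org | Team Beta |
"""


def extract_emails(dump):
    """Return the first word after the third '|' on every row that has one."""
    emails = []
    for line in dump.splitlines():
        fields = line.split("|")
        if len(fields) < 4:
            continue  # fewer than three pipes on this row: nothing to take
        words = fields[3].split()  # text between the third and fourth pipe
        if words:
            emails.append(words[0])
    return emails


emails = extract_emails(SAMPLE_DUMP)
print("\n".join(emails))  # one address per line
print(emails)             # or as a list literal to paste into another program
```

The editor macro has the advantage that nothing needs to be written to a file first; a script like this is easier to rerun when the dump changes.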
I would say I messed up on that one, but I just wrote a really simple, quick macro to basically go to, I think it's the third pipe symbol, and then take the word from there up to the first space, copy that and paste it into the other buffer, then go back and go to the next line. And then I can just run it over and over and over, and then I have pretty much all the email addresses here that I wanted. And you can do all kinds of other manipulation, like let's say I wanted to put this in a list, I could also use a macro for that. You could put it in an array, like, I want to put this in a Python program, so let's see if I can do something like, live demos are always awesome. So like this. So now I have an array I can just drop into a Python program to iterate over. I could even go back and, for each of these email addresses, grab the name and make a dictionary key-value pair, but whatever, I think you get the point. The point is that I'm pretty productive, pretty fast, when I'm in this. So yeah, I don't let the editor get in my way. So that was the little demo.
Now let's go to the slides. Any questions on what we've covered so far? Now we're going to get into material. Yeah, reading, that's good. So at the end of last class, we talked about our goal here: we want to understand programs, but we know that programs are really just sequences of bytes, and that somehow the compiler or the interpreter has to extract some meaningful information from those bytes. Lexical analysis is the first step. It basically takes those raw sequences of bytes and abstracts them. So it's essentially the first part of handling the syntax. So why does a language need a syntax? Because it needs to have structure, it needs to, you know, be readable and make sense.
Do you remember studying the syntax of, what's the first language you probably learned, Java? Right, they tell you, okay, you have to have the class, and then you have to have the brace, and then you have to have the other brace, and you always have to have a function, and at the end of every statement you have to have a semicolon, right? Those are all syntactical things that you have to remember. At this stage we're not talking about what they mean, we're just talking about the structure. How does this actually look?
And so we really need a syntax for basically two purposes. The compiler needs to be able to take arbitrary bytes and make sure that they're correct in some sense, that they're syntactically correct in the language. But it's also very important for the programmer, and "clearly specified" here is the key, because how are you going to write a program in a language that doesn't have a clearly defined syntax? You never would, right? What are you going to do, guess and check, or take an example and slightly change it? That would be kind of crazy. So it really serves a dual purpose: for you, the programmer, and for you, the compiler writer, so that you can actually understand the syntax and enforce it.
So the input is a series of bytes, and we're trying to get from these strings of characters, these strings of bytes, to some kind of program execution. We're trying to take this string of characters and give it some more meaning. And what we're going to output here, specifically, is a series of tokens. We'll look at examples of tokens in a second, but our goal is to separate that series of bytes.
Maybe, for an incredibly simple case, we're just separating on spaces, right? Or maybe we want to know if something is an ID, or if it's a function call, or a class name, or a field name. All these types of things we want to know from these tokens. So we'll see that in a second.
So here, when we talk about language, I hope it's pretty clear: I'm not a linguist. Are any of you linguists in here? That would be kind of cool. Nobody? Okay, so in English, what's the syntax of English? How do we make sense of these random gibberish lines on the paper that you're writing down? I guess I should say in my notebook; I'll talk only about my gibberish. Yeah.
It's been a couple of years since high school English, but I remember every sentence has a verb, a noun, and a predicate or something, and it ends with a period.
Correct, okay. Or some punctuation. Nouns, predicates, right? Those are all meaning things, in some sense. I wouldn't say those are necessarily syntactical things.
Punctuation.
Punctuation, yeah. So every sentence has to end with a period.
Grammar in general, the order of words, the way they fit together. You can't just have random words in a sentence, they have to have meaning, just like in a programming language you can't just have random characters in any order.
Correct, so there definitely is some order. I think the other stuff is a little bit more on the semantic side, like what the words are.
Do you think it's like capital letters and punctuation?
Yeah, exactly, exactly. So if I were to just randomly type, how would you be able to tell whether it's English or not?
Spaces between words.
Yeah, spaces between words. Does anyone here speak German?
I took it in high school.
I don't speak German, but I know a lot of German speakers and they say it's crazy, because you can just keep making up words: you can make a word longer and longer and longer by just adding, is it adjectives or whatever? It's random, really. So yeah, a different language has different rules.
In English, our base unit is pretty much the word, if you think about it, right? We have words, and the words are separated by spaces. We have a paragraph, where each sentence ends with a period, and each sentence is composed of words, where those words use the English alphabet: lowercase a through z, uppercase A through Z, and probably the numerals zero through nine as well. So at the bottom we have an alphabet, right? We have to have some kind of symbols. Do these symbols mean anything?
Yeah, tons of things.
Tons of things? Why?
Some of them do by themselves. A through Z don't necessarily mean anything by themselves, but the comma, period, exclamation point, question mark do.
Why those?
Because they look cool.
Because we chose them. Somebody chose them, right? I didn't choose them. Did any of you? Yeah, they're just symbols, right? We can call them characters. You can think of them as bytes in a programming language. We say A through Z are letters and they mean something. The comma means something, the period means something, the exclamation point means something; these are all special in some sense. Now suppose I were to just draw some random thing, or take, I don't know, some letter that's not in our language but is in other languages. So in Spanish, there's the N with the tilde over it, right? The eñe. That would not be in our alphabet, right? We see that character and we don't even know how to input it; it's not even in our alphabet, right?
That would be like if you were to randomly paste Unicode characters and emojis into your program, and the compiler's like, I don't know what this is, these are just bytes, I've never seen these bytes, these aren't in my alphabet, this isn't even a valid program.
So above the letters in our alphabet, we put letters together to make words. Now here's where we have to get a little bit wishy-washy, because we're talking about a language that's spoken and written and constantly evolving and all that stuff, right? It's not meant to be parsed and interpreted by a computer; if it were, natural language processing would be a lot easier. So the words are defined somewhere, right? They're defined in a dictionary, and this is what says which words are valid, and we can categorize them into nouns, verbs, as we said, articles. My English class was also a long time ago. We then compose the words into sentences, and we form sentences into paragraphs. So in a programming language, do we have analogous concepts?
There's a beginning and an end.
A beginning and an end, yeah. Do we have the equivalent of an alphabet?
Yeah. I was going to say you have lines.
Lines? Yeah, we have lines. Maybe, it depends on the language, right? I mean, in fact, huh? I know, but do you want to answer anyways?
What was the question?
What analogs from English do we have in programming languages? What are the things that we have in a programming language?
We have keywords, to help define variables, right?
Yeah, so we have special words. Do we have an alphabet? Yeah, right? There are characters on your keyboard that you can't just put into your program anywhere you want, and you may not be able to put them in at all. And we have something loosely similar to words, right? They may be separated by white space, they may actually not be.
There are some programming languages where white space doesn't matter in certain places. So in a programming language, this is all about just which symbols are important to the language. And remember, the alphabet is completely arbitrary, right? It's just defined by the language designer, who says, okay, these are the symbols that are important in my programming language. So let's take C. What are some symbols that are important in C? What was that?
Semicolon.
Semicolon, definitely.
Slash, brackets. Asterisk. Parentheses.
Parentheses, yeah.
Ampersand. Arrow.
Oh, is that one character?
Two.
Two characters, yeah. So now we have two characters that mean one thing, semantically, right? Actually, it's kind of funny, I think it's in Scala, you can actually use a Unicode arrow instead of typing the dash and the right angle bracket. It's kind of weird, but it actually looks really pretty. It's a pain to actually type, though. What else? Is that it?
Equal sign.
Equal sign?
Plus, minus.
Plus, minus. What was that?
Quotations.
Quotations, what kind?
Single and double.
Single and double. Both. Is that it? Is that all you guys use to program in C? What about letters, you know, the ABCs? Nobody said that, right? Hey, they don't exist until we say something, right? So yeah, ASCII, so the letters: lowercase a through z, uppercase A through Z, zero through nine. Let's see what else I have here. Exclamation point, question mark.
Numbers.
Numbers, oh yeah. Numbers are important. What is the question mark used for? What's that called?
Conditional operator.
What was that? Yeah, the ternary operator, right? So it's basically like an implicit if-this-then-this operator. Yeah, I think these are most of them. There are probably more than what you guys mentioned. Dereference, ampersand.
So just like in English, we're going to want to create some abstractions over these low-level alphabet symbols, right? We don't really want to operate on left bracket, 1, 0, 1, 0, right bracket. These are just characters in the alphabet, symbols in the alphabet. We want to operate on something a little bit higher level, like tokens. For instance, we want to know that two equal signs together mean, syntactically, a different thing than one equal sign, a space, and one equal sign. The equal sign is the assignment operator, right? What's the double equal sign?
Comparison.
The comparison or equality operator, yeah, exactly. Also less than or equal to, right? It's not a less-than symbol and then an assignment symbol. We actually want to think abstractly about these two symbols as one token, and that token is less than or equal to.
Also, certain combinations of symbols can be important tokens in our language. Usually keywords are done this way. For instance, the keyword while is actually a token called while in the language, and not the name of a function called while or the name of a variable called while. Have you ever tried to write a variable in C or Java that's called class or while or if? Yeah, you can't do it. It says it's reserved, right? And this is why. It's reserved because it's a special token in the syntax of the language, which says anytime I see these five characters, that's the special token while. Same for if.
So what we need is a way to precisely define what are tokens and what are not tokens, right? Saying it is pretty easy: we're going to say these things are tokens. But how do we define that? What are the rules in C for variable names? Can you just type anything as a variable name?
They can't start with numbers.
Can you have a question mark in your variable name?
No.
Would you want to?
No. Maybe.
Actually, some languages allow that, and it's really nice, because for Boolean values you can write something like is-true-question-mark, so your if statement reads like "if user validated?" and it's clear that it's a Boolean value. Some languages do that. And yeah, names can't start with numbers, right? Which could be very restricting. Sometimes you want to name something two-something, or, I don't know, there are various cases. But these are restrictions that are there because the programming language has to have a way to understand, when it sees something, is it an identifier, which would be a variable name or a function name, or is it something else, or is it a constant?
So what about when you specify constant integers, right? How do you do that? Not constant, sorry, I should say literal.
What do you mean?
I want to set x equal to 10. How do I do that? How do I write 10?
One zero.
One zero. The symbols one, zero. What if I want to use hex?
0x.
0xA? Yeah, exactly. But that's because the syntax of the language says you can write an integer literal as just digits, well, not exactly: you can write it as digits not starting with zero, and it will be a base 10 number. If you have a 0x followed by any number of hex digits, zero through nine and A through F, then that will be in hex. What about if you start with a zero?
Octal, so base eight.
If you start with a zero, then anything after it is base eight.
Really?
Yes. I think there is a way to do binary. I don't know, I think you prefix it with a b, lowercase b or something, or maybe that's just Python, I don't remember. I think people use hex a lot of the time where they would otherwise use binary. That's my feeling. Does that make sense? Because who wants to deal with all the ones and zeros, and you have to make sure they all line up properly. So this is pretty much the heart of lexical analysis, trying to answer this question.
How can we precisely define these tokens and say, hey, from this alphabet, when you see a sequence of characters like this, it means this token? So when you see a sequence of characters that starts with one through nine and is followed by any number of zero through nine, followed by a space, then that thing is a number. It's a base 10 number. So that's a token that you want, right? This is what we're going to dive into. It's actually incredibly relevant to almost any programming language that you use, and not just from a compiler standpoint. We're going to see how we specify tokens, and the tool and the language that we use to describe them is something that comes up all the time.
Okay, so we're going to define a string. Here we're not talking about a string literal in your programming language, delimited by double quotes. Here we're specifically saying a string is just a sequence of alphabet symbols. So the input to our lexical analysis is going to be some string, and our goal is to output tokens.
Now we're going to slowly introduce a few math symbols, just so we can talk about things. We're going to use the capital sigma for the alphabet, so that's the set of symbols in our alphabet. Right, so it's just a set. Is sigma an infinite set or a finite set?
Finite.
Why?
There is a set number of symbols that you can use.
Yeah, it kind of doesn't make sense to talk about a language with an infinite number of symbols that you could possibly use to write it. So it only really makes sense if, just like English, there's a fixed number of letters. I mean, I guess technically maybe you could consider emojis as adding to these symbols, or something weird like that.
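The token ideas discussed so far (two symbols such as == or <= forming one token, reserved words such as while, and decimal, hex, and octal integer literals) can be sketched as a tiny tokenizer. This is only an illustration under stated assumptions, not the course's lexer: the token names, the `TOKEN_SPEC` table, and the `tokenize` function are made up here, the alphabet is a small fragment of C, and the patterns are written as Python regular expressions.

```python
import re

# Reserved words: they look like identifiers but are their own tokens.
KEYWORDS = {"while", "if"}

# Illustrative token definitions for a tiny C-like fragment (names are made up).
# Order matters: Python tries alternatives left to right, so "0x1F" must be
# tried as HEX before OCTAL, and "<=" / "==" before "<" / "=".
TOKEN_SPEC = [
    ("HEX",     r"0[xX][0-9a-fA-F]+"),       # 0x followed by hex digits
    ("OCTAL",   r"0[0-7]*"),                 # leading zero means base eight
    ("DECIMAL", r"[1-9][0-9]*"),             # no leading zero means base ten
    ("ID",      r"[A-Za-z_][A-Za-z0-9_]*"),  # cannot start with a digit
    ("LTEQ",    r"<="),
    ("EQEQ",    r"=="),
    ("LT",      r"<"),
    ("ASSIGN",  r"="),
    ("SKIP",    r"\s+"),                     # white space separates tokens
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Turn a string of symbols into a list of (token_kind, lexeme) pairs.

    Simplification: this sketch does not check what follows a number, so an
    input like "09" is split into two tokens instead of being rejected.
    """
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            # The symbol is outside this tiny alphabet, e.g. '?'.
            raise ValueError(f"no token matches at position {pos}: {text[pos]!r}")
        kind, lexeme = match.lastgroup, match.group()
        pos = match.end()
        if kind == "SKIP":
            continue
        if kind == "ID" and lexeme in KEYWORDS:
            kind = lexeme.upper()  # 'while' becomes the WHILE token, not an ID
        tokens.append((kind, lexeme))
    return tokens


print(tokenize("while x <= 0x1F"))
# [('WHILE', 'while'), ('ID', 'x'), ('LTEQ', '<='), ('HEX', '0x1F')]
print(tokenize("x = = 012"))
# [('ID', 'x'), ('ASSIGN', '='), ('ASSIGN', '='), ('OCTAL', '012')]
```

Note how "x = = 012" produces two ASSIGN tokens, while "==" with no space would produce a single EQEQ token.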
So we're going to say that a string is just a finite sequence of symbols, and this is an important thing, so it's going to come up a lot in the class. We're going to represent the empty string as epsilon. So epsilon is a string of length 0; it has no characters in it. So what does that mean? What happens if we take epsilon and concatenate it with another string?
You get the other string.
You get the other string, right? There's nothing in epsilon. There's no sequence of characters. If we take two strings and concatenate them, the result's pretty easy: the two sequences are just joined together. So if we concatenate epsilon with some string s, then it gives s. And it doesn't matter if we do that before s or after s, right? So epsilon is not nothing, but it's an empty string.
Okay, so in all these examples I'm going to try to do this. I don't know if I'm going to be successful, but I'm going to try. I'm going to stylize strings differently than what we'll see as the definitions of tokens. A string is an actual input string, so it'll be either italic and dark blue or in between quotes. That's what I'm going to try to do.
Okay. So now we've defined strings. We've said, okay, this is the input to our language. Now, what are we trying to get out of it? So, as we talked about, sigma represents the set of all symbols in an alphabet. Just like you said, this is a set. It has all the symbols in our alphabet. Pretty simple. It's finite. You can enumerate it. It's easy. Okay. So we're going to define sigma star. It's going to be the set of all strings over sigma. So what does this mean?
It's everything you can make out of those symbols.
Exactly. Every string you can possibly make out of all the symbols in sigma.
So for strings of length one, how many strings of length one is it going to contain?
Whatever the number of symbols is.
Yeah, the number of symbols in sigma, right? So if this was English with just lowercase letters, no numbers, the strings of length one in sigma star are going to be a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q. Oh no, I got lost. So all the way through z; I'll say dot, dot, dot, z, right? So that would be length one. And then how many strings of length two are in that set?
The size of sigma times the size of sigma.
Yeah, so what would be some examples? AA would be in there, what else? AA, AB, AC, AD, all the way through AZ, and then BA, BB, BC, all the way through BZ, and then all the way down to ZZ. And then in the set you also have all the strings of length three: AAA, AAB, AAC, and so on, for every possible string of length 0 up to as long as you want to go. Every string is going to be in there. So is it a finite set or an infinite set?
Infinite.
Yeah. So you've heard that a room full of an infinite number of monkeys typing on keyboards is eventually going to produce the works of Shakespeare. You can think of the keyboard as sigma, the buttons on the keyboard, and sigma star is that infinite room of monkeys just pounding away on their keyboards, and everything that they output is going to be in sigma star. I don't know if monkeys help you remember this. It's kind of a cool visual. Monkeys. It's like, what are they doing when they're not at a keyboard? Who's doing this to them? I feel bad.
Okay, so now we're going to define a language. A language over an alphabet is a set of strings over sigma. So where sigma star was all possible strings, a language is going to be some subset of them, which makes sense.
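To make sigma, epsilon, and sigma star concrete, here is a small Python sketch. The names `strings_of_length` and `sigma_star` are made up for illustration. Since sigma star is infinite, it is produced lazily, shortest strings first, and only a prefix is ever printed.

```python
from itertools import count, islice, product

EPSILON = ""  # the empty string: length 0, no symbols


def strings_of_length(sigma, n):
    """All strings of exactly n symbols over sigma; there are len(sigma) ** n."""
    return ["".join(symbols) for symbols in product(sorted(sigma), repeat=n)]


def sigma_star(sigma):
    """Lazily enumerate sigma star: length 0 (just epsilon), then 1, 2, ..."""
    for n in count(0):
        yield from strings_of_length(sigma, n)


SIGMA = {"a", "b"}

# The first few members of sigma star for a two-symbol alphabet.
print(list(islice(sigma_star(SIGMA), 7)))
# ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']

# With the 26 lowercase letters there are 26 * 26 strings of length two.
LOWERCASE = "abcdefghijklmnopqrstuvwxyz"
print(len(strings_of_length(LOWERCASE, 2)))
# 676

# Concatenating epsilon on either side leaves a string unchanged.
print(EPSILON + "abc" == "abc" == "abc" + EPSILON)
# True
```

The generator never finishes on its own, which mirrors the point that sigma star is infinite even when sigma is finite.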
So if I say, okay, English is composed of the letters A through Z, and the set sigma star is every possible combination of those characters at every possible length, well, then all of the English words must be in that set, right? Because if you're creating, not even randomly, every possible combination, then that language has got to be in there.
Okay, so let's talk about this. Is sigma infinite? No, right, it's finite. I guess maybe it's possible, but in all the languages we're going to look at, it's going to be finite. Is sigma star infinite? Yes, we already talked about that. What about L? Is a language infinite?
It could be, they keep adding more and more.
Yeah, I would, I don't know. Hmm? That was great. Yes, a question.
Is the question whether a language is finite, or whether the number of languages is finite?
This is specifically talking about L. So the way we've defined things here, we can see that, well, sigma's not really infinite. I guess it's up to us, but oftentimes it's not going to be. And because sigma star is all possible strings of all possible lengths, it's definitely going to be infinite. Even if sigma only had two characters, one and zero, or even just one character, sigma star still has a string of every possible length, out to infinity. So the question is, if I define a language over an alphabet, and it's a subset of sigma star, does that mean that L has to be infinite, or can be finite, or is finite in all possible languages? What do you think?
It doesn't have to be finite.
Okay, what would be some cases?
Well, all we know is that L is a subset of an infinite set. It might not include everything in the infinite set, but it might itself be infinite.
Yeah.
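The point that a subset of sigma star can be either finite or infinite can be illustrated with a short Python sketch. Both example languages and the names `L_FINITE`, `in_l_ones`, and `l_ones` are made up for illustration, over the two-symbol alphabet of one and zero mentioned above.

```python
from itertools import count, islice

SIGMA = {"0", "1"}


def over_sigma(s):
    """True if every symbol of s is in SIGMA, i.e. s is a member of sigma star."""
    return all(symbol in SIGMA for symbol in s)


# A finite language: a subset of sigma star that can simply be listed.
L_FINITE = {"0", "1", "10"}


def in_l_ones(s):
    """Membership test for an infinite language: non-empty strings of only '1'."""
    return len(s) > 0 and all(symbol == "1" for symbol in s)


def l_ones():
    """Enumerate that infinite language lazily: '1', '11', '111', ..."""
    for n in count(1):
        yield "1" * n


print(all(over_sigma(s) for s in L_FINITE))   # True: every member is in sigma star
print(in_l_ones("111"), in_l_ones("101"))     # True False
print(list(islice(l_ones(), 4)))              # ['1', '11', '111', '1111']
```

The finite language can be written down as a set; the infinite one cannot, so it is described by a rule (the membership test) instead.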
The example I would give is, like, if the full set is any string of ones and zeros, and the subset is just any string of ones, that's still infinite. So what about other languages? Yeah, in the back there? So what the question really is, is like, is there an infinite number of C programs? I mean, there is, and even if you're limited to printf statements you can still have an infinite number of printf strings. Right, so let's say you have the language L and you take a string from that set. So what is that string? What does that represent? Let's say sigma is the symbols for C, and the L we're talking about is the C programming language. So if I take one of those strings in the set L, what is that? What was it? A method or a function? It's a program. A program, yeah. So this L defines all valid C programs. So we're defining a language, right? We're defining the C programming language, and the only way to do it is to define the set, right? So yeah, is there an infinite or a finite number of C programs that you can write? And back it up with an argument. No? Just want to say it? So why is it infinite? You could write the same program but just add a single character or a single space or something, and it becomes a different string, and you could do that infinitely many times. Yeah? There's nothing really limiting the length of the program, so you could write hello world any number of times and have infinitely many programs. Yeah, every time you add another line that prints hello world, right, it's a new program, it's going to be a new string, and you can do that an infinite number of times. What about English? More or less the same in concept. So let's say the language, yeah, the language is specifically talking about sentences. So is there a finite number, an infinite number? Yeah. It's the same argument as programs and books. Yeah, actually there's a really great example, so I'm going to put this up. There we go. 
Wikipedia is always great. So it turns out that the sentence "Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo" is syntactically correct. It's a grammatically correct sentence, and it can be extended infinitely. You can keep adding however many buffaloes you want. Yeah, so it's like the buffalo from the town Buffalo, and buffalo is also, so you can see here they parse it all out: buffalo is a proper noun, as the place Buffalo; it's a noun; and apparently it's a verb. What? It means to bully someone. Oh, to bully, oh, that's sad. Pretty sure they mean physically bully. Right, I guess that makes sense. I can see where that comes from. So yeah, okay, "in order to baffle," I like that one better. It's like a cunning buffalo. Buffalo. Yeah, so you can extend that out infinitely, and you just keep adding buffalo, buffalo, buffalo, buffalo, buffalo, buffalo. Yeah, you just keep doing this forever. So yeah, you can do that too, which is fun. The main question is, so we have this set L, right? So when you learn English, do we spend your formative childhood years teaching you about this set L, just describing every possible sentence in English? It would be interesting to see a kid raised like that. I guess it depends on the order that you went through L, right? Yeah, right. So the same thing goes for programming languages, right? You don't learn C or C++ or Java because somebody just feeds you every possible valid C or Java program, right? So the question is, how do we define this set, this infinite set of possible strings in our language? We want a way to define it such that I understand it and you understand it, and we can both say that a string is going to be in that set or is not going to be in that set. So who has used regular expressions in the past? In what context? 
Well, doing things like what you do with Emacs. Yeah, yeah, regular expressions with Emacs. Honestly, I use them more in grep on the command line, because I know that Emacs has a weird syntax that I don't use but should get to know. Parsing really long strings. Yeah, parsing really long strings. Same thing, oh, like in a pipeline? Yeah. Verifying, like, email and password formats. We're going to get to that; that's a bad idea. Yeah, it's not a terrible idea, though, just not a good idea. URL redirects. Oh yeah, so in like an .htaccess file, doing the URL redirects. Yeah. Representing finite sets. Getting a little close to what we're doing, but do you actually do that regularly? No. Okay. Grouping strings in a large set. What languages have you used regular expressions in? PHP. C#. Ruby. Java. Excel. Perl. JavaScript. JavaScript regular expressions, yeah. So regular expressions are used all the time, which is why they're really important, and they enable us to define these tokens that we really care about so that we can define this language L. So they come up all the time, and we're going to look at them and investigate them here. So the beauty of regular expressions is that they're compact from a notational standpoint. I mean, maybe if I tried really hard I could describe to you in English what a C program looks like, or I could describe to you what a base-ten number literal is in C or what a hex literal is in C, or I could just show you a regular expression, and it's probably even more precise, because we both know exactly what it means. They're pretty expressive, so they're able to describe complicated structures, and they're precise. And honestly, the best thing is they're widely used. So you know this isn't just some academic thing, even though it does have nice theoretical applications, because it maps directly to automata that you can then use to check whether a string matches. But practically these are used all over the place, and, yeah, this last 
part, so it's easy to generate, so you can generate an efficient program to match a regular expression. I think you learned more about this in 2015. Okay, so this is a bit of a review, but it's important. Okay, so now we're going to get into an infinite loop and get stuck and be stuck in this classroom forever. So first, we want to talk about regular expressions because we want to use them to define the tokens, but we first have to describe the syntax of regular expressions, right? We're defining regular expressions because we want to describe the syntax of a programming language, and we could do that by using regular expressions, but then we have to define the syntax of regular expressions. So we're just going to describe it. We're not going to do it formally. For other languages we're going to do it by showing you regular expressions; for this we're just going to build up what regular expressions are, syntactically. So what does a regular expression look like? It's either the empty set, or epsilon. What's epsilon? What's the difference between the empty set and epsilon? The empty set is nothing; epsilon is a string of the empty set? Yeah, like, no, no, no, it's fine. The empty set is just an empty set, but epsilon is an empty string; it's essentially a string that's empty. Yes. So yeah, the empty set is a set with nothing in it, right, braces with nothing inside, but epsilon is a string; it's the string of length zero. Or it's going to be a, where a is some element of the alphabet. Pretty easy, right? This a, though, this is where the difference between an input string and our regular expression is important, right? So this a is not an a from a string in that language; it's a regular expression consisting of the symbol a. Okay, so then we have r1 bar r2, written r1 | r2, where r1 and r2 are regular expressions; r1 dot r2, written r1 . r2, where r1 and r2 are regular expressions; and then, let's see, parenthesized r, written (r), where r is a regular expression; and r star, written r*, where r is a regular expression. So we see how this is a 
recursive definition, but at the end, when you pull it all apart, you're going to be left with, so what are the base cases here? The null set, the empty string, and a. So three, where a is an element of the alphabet. Whenever we see something bar something else, we know the left side and the right side both have to be regular expressions, so each can be any of these seven forms. And then when you pull that apart, each piece has to be a regular expression itself, and so at the base you're going to be left with epsilon or a, or, not a specifically, but the elements of the alphabet. Questions on this? So with this we'll stop here, and we'll start again on Monday. No, not Monday, don't go to class Monday, it's a holiday. I will be here until, I guess, um...
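The recursive definition above can be sketched as a small data type with one class per syntactic form, seven in all. This is an illustrative sketch only: the class names, the show and base_cases helpers, and the sample expression are assumptions, and the code models the syntax of regular expressions, not how they match strings.

```python
from dataclasses import dataclass

class Regex:
    """Common parent for the seven syntactic forms."""

@dataclass(frozen=True)
class EmptySet(Regex):      # base case: the empty set
    pass

@dataclass(frozen=True)
class Epsilon(Regex):       # base case: the string of length zero
    pass

@dataclass(frozen=True)
class Symbol(Regex):        # base case: a, an element of the alphabet
    char: str

@dataclass(frozen=True)
class Alt(Regex):           # r1 | r2
    left: Regex
    right: Regex

@dataclass(frozen=True)
class Concat(Regex):        # r1 . r2
    left: Regex
    right: Regex

@dataclass(frozen=True)
class Group(Regex):         # (r)
    inner: Regex

@dataclass(frozen=True)
class Star(Regex):          # r*
    inner: Regex

def show(r):
    """Write a regular expression back out in the lecture's notation."""
    if isinstance(r, EmptySet):
        return "∅"
    if isinstance(r, Epsilon):
        return "ε"
    if isinstance(r, Symbol):
        return r.char
    if isinstance(r, Alt):
        return f"{show(r.left)}|{show(r.right)}"
    if isinstance(r, Concat):
        return f"{show(r.left)}.{show(r.right)}"
    if isinstance(r, Group):
        return f"({show(r.inner)})"
    if isinstance(r, Star):
        return f"{show(r.inner)}*"
    raise TypeError(f"not a regular expression: {r!r}")

def base_cases(r):
    """Pull the expression apart and return its base cases, left to right."""
    if isinstance(r, (EmptySet, Epsilon, Symbol)):
        return [r]
    if isinstance(r, (Alt, Concat)):
        return base_cases(r.left) + base_cases(r.right)
    if isinstance(r, (Group, Star)):
        return base_cases(r.inner)
    raise TypeError(f"not a regular expression: {r!r}")

# Hypothetical sample expression: (a|b)*.b
example = Concat(Star(Group(Alt(Symbol("a"), Symbol("b")))), Symbol("b"))
print(show(example))        # (a|b)*.b
print(base_cases(example))  # [Symbol(char='a'), Symbol(char='b'), Symbol(char='b')]
```

Every recursive form holds only other Regex values, so pulling any expression apart always bottoms out in the three base cases, which is what base_cases collects.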