Okay, today we're going to learn all the basics of regular expressions. They're one of the most important things in computing, not just for programming languages, but generally useful to know. And for whatever reason there are a lot of people who don't know them yet. They're very easy.
Okay, what is the idea behind a regular expression? Here's the thing. Sometimes you need to search for a term or match a term in a program, or maybe you're just looking for something. And it's not like you can put in a single word, but let's say you know it starts with L, or you know it ends with NG, or you know it has so many characters in it, very specific things. Regular expressions are a language for asking more, I guess, abstract things about words. Very, very useful. Everyone should know them even if you don't do any programming, because they're useful in all these contexts.
Okay, so I made a little dummy file here, and we're going to go through it as an example. So let's do it. Now the program I'm going to be using is one that everyone has on a Unix-based operating system, and that is grep. I'm going to run grep on this file here, which is called RF, that's its name. How grep works is that you give it any regular expression, and it will search a file for a match to that regular expression and print back the lines with a match on them. So let's say I give it the word file. Let's learn our first regular expression: any word, any sequence of characters, is a regular expression. So if I give grep that regular expression, it will find all the instances of file here. Notice it also matches files and stuff like that.
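To make that first step concrete, here is a minimal sketch of searching for a plain word. The sample text is made up for illustration and is not the demo file; it is piped into grep, whereas in the video the file name is given as a second argument.

```shell
# Hypothetical sample text standing in for the demo file
# (the real contents are not reproduced here).
sample='a file on disk
two files on disk
nothing to see'

# A plain word is already a regular expression.
# grep prints each whole line that contains a match for it,
# so the line with "files" is printed as well.
matches=$(printf '%s\n' "$sample" | grep 'file')
printf '%s\n' "$matches"
```

The third line contains no match, so grep leaves it out.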
We could just put files in here and it will only return, okay, there's our match, files. So grep is just a program that shows you matches; grep literally stands for global regular expression print, that's what grep stands for. So it's simple enough for our purposes here.
Okay, so any word is a regular expression. Or not just words, any character. So let's say we wanna look for words that have an x in them. We can literally just say x and it will print out everything that has an x, or at least all the x's. grep is showing us the whole line, but the regular expression is only matching the things that are specifically x.
So look at these words. Let's say we wanna match exactly these words that have f, then something else, and then x. We have fox, we have fix, we have fax. How do you match stuff like that? Now, you could of course match any of those words individually by typing them out, but there's a nice little magic character, a shortcut character in regular expressions, which is just a period. If you search for the sequence f, period, x, what the period stands for is any character, just any one character. So that matches fox and fix and fax, and it also matches the fix in crucifixion, so that's a nice thing as well.
Actually, let's take some notes. I'm gonna keep track of all the magical characters in regular expressions, so we're gonna move this stuff around. The period is our first magical character. That means any character, I should say any one character.
So look at these, for example. Here are some similar nonsense words. We have fox, fix and fax, but here you have stretched-out versions of fox and fix and fax, with more letters in between. How do you match something like that? Let's say you don't know how many characters are in between f and x. You just wanna match them all.
Well, you can do something like this. You can add in this little thing, which is a star; it's called the Kleene star. What that does is say, okay, take the previous character I gave you, which in this case was the dot, which means anything, and give me any number of that. It could be one, it could be zero, it could be a hundred. Now if we run that, we see we have fox, fix, fax. We also have the stretched-out fox and fix and fax, and we even have fx. As I said, the Kleene star matches any number of characters, even zero.
We can use it right after the dot, but we can also do something like this. We can replace the dot with, let's say, o, and this will match fox, it will match the fox with extra o's, and it will match fx. Because this is saying match f, then any number of o's, even zero, and then x, and that's all the stuff that matches here. So our next magical character, and this is one of the most important ones: match any number of the previous, and I'll say including zero.
So what if you don't wanna include zero? What if you say no, no, no, I want fox, I want the long fox, but I don't want fx? If you want to do that, you replace your star with a plus sign. The plus sign will match any number of the previous match, as long as it's one or greater. Oops, I'm gonna change that in my notes to a plus. So the dot is any character, the Kleene star is any number of the previous character including zero, and plus is any number greater than zero, basically.
And in this case, notice I also put a little backslash in front of the plus sign. I'm running this with plain grep, and in grep's basic syntax there are a couple of characters, like the plus, that you need to escape with a backslash before them. We'll talk about that in a little bit.
But just know, if you're running this with a backslash and it doesn't work in whatever you're doing regular expressions in, you might wanna get rid of the backslash, or if a special character doesn't work later on, you might wanna put a backslash in. That's just one thing to be aware of. It's complicated and depends on the circumstance. Just know that that's usually what's going on if something's going wrong.
Okay, so we talked about matching any character sequence and stuff like that. Let's go a little further. Let's say we wanna match all the lines in this file that end with x. That's a very specific thing. These lines end with x: fox, regex, lax, docx. How do you match all of those lines? It's actually extremely easy. You just say x and then a dollar sign. What the dollar sign means is end of the line. So if I run that, I'm gonna get all of the lines in this file that end with x, because it's looking for the sequence x followed by the end of a line. So the dollar sign, again, is a special character, which means end of the line.
Now, there's an equivalent to the dollar sign. What if you wanna look for something at the beginning of the line? Let's say we wanna look for these lines that start with sp: split, splinter, spines, sparse, spring, spool. What if we wanna match those? Well, we can use the little caret, carat, háček thingy. This means the beginning of the line. So if we put that followed by whatever sequence we want, we're looking for the beginning of the line directly followed by s and p. If we run that, we will get just those lines that start with sp. So I'm gonna put that in my notes here: this is beginning of the line. And these are already some of the most powerful regular expression thingies, thingamabobbers.
Okay, so here's a little test.
Let's say we have these words. We've got "define a regex for these words": boomer, zoomer, a coomer, and the consoomer. And let's say in these two we don't want the "a" and the "the", we just want the word. How do you match something like this? Now first off, you obviously could say oomer. In grep, that will bring back the whole line. But what we really want is to match exactly the words; we want the whole words to be red in this context. That means a regular expression that matches only those words, nothing more, nothing less. So we wanna have a perfect regular expression that matches these words: boomer, zoomer, doomer, coomer, consoomer.
Well, first off, you could say, all right, we could put a period in here, and because period means any one character, that'll match boomer, zoomer, doomer and coomer. That works, but it doesn't match all of consoomer. Now you could say, oh, I'll put the Kleene star in here. And yeah, that would match consoomer, but it actually matches the "a" and the "the", too. And we don't really want that. So instead of the dot, you could use backslash capital S. I'll write that in the notes over here: that means any non-whitespace character. That means anything that is not a space or a tab; it'll match letters, it'll match punctuation, stuff like that. So in this case, this says any sequence, basically any word, sort of, that has any characters as long as it ends in oomer. That's a regular expression that defines these. It matches consoomer, even though it's a little longer, and all the other ones as well. Okay, so that works well.
Now you might guess that there's an opposite of capital S. What if you want to match whitespace? Well, you can use backslash lowercase s, which means any whitespace character.
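The special characters collected in the notes so far can be tried side by side. This is only a sketch under a few assumptions: the sample lines and the matches_for helper are made up for illustration and are not the demo file, and the backslash forms (\+, \S, \s) rely on GNU grep's basic syntax. With grep -E the plus would be written without the backslash.

```shell
# Hypothetical sample lines (not the actual demo file).
sample='fox
fix
fx
foooox
relax
split
a boomer
the consoomer'

# Made-up helper: run grep on the sample with the given arguments and
# join whatever it prints with single spaces, so each result fits on one line.
matches_for() {
  printf '%s\n' "$sample" | grep "$@" | paste -s -d ' ' -
}

dot=$(matches_for 'f.x')            # f, exactly one character, x
star=$(matches_for 'f.*x')          # f, any number of characters (even zero), x
o_star=$(matches_for 'fo*x')        # f, zero or more o, x
o_plus=$(matches_for 'fo\+x')       # f, one or more o, x
ends_x=$(matches_for 'x$')          # x at the end of the line
starts_sp=$(matches_for '^sp')      # sp at the beginning of the line
oomer=$(matches_for -o '\S*oomer')  # -o prints only the matched text, not the whole line
ws_count=$(matches_for -c 'the\s\+consoomer')  # -c counts the matching lines

printf '%s\n' "dot:       $dot" "star:      $star" "o_star:    $o_star" \
  "o_plus:    $o_plus" "ends_x:    $ends_x" "starts_sp: $starts_sp" \
  "oomer:     $oomer" "ws_count:  $ws_count"
```

The -o option is used for the oomer pattern because plain grep prints the whole line, which would hide the fact that the leading "a" and "the" are not part of the match.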
So to be clear, let's do an example. I'm gonna write the, and then a whole bunch of spaces, and then consoomer. So let's say we wanna match these two lines, the consoomer and the consoomer with a bunch of space in between. We can do something like this. We can say the, then backslash s for whitespace, and since there's gonna be some amount of whitespace in here, I'm gonna put the plus sign so we can match any number of that whitespace. I don't know how many there are. And then we're gonna have consoomer. Oh, I've gotta actually give it the file to run on, but that will match both of these. And of course, if I replaced some of these spaces with tabs, that should work just as well. It's treating them all as just whitespace. Now, grep itself can't match stuff over multiple lines. That's just how grep works. But there are some places where I think this whitespace character will match line breaks as well. It won't here, because that's how grep works. So just be aware of that.
Okay, so one of the first regular expressions you'll probably end up making in practice is a regular expression for a URL. How do you do something like that? Let's say you have a program that's gonna test whether a sequence is a URL. So let's try and do that really briefly, given what we know. We're gonna start here, and first I'll just run http. We'll search for that sequence in the file, and notice all the stuff we get. Now, first off, we want the entire URL to be red, but we also don't want this other junk, all this other stuff around it that I put in here just to confuse us. So we don't wanna include any of that. So let's start thinking about it.
Now, our first little hurdle is the fact that some of these URLs I've created have http and some of them have https. How do you make it so that we match the s, but we don't throw the URL out if the s isn't there? We want this one to work, even though it doesn't have an s. So what you can do here is this. You can say https, and then I'll put in a question mark, and with grep I have to escape this with a backslash, I think. If I run that, you'll see that it matches https, but it also still matches http. What the question mark means is basically that the character right before it is optional. It can be there or not.
And notice something else. The Kleene star could also mean optional in some contexts, because it will match zero. Of course, if we put the Kleene star here, it would also match httpsssss, but anyway, that's just a side note.
So let's say we've matched http, optional s, colon, slash, slash. Then there are a couple of things we could do. Let's say we put a dot for any character and then a Kleene star to match anything. Now that wouldn't exactly work, because it's gonna match this stuff at the end that is not part of the URL. So we could at least change this: instead of a dot, we'll use the non-whitespace character. This is actually much better. It does not match all this junk at the end; it stops at the whitespace, basically. So that's a little better, but notice there are a couple of things that aren't perfect. For example, this thing here is not really a full URL. It doesn't have a dot in it, right? We really want a dot in it and then whatever, .com or something like that. And this one here as well; well, we'll talk about this in a second. Let's say we wanna put a dot in here. Now here's another important point.
I've talked about backslashes to escape something. And this is not just in regular expressions, this is everywhere. In case it isn't clear, the backslash takes a character and will either make it normal if it's a magical character, like a dollar sign, or, in some contexts, make it magical if it's a normal character. What was I saying? So we wanna match just the period, the actual real-world period. We don't want it to have this magical interpretation. So I am going to say backslash and then period. And if I run that, you'll see it matches this period and doesn't match anything else. And remember, if you actually wanna look for a period, since period is usually a magical character, you have to put a backslash before it to say, okay, I actually want a period and nothing else.
Last, let's match these. Okay, I don't know why I deleted that. After this, we could say, well, we have the domain name, dot, blah, blah, blah. The blah, blah, blah is just more non-whitespace stuff. But even this isn't perfect, because we're matching this thingy, which isn't really a URL. So there's one other thing we could do here. Really what we wanna say is .com, .org, all those top-level domains; they really only have letters in them. So one thing we can do if we only want letters is this. It's gonna look a little funky: I'm gonna put a to z in square brackets. Oops, and this should be a plus. Notice that it gets rid of this thingy. What that does is say, okay, give me any character from a to z, I don't care which. And I can have multiples of them because I put the plus sign. So a to z in square brackets is a way of saying any lowercase letter. Okay, what if you wanna match uppercase characters?
You can probably guess. You just say capital A to capital Z for uppercase. And I shouldn't really say character; these are letters, ASCII letters. I'm sorry, you guys who speak languages with accents and stuff, you're left out here as always. And then if you want to have just any letter in general, lowercase or capital, you can do this ugly-looking thing: capital A to Z and lowercase a to z together in the same brackets, and that will match any of them. And in fact, maybe that's better. I guess some boomers type their URLs in capitals. And I think the internet works with that, I don't know. So that will match any letter.
Now, of course, we haven't done everything perfectly. We're not matching this stuff at the end. I'm just gonna be lazy. I'm gonna say, okay, after that, we're just gonna match whatever. We'll just do backslash capital S and then the Kleene star, so it'll match whatever follows. That's not a perfect URL regular expression, but it's an example of the things you can do, just to think through the process. I don't know if there is a perfect URL regular expression out there. There are a lot of cases, obviously, like URLs that don't include http or https or whatever. Anyway, that's another topic altogether. It's a learning exercise, who cares?
So let's also try and make a regular expression really briefly for an email address. That should be a little easier. Let's say it's gonna have some non-whitespace characters. Actually, I guess I should make that a plus. Then it's gonna have an at sign, because that's how email addresses work. And then you have more non-whitespace characters. Then you have a real period, which I'm putting a backslash in front of.
See, look how confusing this looks, because there are a bunch of backslashes, but it actually is not that difficult to read. This basically means anything, whatever, any characters that aren't whitespace. This at sign means literally what it is. This means more of anything, as long as it's not whitespace. And then backslash period just means period. And then we'll do the a to z thing: capital A to Z, lowercase a to z, and then plus for any positive number of those. Again, we'll run it on this file, and you'll see that, oh, look at that, our email matches. And I even tried to trip myself up by putting this stuff at the end, but we were smart and used only letters. If we had used something like backslash capital S, that would have matched too, but anyway. So that's an easy way to match email addresses. And here were some non-URLs.
Lastly, oh yeah, here was some stuff about caps. I mean, you can figure these out, but let's just say we want to match a line that begins with a capital letter. Let's match all of those as an example. Notice, again, this caret thingy means beginning of a line. And then capital A to Z in brackets means any capital letter. So if we run that, we will get all of the lines that start with a capital letter. I could also do something like this. Let's say I wanna find all of the lines that start with a capital letter and end with a capital letter. I could say, okay, starts with a capital letter, and then you just have whatever; I'm just gonna put period, Kleene star for whatever. Then I want a capital letter at the end, so I'm gonna say capital letter and then end of line. If I run that, it's gonna match these weird lines I put in that start with a capital letter and end with a capital letter. And they can have anything in between.
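The URL, email and capital-letter patterns described above can be written out roughly as follows. Treat this as a reconstruction, not the exact patterns typed in the video: the sample lines, the matches_for helper and the example.com addresses are assumptions made for illustration, and the \?, \+ and \S forms rely on GNU grep's basic syntax. LC_ALL=C is set so that ranges such as [A-Z] mean plain ASCII letters regardless of locale.

```shell
# Hypothetical sample lines (not the actual demo file).
sample='visit https://example.com/page for details
plain http://example.org works too
http://localhost has no dot
write to someone@example.com!!! today
Starts with a capital and ends with Z
Only starts with a capital
ends with a capital X'

# Made-up helper: run grep on the sample and join its output with spaces.
matches_for() {
  printf '%s\n' "$sample" | LC_ALL=C grep "$@" | paste -s -d ' ' -
}

# http, optional s, ://, non-whitespace, a literal dot, letters, then whatever follows.
urls=$(matches_for -o 'https\?://\S\+\.[a-zA-Z]\+\S*')

# non-whitespace, @, non-whitespace, a literal dot, letters only.
emails=$(matches_for -o '\S\+@\S\+\.[a-zA-Z]\+')

# Count lines that start with a capital letter, and lines that
# both start and end with one.
cap_start=$(matches_for -c '^[A-Z]')
cap_both=$(matches_for -c '^[A-Z].*[A-Z]$')

printf '%s\n' "urls:      $urls" "emails:    $emails" \
  "cap_start: $cap_start" "cap_both:  $cap_both"
```

Ending the email pattern in letters only is what keeps the trailing punctuation out of the match, and the required literal dot is what rejects the localhost line.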
We could put more junk in here, save the file, and run it, and you'll see that it will still match.
Now, last but not least, matching numbers. In addition to having a through z, you can also do something like this. You can say zero to nine in square brackets; I'm not gonna put it in quotation marks for grep. Run that on RF, and that will match, not letters, but digits, numbers. That's the thing I'm looking for, any digit. And as you can guess, you can change this around. Let's say we don't want zero included. We just want one to nine. In that case, it's gonna match all these, and it's not gonna match zero. Or let's say you just want five through nine. It's only gonna match those. And again, it's not matching zero, so zero isn't appearing in red. So that is another nice little thing, to be able to match some abstract sequence of stuff.
Okay, so in this video, we've gone over what I consider the basics of regular expressions. The list we made here, I think, is pretty good. This is not everything you can do with regular expressions. You can do more complicated stuff. We haven't even talked about parentheses. We haven't talked about finding a certain number of matches and stuff like that. But I would say this is the basics. If you know this, go and use it for whatever you need to use it for, and then come back later. This is basically your essential knowledge, the bare minimum of regular expressions. You get basically 95% of the way there knowing just these. But that other 5%, that will be even better. So that's about it, and I'll see you guys next time. Hope you learned something.
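For the digit ranges from the last section, a small sketch along the same lines. The sample text and the matches_for helper are again made up for illustration, and the \+ form assumes GNU grep.

```shell
# Hypothetical sample lines (not the actual demo file).
sample='room 101
call 555 now
version 0
no digits here'

# Made-up helper: run grep on the sample and join its output with spaces.
matches_for() {
  printf '%s\n' "$sample" | LC_ALL=C grep "$@" | paste -s -d ' ' -
}

# With -o, each single-character match is printed on its own.
any_digit=$(matches_for -o '[0-9]')    # every digit
no_zero=$(matches_for -o '[1-9]')      # every digit except 0
five_up=$(matches_for -o '[5-9]')      # only 5 through 9
runs=$(matches_for -o '[0-9]\+')       # whole runs of one or more digits

printf '%s\n' "any_digit: $any_digit" "no_zero:   $no_zero" \
  "five_up:   $five_up" "runs:      $runs"
```

Narrowing the range drops the zeros first and then everything below five, which mirrors the zeros no longer showing up in red in the video.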