One critical task when you're coding in data science is being able to find the things that you're looking for. And regex, which is short for regular expressions, is a wonderful way to do that. You can think of it as the supercharged method for finding needles in haystacks. Now, regex tends to look a little cryptic. So for instance, here's an example of something that's designed to determine whether something is a valid email address. It specifies what can go at the beginning, you have the at sign in the middle, then you've got a certain number of numbers and letters, and then you have to have a dot something at the end. And so this is a special kind of code for indicating what can go where.

Now, regular expressions, or regex, are really a form of pattern matching in text. It's a way of specifying exactly what needs to be where, what can vary, and how much it can vary. You can write both specific patterns, say, I only want a one-letter variation here, or very general ones, like the email validator that I showed you. And the idea here is that you write this search pattern, your little wildcard thing, and you find the data. Then, once you identify those cases, you can export them into another program for analysis.

So here's a short example of how it can work. What I've done is I've taken some text documents; they're actually the texts of Emma and Pygmalion, two books I got off of Project Gutenberg. And this is the command: grep ^l.ve *.txt. So I'm looking for lines in either of these books that start with l, then have one character that can be whatever, and then that's followed by ve. And the *.txt means search all of the text files in that particular folder. And what it found is lines that began with love and lived and lovely, and so on.

Now, in terms of the actual nuts and bolts of regular expressions, there are certain elements. There are literals, and those are things that mean exactly what they are.
You type the letter L, you're looking for the letter L. There are also metacharacters, which specify, for instance, that something needs to go here; they're characters, but they're really code that stands for something else. There are also escape sequences, which are something you use to say, well, normally this character is used as a variable, but I actually want to look for a real period as opposed to a placeholder. Then you have the entire search expression that you create. And then you have the target string, the thing that it's searching through.

So let me give a few very short examples. This is the caret. It's sometimes called a hat or, in French, a circumflex. And what that means is you're looking for something that's at the beginning of the text that you're searching. So for example, you can have caret and capital M; that means you need something that begins with a capital M. So, for instance, the word Mac: true, it will find that. But if you have iMac, there's a capital M, but it's not the first thing. So that'll be false. It won't find that.

The dollar sign means you're looking for something that is at the end of the string. So for example, ing and then dollar sign, that'll find the word fling, because it ends with ing, but it won't find the word flings, because it actually ends with an s.

And then the dot, the period, simply means we're looking for one character, and it can be anything. So for example, you can write at and then a period. That will find data, because it has an a, a t, and then one letter after it. But it won't find flat, because flat doesn't have anything after the at.

And so these are extremely simple examples of how it can work. Obviously, it gets more complicated. And the real power is when you start combining these bits and elements.

Now, one interesting thing about this is you can actually treat this as a game. I love this website. It's called Regex Golf, and it's at regex.alf.nu.
And what it does is it brings up lists of words in two columns, and your job is to write a regular expression at the top that matches all the words in the left column and none of the words on the right, and that uses the fewest characters possible. You get a score. And it's a great way of learning how to do regular expressions and learning how to search in a way that's going to get you the data that you need for your projects.

So, in sum, regex, or regular expressions, help you find the right data for your project. They're very powerful, and they're very flexible. Now, on the other hand, they are cryptic, at least when you first look at them. But at the same time, it's like a puzzle. And it can be a lot of fun if you practice it and you see how you can find what you need.
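
If you want to try the caret, dollar sign, dot, and escaped-period examples for yourself, here is a minimal sketch in Python using the standard re module. It is one possible way to check them, not the tool used in the talk (that was grep). The helper names matches and grep_lines, and the sample lines, are made up for illustration; they are not taken from the two books.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """Return True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None


def grep_lines(pattern: str, lines: list[str]) -> list[str]:
    """Return the lines that contain a match, roughly like grep does."""
    return [line for line in lines if re.search(pattern, line)]


# Caret: the match has to start at the beginning of the string.
print(matches(r"^M", "Mac"))      # True
print(matches(r"^M", "iMac"))     # False

# Dollar sign: the match has to finish at the end of the string.
print(matches(r"ing$", "fling"))   # True
print(matches(r"ing$", "flings"))  # False

# Dot: exactly one character, and it can be anything.
print(matches(r"at.", "data"))    # True
print(matches(r"at.", "flat"))    # False

# Escape sequence: \. means a real period, not "any character".
print(matches(r"\.txt$", "emma.txt"))  # True
print(matches(r"\.txt$", "emmatxt"))   # False

# The same pattern as the grep example, run on hypothetical sample lines.
sample_lines = [
    "love is not enough",
    "lived there for years",
    "a lovely day",
    "leave it alone",
]
print(grep_lines(r"^l.ve", sample_lines))
```

The last call keeps only the first two sample lines: "a lovely day" does not start with l, and "leave" has two characters between the l and the ve rather than one.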