All right, and we are going live. I'd like to welcome everyone back to Saylor Academy's recap of CS105: Introduction to Python, Unit 8, Regular Expressions. As always, if you're watching live, feel free to leave a question in the chat, and we'll try to answer it live. If you're watching later, leave a comment and we'll come back to it. You can find the course on Saylor Academy at saylor.org. If you'd like to know which program we're using, it's called Replit. I'll put that in the chat as soon as I hand it over to Dr. Sack, and I'll also add all of those links to the description of the final video. So without further ado, Dr. Sack, let's recap what we've learned in Unit 8, Regular Expressions.

Sounds great. Thanks so much, I appreciate it. Hi, everyone. Today we're going to talk about regular expressions, how they're used, and all the syntax that goes along with them. I want to start with an example as motivation for string matching in general, so I'm going to pop over and share my Replit window. Before we dive in, I want to point out that it's important to find the right tool and to have options at your fingertips. If you just had a simple string and you wanted to search it, strings have their own methods. For instance, you could use the find method, which looks for a specific pattern and returns the index of the first place in the string where it finds it, or -1 if it isn't there. But if you're interested in more powerful searches, or in finding potentially many occurrences of something, then you'll want to extend your knowledge to more sophisticated techniques. That's really what this unit is about.
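A minimal sketch of the plain string-method approach mentioned here, using a made-up sample string. `str.find` returns the index of the first occurrence, or -1 when the substring is absent.

```python
# Plain string methods handle a single literal lookup without the re module.
text = "Papa John met John Smith"  # hypothetical sample text

first = text.find("John")       # index of the first occurrence
missing = text.find("George")   # -1 means "not found"

print(first, missing)  # 5 -1
```

Note that `find` only reports the first hit and only for literal text; anything like "any character followed by a u" needs a regular expression.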
The concept of string searching is actually very, very deep. You could dig into textbooks this thick on algorithms for string searching and string matching, but what you need depends on your application and what you're after. At the introductory programming level, students may not fully see the benefit of what we're about to go through today, but it's a really important subject, and it opens the door to a whole realm of computer science that would blow your mind. Regular expressions are absolutely amazing. They're fundamental to the theory of computation, and they go far beyond what we're going to talk about today. If you're really interested in programming and computer science, the regular expression is tied to probably the most fundamental kind of automaton, the finite automaton. One of the most basic questions you can ask about an automatic computing machine, an automaton, or a state machine of some type is: what strings will this system recognize? What language will it recognize? It would take a conversation of its own to explain how that question relates to almost everything in computer science. But at the heart of that question for finite automata, the regular expression is one way of characterizing languages: the regular languages are exactly the ones recognized by finite automata. There's also something called the Chomsky hierarchy. I don't want to go too far off on a tangent, but regular languages sit at its most basic, most restrictive level (Type 3), and regular expressions describe exactly that class.
So I'm trying to open you up to the fact that we're going to talk about regular expressions, but there's real depth here. As we get into the notation, you may ask, "Where does this symbol come from? Why did they choose that one?" Some of the special characters used in regular expressions come straight out of the theory of computation. I'll take 30 seconds on this, and somebody stop me if I go too long. I think computation is at a place right now where we're riding the wave of a whole new paradigm, in terms of parallel processing and what's called quantum computation. It's been almost a hundred years since the original theory was formulated by Turing and others, so I think it's good for computer scientists to really understand the theory of computation and how new programming paradigms fit into that big picture. It's mind-boggling to me that a handful of people nearly a hundred years ago came up with a theory so useful that it isn't going away; it's just as important now. Regular expressions are one important piece of that large puzzle at this point in the transition. Okay, now I'm way off on a tangent, so let's get back to earth. If you want to go deeper than the string methods, a regular expression is simply a combination of characters that you use to describe a pattern for string searches.
Let me run over to the course reference and show you probably the most important tables to pay attention to. One table gives the special characters you can use in constructing regular expressions, along with their definitions; we'll use them in a moment. In addition to that, there's a table of what are called character classes. Regular expressions can look weird if you haven't seen them before; it's almost like learning a new language. To be frank, I'm a little rusty, because I haven't done a lot of programming with regular expressions recently. But one major application I got into a while back was bioinformatics. DNA sequences are formed from an alphabet of just four letters: A, G, C, and T. You can go to the NCBI website, run by the National Institutes of Health (NIH), and download whole genomes. Genomes are stored as text, so if you have an interesting sequence you're looking for, you can perform string searches on that text data. I worked with a group a while back, and one colleague, David Schneider, came up with the idea that we should go fishing for sequences using regular expressions. It was a very successful, impressive idea. So regular expressions really do have applications in science and programming. In Python, you've already learned about a couple of modules: the math module and the plotting module, matplotlib.
Today we're going to use the re module, which gives you a host of functions for fishing around and searching for things within strings. That table of special characters is instrumental, and you can really lean on it while you build string searches. Let's pop back over to the Replit window and start with a simple example. Let me double-check that I'm sharing my screen; I've been missing that a lot lately. First, you need to import the re module. Then I've got this list of names, which is a very common use of string searching: you have a database of names or some other information that you want to search. I'm not creating a real regular expression yet; I'm just showing you that a pattern can be pure text, and we'll get to a regular expression in a moment. This is a simple application of things we've done already. You loop through every element of the list, each of which is the name of some person. Then, in typical Python syntax, you call a function with respect to the module: re.match, which tries to match a pattern. Each element is pulled from the list into a variable called person, we try to match the pattern against that person, and we print it out if we find it. The way match works is that it only looks for the pattern at the beginning of the string. So if we run this, we find the name that has John at the beginning. There are other Johns in the list, here and here, but match only reports a match at the beginning of a string. So, no regular expressions yet.
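Here is a minimal sketch of the loop being described. The full name list from the video isn't shown, so the list below is hypothetical, built to include the kinds of names mentioned in the session.

```python
import re

# Hypothetical sample data; the full list used in the video is not shown here.
names = ["John Smith", "Julia Roberts", "Suzanne Lee", "Jennifer Johnson",
         "Papa John", "Walter John", "Dr. Watson", "Mr. Brown"]

pattern = "John"  # a purely literal pattern: no special characters yet
found = []
for person in names:
    # re.match only succeeds when the pattern matches at the start of the string
    if re.match(pattern, person):
        found.append(person)
        print(person)
```

Only "John Smith" is printed; "Papa John" and "Walter John" contain John, but not at index 0.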
But now, what if we asked for a pattern like this? What does it even mean? If we go back to the table we were just looking at (and this is why it's so instrumental), you can immediately see that this little period, the dot, stands for any single character. So in our first real regular expression, we're creating a string pattern, which goes between quotes, where we don't care what the first character is, but the second character has to be a u. If we run this, we get two names: Julia and Suzanne, because we don't care what the first character is; we just want a u as the second character. That's the most basic form of a regular expression I can think of: some arbitrary characters, plus specific characters that pin down what you're looking for. Here's another simple pattern. Oops, sorry, my screen jumped for a second. There we go. So that's the first example. Now let's try a different kind of pattern. One question that often comes up when you're first starting out is: since I've got this list of special characters, what happens when I want to search for one of those characters literally?
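A sketch of the "any character, then a u" pattern, reusing the same hypothetical name list so the block runs on its own. Only the pattern changes.

```python
import re

# Hypothetical sample data (same list as the previous sketch).
names = ["John Smith", "Julia Roberts", "Suzanne Lee", "Jennifer Johnson",
         "Papa John", "Walter John", "Dr. Watson", "Mr. Brown"]

pattern = ".u"  # "." is any single character; the second character must be "u"
found = []
for person in names:
    if re.match(pattern, person):  # still anchored at the start of each name
        found.append(person)
        print(person)
```

With this list, the two printed names are "Julia Roberts" and "Suzanne Lee".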
As in pretty much any computer language, you put a backslash in front of the character to tell the re module that you really are looking for that character, in this case an actual period. So this pattern looks for an arbitrary character, followed by another arbitrary character, followed by a period. If we run this code, we find things like "Dr." or "Mr.": any two characters followed by a period, some type of abbreviation. This is probably the simplest application I can think of: you've got a list or a database of strings, and you want to search through it efficiently for a match of some pattern. As I said before, the most basic pattern is one where you allow arbitrary characters and then pin your search to some specific character, like the u or the period. Of course, you can get more specific. What does this next pattern mean? Let's analyze it. We already know the dot is an arbitrary character. What does the asterisk mean? If we go back to the table, the asterisk means zero or more repetitions of the preceding character. It's important that it allows for zero occurrences, which basically gives you the empty-string case, and I'll get into why that matters. It's important, and it's mathematically cool.
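A sketch of the escaped-period search on the same hypothetical list. The pattern is written as a raw string (`r"..."`) so the backslash reaches the re module unchanged; that's a general Python convention, not necessarily how the video's code was typed.

```python
import re

# Hypothetical sample data (same list as before).
names = ["John Smith", "Julia Roberts", "Suzanne Lee", "Jennifer Johnson",
         "Papa John", "Walter John", "Dr. Watson", "Mr. Brown"]

# Two arbitrary characters followed by a literal period (escaped with a backslash).
pattern = r"..\."
found = []
for person in names:
    if re.match(pattern, person):
        found.append(person)
        print(person)
```

Without the backslash, `...` would match any three characters, so every name on the list would match.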
I'll compare and contrast this with the plus sign in a second. So now we're saying: any character, repeated zero or more times, until we run into the name John. If you think about it, this is different from searching for the pattern John on its own with the match function. We're still using match here, and match always starts from the beginning of the string, so it's important to know what your toolbox contains. With the dot followed by the asterisk, the name can be at any depth in the string. It could be right at the beginning, like John, with no characters in front of it at all. That's why you use the asterisk: if John is directly at the beginning, the dot-asterisk matches zero characters. But John can also appear deeper into the string. You can see these names: Jennifer Johnson, Papa John, Walter John. All of them are found because we've allowed an arbitrary number of characters before John. Notice I haven't changed my search loop. The loop stays the same; only the pattern changes, and you can look for different patterns. So this code is very reusable, and we'll be putting it into functions. It's a very common kind of loop to write for string searches.
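A sketch of the dot-asterisk pattern with `re.match`, again on the hypothetical list. Because `.*` may match zero characters, a John at the very start still counts.

```python
import re

# Hypothetical sample data (same list as before).
names = ["John Smith", "Julia Roberts", "Suzanne Lee", "Jennifer Johnson",
         "Papa John", "Walter John", "Dr. Watson", "Mr. Brown"]

# ".*" = zero or more of any character, so "John" may appear at any depth.
pattern = ".*John"
found = []
for person in names:
    if re.match(pattern, person):  # match is still anchored at index 0
        found.append(person)
        print(person)
```

This prints "John Smith", "Jennifer Johnson", "Papa John", and "Walter John", while the loop itself is unchanged from the literal search.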
All you have to do is change the pattern that goes in as input. I want to make that clear, because it ties into everything we've done up to this point: all the programming structures, the for loops and the if statements, come together when you apply them to searching. That's a very common application of computers. If you reduced a computer to its five most fundamental operations on any given day, it's usually doing some type of sort or some type of search. Searching and sorting are probably two of the most fundamental operations computers perform, and this is another variation on that theme. Let's go a little deeper and start stringing together a more complex pattern. I'll get rid of all this and write a new pattern. Now we're combining quite a bit, so hopefully you have that table in front of you while I call out what we're doing. The pattern looks for an a, then an arbitrary number of characters, then a caret, which is a special character, and then some character classes. That's the second table I showed you: if you scroll a bit further down the page from the special characters, you'll see classes for things like decimal digits and whitespace, and then arbitrary repetitions.
Be careful with the last part: it's an arbitrary number of repetitions of any letter from b to p in the English alphabet. Why would you do something like this? It doesn't really matter; I'm just throwing together a more complicated expression to see what we find when we run it. I'm introducing two things at the same time with this example, so let's be careful. Now I'm not going to use match. I'm going to use the search function, which finds matches that don't necessarily start at the beginning of the string. You can think of search as a sliding window: you've got the pattern and a long string, and you slide the pattern along the string until you bump into the first place where it fits. For the sample strings, I decided to start them with xyz and end them with some other stuff. What's important is what happens in between. After the xyz, this string has an a, so it's a candidate for what we're looking for. It might match and it might not; let's see. After the a, we allow an arbitrary number of characters; we don't care what they are until we bump into the caret. The caret is a special character, which is why we put a backslash in front of it: the backslash says we're really looking for the literal caret character. So looking at the string, we have an a, an arbitrary number of characters, and then a caret. So far it fits the bill, but we're not done yet.
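Before the longer pattern, a minimal sketch of the difference between `match` and `search` on one hypothetical string: `match` only tries index 0, while `search` keeps sliding until it finds the first occurrence.

```python
import re

text = "Papa John"  # hypothetical example string

at_start = re.match("John", text)   # only tries position 0
anywhere = re.search("John", text)  # scans forward to the first occurrence

print(at_start)                            # None
print(anywhere.start(), anywhere.group())  # 5 John
```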
That's only half the regular expression. Next we're looking for a decimal digit, then a whitespace character, and finally zero or more characters from the class b to p. This string has the 9 and then the whitespace, so it fits so far, and remember that the asterisk means the last part doesn't have to occur at all: zero repetitions of characters between b and p is allowed. So this looks like it works. Let's look at the next one. We've got an a, some repetitions, and a whole bunch of stuff afterwards, including a whitespace and then repetitions of characters between b and p, so this looks like it works too. A simpler example is an a with nothing between it and the caret, which the asterisk allows, followed by a digit, a space, and repetitions. So all of these look like they'll fit this regular expression. (I accidentally moved something, so let me put that back.) This is the game you'd be playing: you'd have a large database or a long string. When we were dealing with genomes, you could have millions of characters, and this isn't something you want to do by hand. The human genome has about three billion nucleotides, three billion A's, G's, C's, and T's, so that's a big search. You might go looking for a pattern like this in a very large database. If we run this, we can see that they all fit the bill. The search result also tells you where in the string it found the first match.
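A sketch of the pattern walked through above, run with `re.search`. The video's sample strings aren't fully visible, so the three strings here are hypothetical stand-ins that begin with xyz as described; their spans will differ from the ones read out on screen.

```python
import re

# An "a", anything, a literal caret, a digit, a whitespace character,
# then zero or more letters from b to p.
pattern = r"a.*\^\d\s[b-p]*"

# Hypothetical sample strings (not the ones from the video).
samples = ["xyza^7 xyz", "xyzabc^3 hello xyz", "xyzaq^5 deep xyz"]

spans = []
for s in samples:
    m = re.search(pattern, s)  # first match anywhere in s, or None
    if m:
        spans.append(m.span())  # (start, end); end is exclusive
        print(f"{s!r}: span={m.span()} matched={m.group()!r}")
    else:
        spans.append(None)
        print(f"{s!r}: no match")
```

With these strings the spans are (3, 7), (3, 14), and (3, 12). The first match, `'a^7 '`, ends right after the space because no letter from b to p follows it, and the asterisk allows that.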
In this case, the match spans from index 4 to index 13 in the string, the next one from 7 to 19, and the last from 4 to 14. So search gives you a lot of information, and we'll talk about how to extract it in a second; I'm just showing you what's available when you perform a search. Now I want to compare the asterisk against the plus sign, still using search. It's very important to distinguish the two. The plus looks for repetitions just like the asterisk does; the only difference is that it requires at least one occurrence. Amazingly, this notation carries over from the theory of computation. The asterisk (the Kleene star) has an important meaning when defining certain operations on languages, and the plus has a corresponding meaning there too, where it's written as a superscript plus. If we run this version, we can compare the results. In this string, there are no characters from b to p at the end. With the asterisk, the pattern was found, because zero repetitions are allowed. With the plus sign, we need at least one repetition, and there are none in this string, so the result is None: search didn't find anything.
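A side-by-side sketch of `*` versus `+` on a hypothetical string with nothing from b to p after the space. The two patterns differ only in their final quantifier.

```python
import re

star = r"a.*\^\d\s[b-p]*"  # zero or more letters from b to p at the end
plus = r"a.*\^\d\s[b-p]+"  # at least one letter from b to p at the end

text = "xyza^7 xyz"  # hypothetical: "x" (not in b-p) follows the space

star_result = re.search(star, text)
plus_result = re.search(plus, text)

print(star_result)  # a Match object covering "a^7 "
print(plus_result)  # None
```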
Meanwhile, the other strings give the same result as before, because they do end with characters from b to p. And those characters don't have to come in any specific order. It's position by position: each position can be any character between b and p, in any order whatsoever, as many as you like. You could put a p here and it would still work; the class doesn't mean alphabetical order. I know the difference between the plus and the asterisk seems like a subtle little thing, but there's an enormous mathematical universe between those two expressions. I don't want to go too far off on a tangent, so let's stay on earth. That's the major thing I wanted to point out: the difference between the plus and the asterisk. Once you get the flavor of how this works, you can look at test patterns like these. Now that you see the difference between the asterisk and the plus sign, you should be able to read them. This first one is an a followed by a b with an asterisk, so it allows zero or more b's. This next one is an a followed by at least one b.
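A sketch comparing the quantifiers from this family of test patterns, including the `?` and curly-brace forms from the special-characters table. It uses `re.fullmatch` (not used in the session) so that each candidate must fit the pattern completely, which isolates what each quantifier allows; with `match`, `ab?` would also succeed on the start of "abb".

```python
import re

patterns = ["ab*", "ab+", "ab?", "ab{3}", "ab{2,3}"]
candidates = ["a", "ab", "abb", "abbb"]

table = {}
for p in patterns:
    # fullmatch requires the whole candidate string to fit the pattern
    table[p] = [c for c in candidates if re.fullmatch(p, c)]
    print(f"{p:8} matches {table[p]}")
```

Reading the output: `*` accepts zero or more b's, `+` at least one, `?` zero or one, `{3}` exactly three, and `{2,3}` two or three.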
Here, the question mark allows zero or one b, and with curly braces you can specify exactly how many b's, or a range between this many and that many. In one hour we can only do so much, but what I'm trying to say is that for pretty much any pattern you'll be looking for in practice, you can construct a regular expression to describe it. You can see we've gone miles from just looking for J-O-H-N as a pattern to a collection of special characters for searching large strings or databases. So those are some of the test patterns you might want to try. What do we do next? The next thing is to recognize that Python is very flexible and very powerful. Again, the algorithms for string searching can be very deep and very subtle. But if you're doing a lot of string searching, one of the first things you want to do is convert your pattern into an internal representation that's fast to search with. And because Python is an object-oriented language, you want an object that gives you access to all the methods you'd want from the re module. Those are the two things you want when you're really creating patterns, and the compile function is extremely important for both.
The compile function gives you an efficient representation of the pattern that's fast under the hood, algorithmically, and the compiled pattern also comes equipped with all the methods you need for performing string searches. In this example I'm mixing a lot together at the same time, but this is Unit 8, so it's okay. I'm creating a function called findName and sending it a list along with the pattern to look for. Inside the function, I compile the pattern into a variable called re_pattern. That's an object instantiation, so the object comes with the matching methods from the re module, and here we call re_pattern.match. Notice that this syntax is different from the previous examples. It's similar to how you've used other modules, like the math module: you can import a function such as sqrt directly from the math module, but if you just import the math module by itself, your syntax is math.sqrt of something. In the absence of anything else, the function call is made with respect to the module. Everything we've done up until this minute made the match or search calls with respect to the re module. Now we're creating compiled pattern objects, and we call methods like match on those objects, which is exactly what you're seeing here.
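A sketch of a findName-style function, reconstructed from the description rather than copied from the video; the name list is hypothetical. The pattern is compiled once, and `match` is then called on the resulting pattern object. One caveat worth knowing: Python's re module also caches recently used patterns for its module-level functions, so explicit compiling mostly pays off when one pattern is reused many times.

```python
import re

def findName(names, pattern):
    """Return True if any name in `names` matches `pattern` at its start."""
    re_pattern = re.compile(pattern)  # a compiled Pattern object
    for person in names:
        # match is now a method of the pattern object, not a call on the module
        if re_pattern.match(person):
            return True
    return False

# Hypothetical sample list.
names = ["John Smith", "Julia Roberts", "Suzanne Lee", "Walter John"]

print(findName(names, "George"))  # False
print(findName(names, "Julia"))   # True
```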
You'll notice the call is now made on the object re_pattern. It's very simple code: here's the list of names, we're looking for the name George, and we return whether or not we found it. Obviously George isn't on the list, as you can all see, so it returns False. Once again, it's the technique of cycling through a list of items, plus one extra step that you may or may not want to use. If you have a small list, you don't really need to compile the pattern. But if you're working with a really large database, it's probably sensible to compile it, because you want the searches to go as fast as possible. The pattern gets compiled at some point anyway, but doing it once ahead of time saves time and steps inside every match call. For the rest of the time we have, you may see me skip compile, because we're doing small one-off examples, and you won't see a speed difference on a list this short. But I had to bring it up, because you'll definitely see code in the course where the patterns are compiled. All right, let's move on to the more important topic of finding a lot of things, that is, multiple matches. There's a nice function called findall, which returns all the substrings that match, and then we print them out.
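A minimal sketch of iterating directly over `re.findall`, using a hypothetical sentence. When the pattern has no capturing groups, `findall` returns a list of the matched strings.

```python
import re

text = "John met Johnny and Papa John near John Street."  # hypothetical text

# "John" followed by zero or more word characters
matches = re.findall(r"John\w*", text)
for match in matches:
    print(match)
```

This prints John, Johnny, John, and John, one per line. The list tells you what matched, but not where.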
findall is an interesting function, because it finds everything, and you can iterate over what it returns. That's a very powerful statement, and this is where I think Python shines: under the hood, findall finds all those instances, and then, in a way similar to a list comprehension, you iterate over all the matches and print them out at once. In some other languages you'd have to write a lot more code to do this (not all languages, but some). By the way, findall looks for non-overlapping matches. I can't predict what your application will be, and this may not be what you want. findall is a good choice if you're only interested in which strings matched. But sometimes that's not enough information. Often you're interested in where a match is located; you want more than just the matched text. If you want more than that, you can use what's called the finditer function, which gives you a little more information. Notice that these two loops look very similar; both are for loops.
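A small sketch of what "non-overlapping" means for `findall`, using a toy DNA-style string. "GAGA" occurs twice in the text, but the two occurrences overlap, and `findall` resumes scanning after the end of each match.

```python
import re

seq = "GAGAGA"  # toy DNA-style string

hits = re.findall("GAGA", seq)
print(hits)  # ['GAGA']

# "GAGA" really does start at index 0 and at index 2, but those two
# occurrences overlap, so findall reports only the first one.
print(seq[0:4], seq[2:6])  # GAGA GAGA
```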
We're iterating on the matches that are returned from finditer; it gives you an iterator, basically. But now the matches have more information associated with them, namely where each instance starts and where it ends. So you can use those indices to index into the original text if you want to, or if you just want to know where the matches are, you have those values. The information is deeper, potentially more revealing. But again, I'm not going to say which one is better; it depends on the application. Sometimes you just want to know, yes, I found this. And sometimes that's not enough, and you'd like to know where it is. So that's the major difference, I would say, between findall and finditer. We'll play around with some applications of these as well, so let's play some games down here. This is our little sandbox, where we can search for things and do some practice. I think those methods, search, match, findall, and finditer, plus compile from the re module, are a good package to start with. You can really do a lot if you start to master the table of special characters and the table of character classes. So let's play around. We've got this little function that uses finditer, and if it finds a match, I'm just going to print out where it found the match. Now, re_pattern here is a function that I'm writing. It's not the compiled object from before, even though I've used this name a couple of times; in this example it's a function that I'm calling. The first input is the text that I want to search, and the second is the regular expression I'm searching with.
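The function on screen isn't shown in full, so this is only a sketch of what a helper like re_pattern(text, pattern) might look like, assuming it prints each match's position. Returning the spans as well is an addition made here so the results can be reused.

```python
import re

def re_pattern(text, pattern):
    """Print, and return, the (start, end) of every non-overlapping match.

    The name follows the transcript; the body is an assumed sketch.
    """
    spans = []
    for match in re.finditer(pattern, text):
        # start() is the index of the first matched character;
        # end() is one past the last matched character.
        print("found", repr(match.group()), "at", match.start(), "to", match.end())
        spans.append((match.start(), match.end()))
    return spans

re_pattern("one two one", "one")
# found 'one' at 0 to 3
# found 'one' at 8 to 11
```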
And I just wanted to bring up the use of special characters again. This dollar sign here, if you check the table, if you're following along and have it in front of you, means that the match has to occur at the end of the string. It's got to occur at the end; that's what the dollar sign means. So there are lots of places where I have a one, two, three, but I want a special one: the one that occurs at the end. A common question then is, what if my string has a dollar sign in it? What would I do under those circumstances? Then you'd put a backslash in front of the dollar sign, if you were really looking for that character, let's say if you wanted to match a literal dollar sign at the end of the string. So if we go ahead and run these, they're both found starting at index 12, and you can count it out. Once again, indices start at zero: zero, one, two, and so on up to 12. That's exactly what was found for the first one, 12, 13, 14, and then 15. And for the second one, the match runs from 12 through 17. The main thing I wanted to point out, and again it's a very common question, is this: if you want to match a special character literally, what do you do? You have to put that backslash in. This is a very typical, very simple function, and it's a useful one to tuck away. We'll get into object-oriented programming in unit 10, but if you create a class with some suite of methods for a series of string operations,
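Here is a sketch of the end-of-string anchor and of escaping it, on made-up strings (the speaker's exact text isn't in the transcript). The r prefix makes a raw string, so the backslashes reach the regular expression engine unchanged.

```python
import re

text = "123 abc 123 xyz 123"

# Without an anchor, every "123" matches.
print([m.start() for m in re.finditer("123", text)])   # [0, 8, 16]

# "$" anchors the match to the end of the string, so only the last one counts.
print([m.start() for m in re.finditer("123$", text)])  # [16]

price = "Total: 25$"
# "\$" is a literal dollar sign; the final unescaped "$" is still the anchor.
# (\d+ means one or more digits.)
m = re.search(r"\d+\$$", price)
print(m.group(), m.start(), m.end())  # 25$ 7 10

# Unescaped, "$$" is just two end anchors, so nothing matches here.
print(re.search(r"\d+$$", price))  # None
```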
this type of loop would be very typical as part of that suite of methods for some kind of string processing application. So let's play around with another one in our little sandbox, now that you're all regular expression experts. You can ask what something like this does, for instance. Again, we have this function, re_pattern. It's using finditer again, which finds all the matches for the pattern along with each starting point and ending point, and then we slice those out. So this is a quick review of slicing. Remember that when you slice, you count up to the given endpoint and stop one short of it; the endpoint itself isn't included. The end value you get from the match is designed with that in mind: it's one more than the index of the last matched character, so you can slice with these values directly. Otherwise you'd have to write something like e plus one or e minus one. So pay attention to what s and e are, as produced by the match object. And again, it's really nice. When you get a match, it's not like you're getting an element off a list. The match is really an object with methods associated with it, generated by finditer; it's an re.Match object. That may not mean a lot to you now, but by unit 10 it will. You just want to be careful to understand what's being generated by re.finditer.
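A short sketch of slicing with the values a match gives you, on an assumed example string. Because a slice's end is exclusive and end() is one past the last matched character, text[s:e] reproduces the match with no adjustment.

```python
import re

text = "order 42 shipped in 7 days"

for match in re.finditer(r"\d+", text):
    s, e = match.start(), match.end()  # e is one past the last matched character
    # The slice end is exclusive, so text[s:e] is exactly the match;
    # no +1 or -1 adjustment is needed.
    print(s, e, text[s:e])
    assert text[s:e] == match.group()
    assert match.span() == (s, e)  # span() returns both values as a tuple
# 6 8 42
# 20 21 7
```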
So again, this is why Python can be so powerful. When we first started writing loops, we saw that we could iterate over values in a list, or over keys in a dictionary. And that theme just keeps on building. As long as Python understands what you want to iterate on, what your iterator is, if I can put it that way, everything still works. In this case, it's wonderful that the match is endowed with everything you'd want it to be able to do when it's generated by a find function. You have access to these values straight from the object the iterator produces, which is really neat. It does exactly what you'd want it to do. Anyway, what does this pattern mean? We're looking for a run of decimal digits, arbitrary repetitions, but it has to occur at least once; that's what the plus sign after backslash d expresses. This is sort of like a trick question. Sometimes I'll test students on it, not because I'm trying to make them do poorly, but because I want to make sure they understand the difference between looking for the letter d and what backslash d means. For backslash d, you've got to go to the table of character classes. In that table you'll see that backslash s is for whitespace, backslash d is for decimal digits, and backslash w is for alphanumeric characters, which in Python also includes the underscore. So there are a number of different options to master. In this case, we're not looking for the letter d.
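To make the "letter d versus backslash d" distinction concrete, here is a sketch on an assumed string that contains both letter d's and digits.

```python
import re

text = "dd 123 add 45"

print(re.findall(r"\d+", text))  # ['123', '45']   one or more digits
print(re.findall(r"d+", text))   # ['dd', 'dd']    one or more letter d's
print(re.findall(r"\s", text))   # [' ', ' ', ' ']  whitespace characters
```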
That's why I put the example in like that: you want to distinguish between looking for the actual character and looking for a class of characters. This is arbitrary repetitions of decimal digits. So when we run this, you'll see that we find one, two, three, then one, two, three again, and again: one, two, three, one, two, three. A simple example, but in our sandbox you've got to be clear about what you're searching for. Okay, let's try another one, and this is also a good one. Just to mix things up a little and make sure we're distinguishing between the text and the pattern we're looking for, I've thrown things like dollar signs and carets into the text. These are special characters, but here they're not part of the regular expression, they're part of the string, which is completely fine; it won't mess anything up. So what is this backslash w doing? Again, you've got to look at that page and scroll down to the character classes, and you'll see that backslash w matches alphanumeric characters. The beauty of the way these classes are designed is that you can look for either the class or its complement. So you can use a backslash w, or a backslash capital W, which says you're looking for a non-alphanumeric character. In this case, we're looking for alphanumeric characters, so we should find the A, the B, the C, and the D, if finditer is doing what it's supposed to do. Again, the goal here is to find multiple matches, and that's why we need the iterator.
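A sketch of backslash w on an assumed string that mixes letters with special characters (not the speaker's exact text). Note that Python's backslash w also matches the underscore.

```python
import re

text = "a$b^c&d"

# \w matches word characters: letters, digits, and the underscore.
for m in re.finditer(r"\w", text):
    print(m.group(), m.start(), m.end())
# a 0 1
# b 2 3
# c 4 5
# d 6 7

print(re.findall(r"\w", "x_1!"))  # ['x', '_', '1']
```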
And so we find all of these: we find the A, and we skip over the other characters because they're not alphanumeric. Alphanumeric means anything that's either a letter, A to Z in capital or lowercase, or a digit, zero through nine. As long as a character falls into that set, we want to find it, and we ignore the special characters. Or we can go the opposite route. Like I said, the nice thing about a well-constructed suite of regular expression classes is that you also get the complement: anything that's not an alphanumeric character. In this case, the non-alphanumerics are the dollar signs, the carets, the sort of characters you get by pressing Shift with a number key, basically anything that's not a letter or a digit. And now we find all of those instances. If you wanted to, you could modify the code to ask for more than just the text. This is very basic, low-level output; I'm not really formatting anything. If you get into the course, you'll see some nice formatting instructions. I'm just trying to cut to the chase, in case you're not familiar with formatting strings. But you should be, just kidding. So I'm just throwing bare-bones output to the screen: the text and the indices at which it occurs. And this is all in agreement: the dollar sign starts at location one and ends at location two, the next caret runs from two to three, and so on. This is probably the most fundamental information you'd want, either the text or the indices where it occurs within the string, especially if you're looking for patterns within large databases.
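The complement can be sketched the same way on an assumed string: backslash capital W finds exactly the characters that backslash w skips, so together the two classes cover every character.

```python
import re

text = "a$b^c&d"

# \W (capital W) is the complement of \w: anything that is not a word character.
for m in re.finditer(r"\W", text):
    print(m.group(), m.start(), m.end())
# $ 1 2
# ^ 3 4
# & 5 6

# Every character is either \w or \W, so the counts add up to the length.
print(len(re.findall(r"\w", text)) + len(re.findall(r"\W", text)) == len(text))  # True
```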
Okay, let's play some more; we could go on forever. Let's try some more interesting variations on the theme, mixing in things we've done in previous units. For instance, make believe that you want to do the find, but rather than just printing the results, you actually want to remember them. How would you remember them? Interestingly enough, there are a lot of ways you could answer that question. I'm just going to keep it basic: I'll have a list of indices, and I'll build up that list, just to remind you of list appending, and then return the list. That might be something you want to do rather than just print to the screen, which, again, was me trying to cut to the chase. Our goal isn't always just to print stuff out; our goal is to write programs that use this information in some way. So in this case, the function collects the indices of the occurrences in a list, and then we can store that list in some variable x. Again, re_pattern is a function that we're calling, and we've got this long string. Let's analyze this one. You're all ordained experts now on how to find stuff. What this says is: I'm going to look for anything that starts with a capital letter from A to Z, followed by at least one lowercase letter, with arbitrary repetitions. So I've got an uppercase letter and then one or more lowercase letters. In the second pattern, we have c and d, but remember the question mark: if you go back to the table, it means zero or one occurrence of the preceding character, not two or three.
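Here is a sketch of storing the indices instead of printing them, assuming a hypothetical helper named match_starts and a made-up sentence containing three capitalized names.

```python
import re

def match_starts(text, pattern):
    """Return a list with the start index of every match (hypothetical helper)."""
    indices = []
    for match in re.finditer(pattern, text):
        indices.append(match.start())  # remember the position instead of printing it
    return indices

# [A-Z] is one capital letter; [a-z]+ is one or more lowercase letters after it.
x = match_starts("Mary met Luis and Ana", "[A-Z][a-z]+")
print(x)  # [0, 9, 18]
```

The same list could also be built in one line with a list comprehension, [m.start() for m in re.finditer(pattern, text)]; the explicit loop just makes the appending step visible.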
So if we go ahead and run these, you find out whether there are some occurrences or none. On the first one, there were three matches: one, two, three words starting with an uppercase letter, starting with M, starting with L, and starting with A. And that's exactly what you get: the one that starts with M, the one that starts with L, and the one that starts with A. This is really typical of a search you'd do if you were looking for proper names, for example; this pattern would fit the bill. And then there are other games to play with the question mark. If we come back and look at our table, the question mark means zero or one occurrence of the previous character or group. So now we want zero or one occurrence of the d. If we look at the second result and its indices, we can see that we have the cd. You have to start with a c, followed by zero or one d. The cd is at the beginning of the string, so here's the cd, and then you move along to the next occurrence of c, again followed by at most one d. And finally, since there's a c at the end, you get that one as well, because zero occurrences of d is allowed. So I don't know what else to say. What I wanted to go over today was to introduce the re module and throw out maybe four or five methods along with their applications, so that you'd have an initial set of functions and methods to search with. And then I wanted to exercise different themes and variations, to show you that you can have different levels of complexity in what you're searching for.
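A sketch of the question mark on an assumed string, with the star quantifier alongside for contrast: with d? each c takes at most one d, while d* takes as many as follow.

```python
import re

text = "cdcddc"

# "d?" allows zero or one d after each c.
print(re.findall("cd?", text))                       # ['cd', 'cd', 'c']
print([m.span() for m in re.finditer("cd?", text)])  # [(0, 2), (2, 4), (5, 6)]

# For contrast, "d*" allows any number of d's, including zero.
print(re.findall("cd*", text))                       # ['cd', 'cdd', 'c']
```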
That can be anything from a single arbitrary character all the way to expressions that get more involved, like this one or some of the others we saw. As I said before, the table is instrumental; you can use it as a crutch when you first start out, going back and forth and looking at it. And don't underestimate the power of the character classes, because you can grab a whole class of characters with a single backslash sequence, like backslash d or backslash w. Those are very powerful. And once again, square brackets are extremely powerful, because you can specify the set of characters you want to look for. In one position, I can look for any capital letter from A to Z. If you actually try to program these things yourself, you'd be amazed. It's nice that you can test 26 possibilities at one position, and you should really wonder how that's actually done. As I said before, if you ever really get into string searching, you'll find very thick textbooks on the theory behind string searches and how they're performed. At the level of a high-level language, when you're first starting out, you just write left bracket, A to Z, right bracket, and it gets done. But as I said at the beginning, if you're really interested in the theory of computation, or in algorithms, string searching is an enormous universe. You're taking for granted that you can put a little pair of square brackets in a pattern, but there's real depth to it.
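To make the square-bracket idea concrete, here are a few small character sets applied to made-up strings. This is only a sketch; the course's own examples may differ.

```python
import re

# [A-Z] tests one position against all 26 capital letters at once.
print(re.findall("[A-Z]", "Find Every Capital In This Line"))  # ['F', 'E', 'C', 'I', 'T', 'L']

# A range can cover the digits 0 through 9...
print(re.findall("[0-9]", "Unit 8, CS105"))  # ['8', '1', '0', '5']

# ...or a set can simply list specific characters.
print(re.findall("[aeiou]", "regular"))  # ['e', 'u', 'a']
```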
So I would just encourage you, if you're going to go on, to study data structures and algorithms for string searching, which is a beautiful subject, and the theory of computation, just in terms of how regular expression.