Here we are in the exciting world of regular expressions. We've seen a little bit of regular expressions already, when we used the search function. Regular expressions are used to find patterns inside text: you can import a bit of text and look for certain patterns, which makes for wonderful data mining from text. A lot of things are possible.

Julia implements Perl-compatible regular expressions (PCRE); you can Google that and look it up. The one syntax I'm going to use is just this r followed by quotation marks. Let's have a look. No, this is not swearing; I'm just showing you that I have this r in front of a string. Let's execute that. There we go, it's there. Now let's see what the type of this is: it is Regex, a regular expression, so definitely a specific type inside Julia.

Now let's have a look at this. I have the string "I love Julia", and I'm going to ask: is there a match for the word Julia? I use ismatch with a regular expression, so it's not going to do a normal search as we had before. It's going to use this regular expression (I shouldn't say the word "method" yet) to look inside the string, and indeed it is true: Julia is there. Regular expressions are case sensitive, so if I had put a lowercase j there, that would be false.

Let's go one step further than Boolean returns. In the next example, and this is really what I want to get to, I'm going to use split and ismatch combined. Let me not give it away; let's have a look at what we're going to do. I say text1 equals, and I have ABC, ABCDEF, ABCDEFGHI, and then a couple of words starting with BCD. That string is now stored in the variable called text1. Now I'm going to use the split function on text1, and I want it split on the spaces.
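As a rough sketch of the steps so far: the lecture uses ismatch, which is the older name for this check. The code below assumes Julia 1.0 or later, where the equivalent call is occursin. The contents of text1 are illustrative, since the exact string on screen is not recoverable from the recording.

```julia
# Assumes Julia 1.0 or later, where occursin replaces the older ismatch.
pattern = r"Julia"                 # the r prefix builds a regex literal
println(typeof(pattern))           # prints Regex

found = occursin(r"Julia", "I love Julia")        # true
found_lower = occursin(r"julia", "I love Julia")  # false: matching is case sensitive
println(found, " ", found_lower)

# Illustrative string; the exact text used in the lecture is an assumption.
text1 = "ABC ABCDEF ABCDEFGHI BCD BCDEF"
words = split(text1, " ")          # one element per space-separated word
println(length(words))             # prints 5
```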
Let's do that. And indeed, there we have them: ABC, ABCDEF, and all the rest. Now look at this nice little for loop. You can use regular expressions in a for loop. I say for i in split of text1 on the space, so i is going to be each element in turn: ABC, then ABCDEF, then ABCDEFGHI, and so on. Then ismatch with the regular expression r"ABC": it looks for just that ABC and checks whether it matches in there. Then comes the double ampersand. It means that if the first part is true, execute the second part; if the first part is false, don't execute the second part. And then end.

Let's execute that. It goes down the list one by one, because it's a for loop for i in this split, which gives me an array. The first one had an ABC, the second one had an ABC, and the third one had an ABC. The BCD ones were left out: the first part returned false for those iterations, so the second part was not executed. So it found all the ABCs. You can imagine how powerful that can be: if you imported a bit of text and were looking for certain words or substrings, you could bring them all out.

Now, how would you go about expanding on this? I'm going to use the dot, or full stop, as a wildcard. Let's have a look at this. I have text2: "This is a sentence. For this sentence I only want the words with a dot in them." A dot is a full stop, so I'm looking to find the word "sentence." and the word "them." That could be a very nice thing to do, because I might not want to split by word on the spaces between them; I might want to split by sentence. So let me show you how that works. Again, we're just going to split.
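The for loop over the split words, with the double ampersand as a short-circuit, might look like the sketch below. It assumes Julia 1.0 or later (occursin in place of the lecture's ismatch), and the contents of text1 are illustrative rather than the exact string from the lecture.

```julia
# Illustrative string; the exact text used in the lecture is an assumption.
text1 = "ABC ABCDEF ABCDEFGHI BCD BCDEF"

# cond && action: the println only runs when the match check is true.
for i in split(text1, " ")
    occursin(r"ABC", i) && println(i)
end

# The same filter written as a comprehension, which keeps the matches in an array.
matches = [w for w in split(text1, " ") if occursin(r"ABC", w)]
println(length(matches))   # prints 3
```

The loop prints ABC, ABCDEF, and ABCDEFGHI, one per line; the two BCD words fail the check, so their println is never reached.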
Let me just run that. Again, we split the text on the spaces, just for now, but you'll see where we eventually get to. I split it and use ismatch to look for words that have a dot in them. But remember, a dot is a wildcard, and I don't want that at the moment; I want a real dot. So I have to escape it: backslash dot means look for the real dot, don't use the dot as a wildcard. If that is true, it prints the word. As we said, only two words have a dot in them: "sentence." and "them."

Now let's build some more regular expressions. text3 is "This man can fan down with a pan while he ran away from danger". A horrible, horrible sentence, my apologies, but it's for good reason. Think about what we're going to find here. Again, I split the text into each and every word with the split function. Now look at this: ismatch with a regular expression where I put p, r, d in square brackets, followed by a-n. What this means is: find any one of these characters attached to a-n. So I can have pan or ran or dan. If that is true, print it for me. Let's see what happens. Indeed, it found pan, ran, and danger, because there is a d-a-n in danger. So if you have different words with a common bit to them, this is the way you would build it up.

Say you wanted the result not to have those, so not pan, not ran, and not dan. Then you just put the caret, shift-6 on most keyboards, in front of them inside the square brackets. Now we get back man, can, and fan. You see why I wrote that ugly sentence: just so that I have all these a-n words. Anyway: man, can, and fan. Good. Now let's use the wildcard I was talking about.
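A minimal sketch of the escaped dot, the character class, and the negated class, assuming Julia 1.0 or later (occursin instead of ismatch). The helper name matching_words is made up for this illustration, and the strings are reconstructed from the spoken versions, so their exact punctuation is an assumption.

```julia
# Hypothetical helper: keep the space-separated words that match the pattern.
matching_words(pattern, text) = [w for w in split(text, " ") if occursin(pattern, w)]

text2 = "This is a sentence. For this sentence I only want the words with a dot in them."
text3 = "This man can fan down with a pan while he ran away from danger"

with_dot = matching_words(r"\.", text2)       # escaped dot: a literal full stop
prd_an   = matching_words(r"[prd]an", text3)  # p, r, or d followed by "an"
other_an = matching_words(r"[^prd]an", text3) # any other character followed by "an"

println(with_dot)
println(prd_an)
println(other_an)
```

With these strings, with_dot holds "sentence." and "them.", prd_an holds pan, ran, and danger, and other_an holds man, can, and fan.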
I'm not going to use the dot in its real sense; I'm going to use it as a wildcard. Again we're just splitting, and now the regular expression is a-n followed by a wildcard. It says: anywhere there's an a-n with any character after it; it doesn't care which. Let's see what it returns. The only thing it returned was danger, because danger is the only word with an a-n and something following it. In all the others, remember, pan, ran, man, can, fan, there is nothing after the a-n except a space, and the space doesn't count here. The reason the space doesn't count is that we split on the spaces, so we just have this array of single words.

Okay, let's use some ranges. "Anna and Barbara have a cat, its name is Dan." Let's have that as our text. We split it on the spaces, so we're going to have Anna, and, Barbara, have, a, cat, and so on. That cat is going to be "cat," with the comma, any apostrophe-s stays attached to its word, because we're only splitting on the spaces, and the last one is going to be "Dan." with the dot. Now I can say: find this for me, ismatch r with A to C inside the square brackets, all uppercase. It finds any word that contains a capital A, B, or C. Here that happens to be at the start of the word, but the square brackets on their own don't require that.

Now let's change this to text5: "Anna and Barbara have a black cat". I've introduced this black, and a lowercase c for cat. What if I still want to include cat, which has a lowercase c? I'll just say A to B, and then a lowercase c, inside the square brackets. So now I'm going to catch a lot more. At first glance this might be slightly confusing, because it also returned black, and that's all lowercase. But the brackets match one of the listed characters anywhere in the word, not just at the start, and black has a lowercase c in it.
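The wildcard and the ranges can be sketched as follows, assuming Julia 1.0 or later (occursin instead of ismatch). The helper matching_words is hypothetical, the strings are reconstructed from the spoken versions, and the pattern [A-Bc] is one reading of "A to B and a c". The anchored variant with a caret outside the brackets is an addition that shows how to require the character at the start of the word.

```julia
# Hypothetical helper: keep the space-separated words that match the pattern.
matching_words(pattern, text) = [w for w in split(text, " ") if occursin(pattern, w)]

text3 = "This man can fan down with a pan while he ran away from danger"
text4 = "Anna and Barbara have a cat, its name is Dan."
text5 = "Anna and Barbara have a black cat"

an_plus_any = matching_words(r"an.", text3)     # "an" followed by any one character
capitals    = matching_words(r"[A-C]", text4)   # contains a capital A, B, or C
unanchored  = matching_words(r"[A-Bc]", text5)  # A, B, or lowercase c anywhere
anchored    = matching_words(r"^[A-Bc]", text5) # same characters, but only at the start

println(an_plus_any)
println(capitals)
println(unanchored)
println(anchored)
```

The unanchored version returns black because of the c in the middle of the word; the anchored version drops it.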
So that's one that can really catch you out. Just make sure you recognize what is going on there if you want to use it.

Now let's use this long, loong, looong, loooong sentence, which will end soon, thank goodness. Let's find words with at least two o's. Again I split on the spaces, and I look for an o. There are no square brackets there, so this o can be anywhere, and I want two of them in a row, which I put in these curly braces. Because the match can sit anywhere in the word, words with three or four o's in a row match as well. Let's run that. It found the longs with two o's, three o's, and four o's, and the word soon. All of them have two or more o's in a row. The plain long, with one o, didn't count.

Now, just for interest, let's change one of these: let's put an extra o at the end of long. What do you think is going to happen now? There are certainly two o's in that word now. Let's have a look. It's still not there, because the pattern says two of them in a row. So two o's that are separate from each other are not caught by these curly braces.

Now let's do this: "This long lesson which will end more or less soon." I want the double s's, but I'm putting wildcards in front of them and behind them. Again I run down the array of words, because I've split on the spaces. Let's have a look. The only one it found with a double s was lesson. It didn't find less, because there has to be something after the double s, and in the array it was just less, l-e-s-s, with nothing after it. The wildcard says there must be something there.

Let's have a look at using the plus and star signs. What do you think is going to happen here? Well, let me just tell you. The plus, as I've used it here after a double o, means one or more of the character just before it, whereas the asterisk means zero or more.
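Here is a small sketch of the curly-brace count, the wildcards around ss, and the plus and star quantifiers, assuming Julia 1.0 or later (occursin instead of ismatch). The helper matching_words and the contents of text6 are made up for illustration; text7 follows the spoken sentence.

```julia
# Hypothetical helper: keep the space-separated words that match the pattern.
matching_words(pattern, text) = [w for w in split(text, " ") if occursin(pattern, w)]

# Illustrative word list: one o, runs of o's, and two separated o's ("longo").
text6 = "long loong looong loooong longo soon"
text7 = "this long lesson which will end more or less soon"

two_in_a_row = matching_words(r"o{2}", text6)  # two consecutive o's somewhere
ss_between   = matching_words(r".ss.", text7)  # ss with a character on each side
oo_plus      = matching_words(r"oo+", text7)   # an o followed by one or more o's
ss_star      = matching_words(r"ss*", text7)   # an s followed by zero or more s's
just_s       = matching_words(r"s", text7)     # same result as ss*

println(two_in_a_row)
println(ss_between)
println(oo_plus)
println(ss_star == just_s)   # prints true
```

Note that longo is not in two_in_a_row, and less is not in ss_between, for the reasons given in the lecture.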
So let's have a look at that. Look for a double o: an o followed by one or more o's. It found one, and another, and quite a few of them, and there's another soon there. So you can use that plus sign. Let's do the same with ss and a star, to see what happens. The star says zero or more, and now it's returning words with just a single s: there's a single s, and there's a single s. That is what the star means: the first s must be there, and then zero or more of the second s. It's exactly the same as if we had just asked for anything with an s in it.

Let's look for numbers inside text. Here is one: "There was a significant difference between the two groups (33 versus 44, p = 0.34)". Now this should say insignificant, because certainly that's not a significant p-value. Let's fix that: "There was an insignificant difference between the two groups (33 versus 44, p = 0.34)". Now let's find anything that has a number in it. If we split on the spaces, we're certainly going to have the 33 with its opening parenthesis as one element in our array, the 44 with the comma as another, and the 0.34 with the closing parenthesis as another.

Okay, let's try to catch those: anything with a digit in it. I'm going to use backslash d, which is a shorthand for any digit. I have to use the backslash, because if I just put d it's going to find the letter d for me. And indeed, that's what we have: the open parenthesis with 33, the 44 with a comma, and the 0.34 with the closing parenthesis. I can do this as well, and it's going to achieve exactly the same thing: find anything with a character in the range 0 to 9 in it. Let's do that. It's exactly the same. Now let's change this again. "Insignificant", that's horrible.
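The two ways of catching digits can be sketched like this, assuming Julia 1.0 or later (occursin instead of ismatch). The helper matching_words is hypothetical, and the punctuation of the sentence is reconstructed from the spoken description.

```julia
# Hypothetical helper: keep the space-separated words that match the pattern.
matching_words(pattern, text) = [w for w in split(text, " ") if occursin(pattern, w)]

text8 = "There was an insignificant difference between the two groups (33 versus 44, p = 0.34)"

by_shorthand = matching_words(r"\d", text8)     # \d is shorthand for any digit
by_range     = matching_words(r"[0-9]", text8)  # explicit range, same effect here
letter_d     = matching_words(r"d", text8)      # without the backslash: the letter d

println(by_shorthand)
println(by_shorthand == by_range)   # prints true
println(letter_d)
```

Because the split is only on spaces, the matched elements keep their punctuation: the parenthesis on 33, the comma on 44, and the closing parenthesis on 0.34.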
Let's do that. We have the sentence again, but now we've changed it ever so slightly from the one at the top. I'm going to use backslash W, uppercase W. Look at what that returns: the 44 is gone. Lowercase backslash w is the shorthand for a word character, meaning a letter, a digit, or an underscore, and uppercase backslash W is its opposite: it matches anything that is not a word character. So this is no longer just looking for digits; it also needs a non-word character, like the parenthesis or the dot, in the right place.

Now let's look at this string. We're almost there. "There was no statistically significant difference. The p-value was 0.3. The difference was not statistically significant. There was no significant difference." You can well imagine that you're reading a journal article and you import that text. These are various ways, I've just used three here, of stating in words that something was statistically insignificant. And that's all you want to find in the text: find me all the sentences that contain those words.

One way we could split this up is by using backslash dot, which means the dot is not used as a wildcard, just remember that, followed by a space. I want a dot and then a space; that is one way to split the text into sentences. The only problem you might expect here is over there. But it wants a dot and a space, so actually it's not going to cause a problem, because there is a dot and a space.
So it's going to split this sentence off there So let's do that and let's build up all these different ways So if you quickly paste through or run through an article you might Find all of these different ways to state things Sometimes it's even these capitals this NS So I'm gonna use fine for me no stat with a wild card That means anything that has no stat and anything after that or not stat or no Sig with a wild card not sig with a wild card or NS and if that's to print that line for me And indeed it found all three of those sentences. There was no statistically significant difference The difference was not Statistically significant. There was no significant difference So see how quickly you can find if you think how you can combine this with a p Value and these non-significant quickly go through an article You can find that having to read all the article the whole article you can quickly gather this data from a text file Just using regular expressions. So if that was an easy enough introduction to regular expressions They are very powerful played with them. They're quite a bit of fun