Okay, welcome back. Today we're continuing with our shell tutorials, and we're going to use awk to split a file apart based on matches of a string you give it. I've got a file here called pages.txt, and you can see it has a line that says page one followed by a few lines, then page two and a few lines, then page three and a few lines. I'm going to quickly change one line so it reads "This is a line". I'll change it back in a moment to show you why I had it the other way.
Here's how the command works. We're going to use awk, which is on pretty much every system, and if it isn't, it's very easy to get. I think at least some version of it is built into BusyBox, so even if you're running on a router you should be able to do this. We say awk, then something inside quotation marks, which we'll get to in a minute, and then the name of the file we're searching through.
Inside the quotation marks I've got two forward slashes, and between them goes the string I want to split on. Every time awk finds a line that matches, it starts a new file. I'm going to say page, because each page starts with a line that has the word page in it. Next, inside curly braces, we say n++. n is a variable, and every time there's a match it adds one to that number, which lets us number the output files we're about to create. Then, inside another set of curly braces (I'll try to put all of this in a link in the description), I'm going to say print. That block has no pattern in front of it, so it runs for every line, printing each line as awk goes through the file, and then we redirect that into a file.
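The pattern and counter described so far can be tried on their own before any files are written. This is a minimal sketch: the contents of pages.txt are not shown in the video, so the headings ("Page #1" and so on), the capital P, and the body text here are all assumptions.

```shell
# Hypothetical sample file; headings and body text are assumed, not taken from the video.
printf '%s\n' 'Page #1' 'This is a line' 'more text' \
              'Page #2' 'This is a line' \
              'Page #3' 'This is a line' > pages.txt

# /Page/ selects every line containing "Page"; {n++} adds one to n for each such line.
# The END block runs once after the last line and reports the final count.
awk '/Page/ {n++} END {print n " matches"}' pages.txt
# prints: 3 matches
```

If the count is not what you expect, the pattern is matching too many or too few lines, which is worth knowing before splitting anything.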
Now, if we just gave it a fixed file name like output.txt, everything would end up in that one file. awk opens the file once and every print after that goes into it, so we'd only get a copy of the whole input rather than one file per page. So what we're actually going to do is use that variable we created and throw it into the file name. Now it finds the first match, takes that line and the lines that go with it, and puts them in a file called output1.txt. Then it finds the second match and puts all of those lines in a file called output2.txt, and so on, adding one each time. Let's run that and hit enter. No errors, so I wrote everything right. Let's list our files: we have our original pages.txt, plus output1.txt, output2.txt, and output3.txt. Let's cat each of those out: output1 has everything from page one, output2 has everything from page two, and output3 has everything from page three.
Let's remove all those files. Even if we didn't, they'd be overwritten the next time we ran the command, but let's remove them just to clear things up. Now we just have pages.txt. Let's go into it and change that line back so it contains the word page again. This is going to be a problem. We run our command again, and this time instead of three output files we get four. The first one just has the line page one, and the rest of page one's text ends up in output2. That's because awk is looking for any match of page, and that word is on this body line too. So that's problematic, and you need to think as a programmer: what is really unique about the lines I want to split on?
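The full command, and the problem that shows up when a body line also contains the search word, might look like the sketch below. The sample text is assumed, as is the exact spelling of the headings. The parentheses around the file name are a precaution added here, not something stated in the video: some awk implementations parse an unparenthesized concatenation after > differently.

```shell
# Hypothetical sample file (contents assumed).
printf '%s\n' 'Page #1' 'This is a line' \
              'Page #2' 'second page text' \
              'Page #3' 'third page text' > pages.txt
rm -f output*.txt   # start clean

# Each matching line bumps n; every line is written to output<n>.txt.
awk '/Page/ {n++} {print > ("output" n ".txt")}' pages.txt
ls output*.txt      # output1.txt, output2.txt, output3.txt

# Now put the word "Page" into a body line and run the same command again.
rm -f output*.txt
printf '%s\n' 'Page #1' 'This is a Page' \
              'Page #2' 'second page text' \
              'Page #3' 'third page text' > pages.txt
awk '/Page/ {n++} {print > ("output" n ".txt")}' pages.txt
ls output*.txt      # four files now; output1.txt holds only the "Page #1" line
```

Any lines that come before the first match would be written with n still empty, so they would land in a file named output.txt.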
So here's what you could do. Let's remove all our output files again and go back to our command. Instead of page I'm going to put a little caret symbol and then page, and that means find any line that begins with the word page. Hit enter, list the files, and now we have output1, output2, and output3 just as we did the first time, and they display properly.
Let's remove all those and change our initial file again. This time I'll add a line here that says page of a line, whatever: a line that begins with page but isn't the start of a new page. Run the awk command again and, as you might have figured out, we have the same problem. It splits at that line because it begins with the word page. If I cat out the first file, it doesn't have all of its lines, and output2 starts in the middle of page one. So how can we fix this? Again, we need to be as precise as we can about what is unique to the lines that start a new page.
Let's remove our output files again and go back to our awk command. What we can do is add a space and the number sign after page. This should work. Now we list, and we have our output files, and if I cat them out, output1 looks correct, and so do output2 and output3. What I'm saying there is: look for any line that begins with the word page, then a space, then the number sign. You still might have a problem if that appears at the start of a line somewhere in your text, but at that point you've pretty much run out of simple options and it comes down to proper formatting of your file. That's something to think about when you're creating documents: format them in a way that makes sense.
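The two anchored patterns from this part can be compared side by side. As before, the sample file is an assumption; it contains one body line with "Page" in the middle and one body line that starts with "Page".

```shell
# Hypothetical sample file (contents assumed).
printf '%s\n' 'Page #1' 'This is a Page' 'Page of a line, whatever' \
              'Page #2' 'second page text' \
              'Page #3' 'third page text' > pages.txt

# ^ anchors the match to the start of the line, so "This is a Page" no longer splits,
# but "Page of a line, whatever" still does.
rm -f output*.txt
awk '/^Page/ {n++} {print > ("output" n ".txt")}' pages.txt
ls output*.txt      # four files

# Requiring "Page", a space, and a number sign at the start of the line
# leaves only the three real headings.
rm -f output*.txt
awk '/^Page #/ {n++} {print > ("output" n ".txt")}' pages.txt
ls output*.txt      # three files
cat output1.txt     # the heading plus both body lines of page one
```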
In this particular case this is the best option we can come up with. But again, as a programmer, if you're the one creating the document, definitely think about how it's formatted. Make your page identifiers, titles, or whatever else marks a section unique in some way that's different from the rest of the document. If you're working with text you've pulled out of somewhere, you've got to figure out what works best for your situation. At this point you could match page, space, number sign, and then a digit, but that might show up in the text too, and why would there be a number sign if the next thing isn't a number? You could also look at the number of words on the line: if there's more than one space on that line, more than two words, you ignore it. But that's getting more complex than what we're covering in this tutorial.
I hope you found this useful. Again, I'll try to remember to put a link to the examples in the description. Oh, I want to give you another example, which I'll also put in the description. Let's remove our outputs. This is an actual text file that was generated from file names. I forget what it was for; it was a long time ago and it's just in my notes. I'm going to download it from Pastebin, and again, this will be in the notes in the description of this video.
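Going back to the stricter matching ideas mentioned a moment ago (a digit after the number sign, and a limit on the number of words in the line), one possible sketch is below. The video does not show these commands, so the exact pattern, the NF test, and the sample text are assumptions.

```shell
# Hypothetical sample file; the second line starts like a heading but is body text.
printf '%s\n' 'Page #1' 'Page #1 is mentioned again here' 'first page text' \
              'Page #2' 'second page text' \
              'Page #3' 'third page text' > pages.txt
rm -f output*.txt

# ^Page #[0-9]+$ : the whole line must be "Page #" followed only by digits.
# NF == 2        : the line must have exactly two whitespace-separated fields.
awk '/^Page #[0-9]+$/ && NF == 2 {n++} {print > ("output" n ".txt")}' pages.txt

ls output*.txt      # three files
cat output1.txt     # the heading and both body lines of page one
```

With the $ anchor in place the NF test is redundant for this sample; the two checks are shown together only because both ideas were mentioned, and either one alone would reject the second line here.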
I downloaded the file and named it original.list, and there we go. Let's go into it. It has a title for each section, each section has a different title, and the titles all have semicolons in them. Not semicolons, colons. Under each title there's a list of file names. It was so long ago that I generated this list; I'm assuming this is the number of pages in the file. You can see we just have a few different sections: there's a really long section here, and then another section, so we have at least three sections: final, no, and exam.
Let's see if we can split these up using that same awk command. Instead of page, what's unique about each of those title lines, even though the titles are different, is that they have a colon in them. So let's run that and list it out. Oh, I ran it on the wrong file. Let's remove our output. I was thinking that didn't work right, and it's because I was pointing at my pages file, which doesn't have any colons. With original.list, we now have three outputs, each with its own section in it: there's the final section, the no section (maybe short for number), and the third one. Again, these tutorials I'm doing now are ones I've been wanting to do for a while and they've been in my notes, so I don't even remember the original purpose, but this is an actual example where I used this command to divide things up for some use.
Anyway, thank you for watching. Please visit filmsbykris.com, that's Kris with a K; there should be a link in the description. There's also a link to my Patreon page, patreon.com/metalx1000. I appreciate your support. You can also support me through PayPal: just go to filmsbykris.com, again, link in the description, and go to the support section for both of those links. As always, I hope you have a great day.
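For reference, the colon-based split from the second example can be sketched as follows. The real original.list comes from the Pastebin link and is not reproduced in the video, so the section titles and file names below are made-up stand-ins.

```shell
# Hypothetical stand-in for original.list; titles and file names are invented.
printf '%s\n' 'Final: 3' 'final_01.png' 'final_02.png' 'final_03.png' \
              'No: 2' 'no_01.png' 'no_02.png' \
              'Exam: 1' 'exam_01.png' > original.list
rm -f output*.txt

# Only the title lines contain a colon, so a colon marks the start of each section.
awk '/:/ {n++} {print > ("output" n ".txt")}' original.list

ls output*.txt      # output1.txt, output2.txt, output3.txt
cat output2.txt     # the "No" title line and its two file names
```

Running the same command against a file with no colons at all never increments n, so every line goes to a single file named output.txt, which is what the wrong-file run above would have produced.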