Hello and welcome. Today we're going to review what we did the other day, but do it a different way. In the last video I showed you how to pull everything down from a website as an HAR file (an HTTP Archive file), extract the base64 data, convert it, pull out the images you want, and put them into a PDF, and we did that for your yearbook from Classmates.com.

Now, if you go to Classmates.com, you can create an account, open a yearbook that people have scanned and uploaded, and try right-clicking, and you can't save a page. What I should have shown you in the first video: I'm in Chrome here, but most web browsers have a developer panel. Usually it's F12; if you don't have an F12 key, in Chrome it's Ctrl+Shift+I. We're going to go to the Network tab and refresh the page. I also have the Images filter clicked, so it's only going to list images down here, but any image that loads on this page will appear here, and as I click through and more pages load, you can see them loading below.

I can left-click an entry to see a preview of it, and if I want that image I can right-click and save it. If you only need one image from the page, that's fine. They must have an invisible layer over the page, or something in JavaScript, that prevents you from clicking the image directly, but it's still loading down here.

Now, in the last video I showed you how to pull everything down in an HAR file, which is great: it makes things very simple if you have a page you have to log into to access the stuff. But I noticed that if I right-click an entry and choose Copy image URL, then open an incognito window or private tab (Ctrl+Shift+N in Chrome) and paste in that URL, I can load that image.
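Before scripting anything, it's worth looking at the shape of that copied URL. A small sketch with a made-up URL of the same general shape (the hostname and token are invented for illustration), showing why a plain download keeps the junk after the .jpeg and how a clean name could be derived:

```shell
# Hypothetical direct image URL of the kind copied from the Network tab;
# the hostname and token are invented for illustration.
URL='https://images.example.com/yearbooks/page_0001.jpeg?token=abc123'

# A downloader's default filename is everything after the last slash,
# query string included:
echo "${URL##*/}"     # → page_0001.jpeg?token=abc123

# Stripping from the first '?' gives a clean name; choosing our own
# output name (wget's -O flag, used later) sidesteps this entirely:
name="${URL##*/}"
echo "${name%%\?*}"   # → page_0001.jpeg
```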
That means we can download these images without going through credentials and a login. You can handle logins from the shell (both curl and wget allow you to pass login information), but we're trying to keep this simple, and it makes things very simple if I don't have to log in and just have a direct link to the image. I don't know how long that link lasts or whether it expires at some point, but if I go into a folder such as this one, say `wget`, put in the URL for that image, and download it, you can see that it appears here. I can open it up, and there's that image, downloaded.

You'll notice that it keeps the full name, so there's more after the .jpeg, but we're not going to worry about that right now; we're going to handle it with a script. We still need to get a list of all of these, so I'm just going to move that file to the trash for now.

So again, I'm in an empty directory here. This is my shell up here, and this is the same directory down below, just so you can see things as they load. Going back to my web browser: just like last time, you could fully automate this for sure, but we're going to assume you're only downloading a few of these, so writing a full script to automate the entire thing is overkill. If you're going to be downloading hundreds of these yearbooks, you'll want to fully automate the process. All I have to do now, just like last time, is click through and let each page load, and as I flip through each page you'll notice, in the Network tab of the developer tools, the pages loading right down here.

So as I go, you can see them loading. These yearbooks are scanned by users, so they even have signatures and messages from the people in the class who signed them, which just adds to the coolness of it, if you ask me. We're going to go all the way through; it looks like we're getting to the end. I could go faster, but I just
want to make sure every page loads. That page almost didn't load, and because it loaded a little late, there's a possibility that when I download these and convert to a PDF, it might end up out of order when we get to that page. Of course, we could write our script to look out for that and make sure the page loaded. Okay, we're at the end now.

Last time, we right-clicked and chose Save all as HAR with content, which saves all these images and everything that loaded into one large file. We're not going to do that this time; we're going to choose Copy instead. We could do Copy as cURL, but that gives you a list of curl commands to download everything, and binary content such as images won't come through properly that way. So what we're going to do is Copy all as HAR, which is without the content. I copied that, so it's in my clipboard now.

Now I could open a text editor and put that into a text file, but I'm just going to use cat. I'll cat into a file called 1.har; the name of the file doesn't matter. I press Ctrl+Shift+V to paste, and it takes a moment to paste it all in. Once it's pasted, all I have to do is press Ctrl+D and it's saved to that file. I'm basically just catting all the information I'm pasting into the shell, and it is a lot.

Now, last time we had all the images as base64 and had to find that data and pull it out. What I can do this time is grep that file for "jpeg", and you can see a list of them here, and these look like the right ones: it says "yearbook" and we can see a page number. But just to make sure we're not grabbing other jpegs, I'm going to grep that output again for "yearbook". I could double-check further, but I'm going to assume the lines I'm grabbing now are URLs just to pages in the yearbook. So at this point I want to cut out the excess
information. You can see we have quotation marks here, so if we divide each line up by quotation marks, this is field one over here, two, three, and this will be field four. So we'll say cut with a delimiter of a quotation mark, field four, and when we do that we get a list of all the image URLs.

We could now pipe this straight into wget, but then we'd get all those question marks and everything after them at the end of the filenames. I'm sure there's a clean, easy way to remove all of that, but I'm just going to renumber all the pages instead. So what I'm going to do here is say `let X=1000`. That creates a variable called X as an integer, a whole number. Then, instead of a plain wget, I want a while loop (there are actually other ways to do this that I've shown in the past, where you run things in parallel so you're downloading more than one image at a time, which is something you might want to look into). I'll say `while read URL; do wget $URL -O $X.jpeg; let X++; done`. Note the dash capital O: for most programs it's a lowercase o, but for wget it's a capital O, and if you don't use it, it's going to do weird things.

So what are we doing here? We start our one-liner by creating a variable called X set to 1000; `let` says it's an integer, so we can add to it. We look at this HAR file, find all the lines that say "jpeg" and "yearbook", cut those to get this list of URLs, and then, with `while read`, loop through it line by line. For each line, we create a variable called URL and tell wget to download that URL. The output, if we don't name it, is going to be that full
name, which is not the end of the world, but then you'd want to clean it up afterward. Instead, we output to `$X.jpeg`, which will be 1000.jpeg, and then `let X++` adds one to X, so the next time the loop comes around it will be 1001, then 1002. It's just an easy way to name these files and keep them in order without having to pad with placeholder zeros; I've shown how to do that before, and this is just a quick and easy way. Obviously there are lots of ways to do what we're doing right now, and some might be better than others; this is just, in my mind, the quickest way I've come up with that I remember. I know there are commands that could clean this up a little more, but I don't use them regularly, so they're not in my head.

There are 240-some pages in this yearbook, and as you can see they're downloading: we're at 180, 190, 200. Again, the only reason we're able to do this is because we've confirmed these images don't need a login. If you did need to log in to view the images, we'd have to pass login credentials and all that; in that case I'd go the other route, from the previous video, where we bulk-download everything as one file and extract from there.

So now I can click on any one of these pages and look through the pages as images. And if I have ImageMagick installed, which you should (it's fairly commonly installed, and if not, you should install it; it's such a useful tool), we can just say `convert` all these jpegs to, this is the Naples High yearbook, I think this was 1973, PDF. Without any extra parameters, it basically puts all the jpegs into the PDF; I don't think it recompresses them at all. Let's check the sizes: we'll say `du -h` up here, and we can see it's 157 megabytes altogether, counting the images and the PDF, and if we list our yearbook PDF, it
should be about half that, which it is. So there you go. Now we can use `xdg-open`, which opens the file with whatever default application you have set for PDFs, and there you go: there's page one, and we can go through all the pages. This is our PDF of our yearbook, and I'm pretty sure we got most of the pages, and it goes in order. And that's it. You can save this, upload it, do whatever you want with it; you have the individual images and the PDF.

So it's much like what we did before, but instead of downloading everything into one file through the browser and then extracting the images out of it, we downloaded a list of the images and then downloaded them separately: two different ways to do it. I think this is the better way when you don't need login credentials to actually pull the images. If you try to open an image in a new tab in incognito mode and it won't load, that means you have to log in to access that image, and in that case I'd go the route we went in the other video. Again, there are lots of different ways to do things depending on the scenario.

I hope you found this useful. Please visit filmsbykris.com (that's Kris with a K; there's a link in the description), and as always, I hope that you have a great day.
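To recap, the whole flow can be sketched end to end. This is a sketch under assumptions, not the exact session from the video: the HAR text and URLs below are invented stand-ins, and `touch` takes the place of `wget` so the loop runs without a network connection; on a real run you would use the commented wget line instead, and finish with `convert` and `xdg-open` as shown.

```shell
#!/bin/bash
# Offline sketch of the pipeline. URLs are invented; `touch` stands in
# for `wget` so this runs with no network connection.
mkdir -p /tmp/yearbook-demo && cd /tmp/yearbook-demo

# Stand-in for the text pasted from "Copy all as HAR" (no content):
cat > 1.har <<'EOF'
"url": "https://images.example.com/yearbook/page_0001.jpeg?tok=a",
"url": "https://images.example.com/site/logo.png",
"url": "https://images.example.com/yearbook/page_0002.jpeg?tok=b",
"url": "https://images.example.com/yearbook/page_0003.jpeg?tok=c",
EOF

# The one-liner from the video: filter to yearbook jpegs, take the URL
# (field 4 when splitting on double quotes), then fetch each one,
# numbering the files from 1000 so they stay in page order.
let X=1000
grep jpeg 1.har | grep yearbook | cut -d'"' -f4 | while read URL; do
    # Real run: wget "$URL" -O "$X.jpeg"
    touch "$X.jpeg"   # offline stand-in for the download
    let X++
done

echo *.jpeg   # → 1000.jpeg 1001.jpeg 1002.jpeg

# With ImageMagick installed, the numbered pages become one PDF,
# and xdg-open hands it to your default PDF viewer:
#   convert *.jpeg yearbook.pdf
#   xdg-open yearbook.pdf
```

The 1000-based numbering is the same trick as in the video: every filename has the same number of digits, so a plain alphabetical sort keeps the pages in order without zero-padding.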