Okay, today we're going to be scraping mugshots again. This is not going to be a fully automated process, just a quick way to pull stuff. I guess I really shouldn't call it scraping, but that's kind of what we're doing, partially using a web browser again. In the future we're going to look at a fully automated way of doing this, but for now I'm just going to show a quick way of grabbing all the content from a web page while you're viewing it.

So real quick: I'm in an empty directory here called mugs, and I'm at a sheriff's office website. I'm going to say Accept, and I'm going to type in a name. I'll just say Smith, and that returns, well, a lot of people, in this case a hundred people named Smith who were arrested over, I believe, the last ten years. They're all loading here.

What we're going to do is hit Ctrl+Shift+I to bring up the developer tools, or you can hit F12 if you have an F12 key. We'll make sure we're in the Network tab and that we have All selected. I'm going to refresh the page, and now everything that loads on the page, every image, all the text, all the scripts, shows up down here. Once everything has loaded, I should be able to come down here, right-click, and say "Save all as HAR with content", and I'm going to save it to that directory.
That's the directory we're working in, so if I come back to my shell I can list it out, and we can see we have this file. If I open it up, again, it's a JSON file that not only has the text and information from the page, it also has information on when these things were loaded into your browser. Basically it's all the information about everything that loaded in your browser on that page. In there you can also find things that look like a whole bunch of text like this, which is base64. In this case that's an image saved as plain text, which you can extract and convert back into a binary image file.

Last time we used grep to pull that stuff out. This time we're going to use a program called jq, which is designed for parsing through JSON. Real quick, let's go back up to the top, and you can see that our first entry here is log, and a sub-entry of that is called entries, which we can see is an array. So from the log we want to look at entries, which is an array, and we want to go through all of those. There are different things in each one: the request is what our browser asked for, and the response is what the web server returned to us. We want the response, from that we want the content, and from within the content we're going to want the text. Not all the text, but let's start there.

Once you have jq installed, which should be in your package manager if you're on Debian and, I guess, a lot of other distros, we're going to say jq, and in quotation marks a dot, which basically means show everything, and then the file name, 1.har. There we go. Right now it's basically catting the file out, but color coding stuff for us a little bit. Now I can say, okay, I don't want everything, I want everything in the log section, which is still pretty much everything. But underneath the log section I want entries. And there we go.
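To make the first few jq steps concrete, here is a minimal sketch that runs them against a tiny stand-in file instead of a real browser export. The temporary file, the example.invalid URLs and the base64 strings are placeholders made up for illustration; a real HAR file has the same log, entries, request and response nesting but many more fields (timings, headers and so on).

```shell
# Stand-in HAR file with the same nesting as a real export.
# URLs and payloads are made-up placeholders, not real data.
har=$(mktemp)
cat > "$har" <<'EOF'
{"log": {"entries": [
  {"request":  {"url": "https://example.invalid/logo.jpg"},
   "response": {"content": {"mimeType": "image/jpeg", "text": "aGVsbG8="}}},
  {"request":  {"url": "https://example.invalid/mug1.png"},
   "response": {"content": {"mimeType": "image/png",  "text": "d29ybGQ="}}}
]}}
EOF

jq '.' "$har"               # show everything, nicely formatted
jq '.log' "$har"            # everything under "log"
jq '.log.entries' "$har"    # the entries array

# Confirm that entries really is an array and count its items.
# (-r prints the string without JSON quotation marks.)
entry_type=$(jq -r '.log.entries | type' "$har")
entry_count=$(jq '.log.entries | length' "$har")
echo "entries is an $entry_type with $entry_count items"
```

With the file saved in the video, the same filters would be run against 1.har instead of the temporary file.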
We've got entries, but entries is an array, so we're going to put the brackets here. From that array we want all the responses (let's spell things right: response). So do that, and we've got all the responses. We're narrowing down our information. Next, from the responses, we don't care about what loaded and what time it loaded, we just want the content. So we say content, and that narrows it down even more. Now we can say that we want text, so we get all the little entries that say text, which in many cases are going to be our base64.

But we may not want all of that. We just want the PNGs, because there's other text, and in this particular case there are logos at the top of the page, which are JPEGs. We just want the PNG files. So instead of saying text, we're going to put a pipe symbol, then select, then parentheses, and this is going to allow us to do a search. We're going to look at the MIME type, that is, the type of file it is, and we want the ones where mimeType equals image/png. After that we say pipe, dot text.

So if I typed all that right, we take all the entries, look at the responses, look at the content of those responses, select the ones that have the MIME type image/png, and from the ones that match we view their text, which should be our base64.

I obviously typed something wrong. Let me see here. Error, pipe, select. Oh, there it is, I see the problem now: there should be no period before the pipe symbol. Okay, now: image is not defined. Give me one second here. Oh, I think I know what the problem is.
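The narrowing steps that worked before select was added can be sketched like this. As before, the temporary file and its contents are hypothetical placeholders standing in for the real export.

```shell
# Stand-in HAR file; URLs and payloads are made-up placeholders.
har=$(mktemp)
cat > "$har" <<'EOF'
{"log": {"entries": [
  {"request":  {"url": "https://example.invalid/logo.jpg"},
   "response": {"content": {"mimeType": "image/jpeg", "text": "aGVsbG8="}}},
  {"request":  {"url": "https://example.invalid/mug1.png"},
   "response": {"content": {"mimeType": "image/png",  "text": "d29ybGQ="}}}
]}}
EOF

# [] walks every element of the entries array.
jq '.log.entries[].response' "$har"            # each response object
jq '.log.entries[].response.content' "$har"    # only the content of each response

# Every text field, JPEG logo included, still in JSON quotation marks.
texts=$(jq '.log.entries[].response.content.text' "$har")
echo "$texts"
```

Note that this prints the text of every entry regardless of file type, which is why the filter on the MIME type is needed next.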
I have quotations inside quotations. Let's change these outer quotation marks to single quotation marks, because I have quotation marks in here, and so I believe the shell was treating the inner quotation marks as the end of the string, which means jq never saw them around image/png. Yes, there we go. So be careful with those quotation marks. Another thing you can do is backslash-escape the characters, but in this case I'm just using single quotation marks on the outside and double quotation marks, also known as regular quotation marks, on the inside.

So there we go, we now have our base64. But again, it's in quotation marks here, and we don't want the quotation marks. We should be able to say --raw-output, and I believe that will give us the output without the quotation marks around it. There we go. So now we have our base64 coming through line by line.

Let's just do a test run on one. What I'll do is say shuf -n 1, if you have shuf on your machine. Not every machine will, so you can do tail -n 1 instead. I'm going to use shuf to get a random one instead of just the last one, because I feel bad for that last lady. I don't want to keep showing her same picture over and over again: "Hey, look, she was arrested." So that should give us a random one of these images. At this point I can pipe that into base64 -d, for decode, and pipe that into ImageMagick's display, which is just a viewer, but it can read from standard input, and now we should get a random picture. Yep, so we're doing good. And unlike our grep approach, this should pretty much always work on this file, because it doesn't matter if things move around.
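Here is a rough sketch of the corrected filter, the raw output option and the single-item test run. It assumes GNU-style base64 and shuf. The stand-in file and the "placeholder" payloads are invented for illustration, so the decoded result is plain text rather than a viewable image, and the sketch writes it to a file instead of opening a viewer.

```shell
# Stand-in HAR file: one JPEG entry and two PNG entries.
# Payloads are base64 of placeholder text, not real image data.
har=$(mktemp)
out=$(mktemp -d)
p1=$(printf 'placeholder-1' | base64)
p2=$(printf 'placeholder-2' | base64)
cat > "$har" <<EOF
{"log": {"entries": [
  {"response": {"content": {"mimeType": "image/jpeg", "text": "aGVsbG8="}}},
  {"response": {"content": {"mimeType": "image/png",  "text": "$p1"}}},
  {"response": {"content": {"mimeType": "image/png",  "text": "$p2"}}}
]}}
EOF

# Single quotes outside, double quotes inside, so the shell
# passes "image/png" through to jq untouched.
filter='.log.entries[].response.content | select(.mimeType == "image/png") | .text'

jq "$filter" "$har"                        # PNG entries only, still JSON-quoted
raw=$(jq --raw-output "$filter" "$har")    # bare base64, one entry per line
echo "$raw"

# Pick one entry at random and decode it to a file.
echo "$raw" | shuf -n 1 | base64 -d > "$out/random.png"

# With real image data, the video pipes the decoded bytes to a viewer instead:
#   jq --raw-output "$filter" 1.har | shuf -n 1 | base64 -d | display
```

Keeping the filter in a single-quoted variable is only a convenience for reuse here; typing it directly on the command line inside single quotes works the same way.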
We're actually looking at the tags and parsing through it as JSON.

So real quick, let's do what we did in the last video. We're going to say let x=0, meaning we're making a variable x which is equal to zero; let evaluates it as arithmetic, so x is treated as an integer. Then here, instead of shuffling and grabbing one, we're going to pipe into a while read, and again I'll use m for mugshot: do, echo $m into base64 decode, and we redirect that into $x.png. Then we say let x++, which means every time we loop we add one to x, and then we say done. Give it a moment, and then I can use my file manager to look at this directory, and there we go, we have all these mugshots.

So this is very similar to what we did last time, but instead of using grep we used jq, which is a better way to go if you're going to be parsing through JSON. But then again, grep is pretty much everywhere, whereas jq may not be on your system, so I just thought I'd show you both ways. Both are good exercises to do. I hope that you enjoyed this video. I'll try to remember to put notes in the description of this video, so check out that link, and I hope that you have a great day.
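For reference, the loop described above might look roughly like the following. This is a sketch against a hypothetical stand-in file with placeholder payloads, and it swaps let x++ for the POSIX form x=$((x + 1)), which behaves the same for counting but also runs in shells that lack let.

```shell
# Stand-in HAR file: one JPEG entry and two PNG entries.
# Payloads are base64 of placeholder text, not real image data.
har=$(mktemp)
out=$(mktemp -d)
p1=$(printf 'placeholder-1' | base64)
p2=$(printf 'placeholder-2' | base64)
cat > "$har" <<EOF
{"log": {"entries": [
  {"response": {"content": {"mimeType": "image/jpeg", "text": "aGVsbG8="}}},
  {"response": {"content": {"mimeType": "image/png",  "text": "$p1"}}},
  {"response": {"content": {"mimeType": "image/png",  "text": "$p2"}}}
]}}
EOF

filter='.log.entries[].response.content | select(.mimeType == "image/png") | .text'

# Decode every PNG entry into its own numbered file: 0.png, 1.png, ...
x=0
jq --raw-output "$filter" "$har" | while read -r m; do
  echo "$m" | base64 -d > "$out/$x.png"   # redirect (>) into the file, not a pipe
  x=$((x + 1))                            # add one to the counter each pass
done

ls "$out"
```

Run in the mugs directory against 1.har, and writing to "$x.png" directly, this should leave one numbered PNG per matching entry, in the order the entries appear in the file.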