Okay, this is going to try to be a quick video, just going through some web scraping things, because different websites load things differently, so you might come across some stumbling blocks if you watched previous videos. I'm just going to walk you through my thought process here.

We are at the Greater Naples Chamber of Commerce website, and we have a list of businesses here. Let's say we want to get all that business information into one file that's formatted somewhat nicely. So let's go ahead and open up our developer console. You can see that as I click on different sections here, it loads up different things, and you might think, oh, as I click on those, it's probably loading this information. So then you'd go through each one of these files that's loading. I already have, and I can tell you right now, none of that information is in any of these files.

So what I'm going to do, with the console open here, is reload the website, which does take a little while to load. Why is it taking a little while? Well, there could be a number of reasons, but one of them is that they are actually loading all of those business names and addresses, all of that information, all at once when the page loads, and then when you click on those numbers it's actually just filtering through them. I think that might be one of the reasons the page loads kind of slow. Now, I could filter by XHR and try to find that information, but I've looked through here and I didn't really see what I was looking for.
So if I were trying to script this out, yeah, I'd want to do a little more research on this. But since I'm just going to pull them all at once, what I can do here is go back to All and make sure that everything is loaded. If you didn't have the developer console open when you loaded the page, you want to open it and then click refresh, and this will list everything that's loaded on the page: all the pictures, all the scripts, all the HTML and whatnot.

What I can do right now is right-click this and say "Save all as HAR with content." That's an HTTP archive, something like that. Basically, it's going to create, I'm just going to call it output.har, a JSON file. I can cat it out right here, and it's a JSON file. Let me go ahead and go into it with Vim. It has everything that loaded on the page: every image, every XML file, every JSON file. All the images are stored as base64. If you loaded up videos or audio files, those would also be embedded in here in plain text, but as base64 that you can decode, with the exception that there are probably pages out there that use encryption.

So now what I need to do is find what I'm looking for. Let's take one of these addresses; I'll go to the top of the file and search for it, and there it is, right there. You can see that it's loaded up not in a nice JSON format or CSV or XML, but as HTML, meaning that the server is serving up HTML, which some people like. I think it's a horrible way to do things, but that's the way it's being done, so we're just going to have to go with it.
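The search through the HAR file described above can also be done programmatically. What follows is a minimal sketch in Python, not what the video does (the video uses Vim and grep). It relies on the standard HAR layout, where each response body sits under `log.entries[].response.content.text` and is flagged with `encoding: "base64"` when it is binary. The function name `find_entries`, the sample URLs, and the sample address are all hypothetical stand-ins.

```python
import base64
import json

def find_entries(har, needle):
    """Return (url, mime_type, body) for every response whose body contains needle."""
    matches = []
    for entry in har["log"]["entries"]:
        content = entry["response"].get("content", {})
        text = content.get("text")
        if text is None:
            continue  # the browser did not store a body for this response
        if content.get("encoding") == "base64":
            # Binary bodies (images, audio, video) are stored as base64 text.
            text = base64.b64decode(text).decode("utf-8", errors="replace")
        if needle in text:
            matches.append((entry["request"]["url"], content.get("mimeType", ""), text))
    return matches

# Hypothetical stand-in for a real export. With a real file you would use:
#   with open("output.har", encoding="utf-8") as f:
#       har = json.load(f)
sample_har = {
    "log": {
        "entries": [
            {
                "request": {"url": "https://example.invalid/logo.png"},
                "response": {"content": {
                    "mimeType": "image/png",
                    "encoding": "base64",
                    "text": base64.b64encode(b"\x89PNG fake image bytes").decode("ascii"),
                }},
            },
            {
                "request": {"url": "https://example.invalid/directory"},
                "response": {"content": {
                    "mimeType": "text/html",
                    "text": "<html><body><p>123 Example St</p></body></html>",
                }},
            },
        ]
    }
}

hits = find_entries(sample_har, "123 Example St")
for url, mime, body in hits:
    print(url, mime, len(body))
```

Printing the URL and MIME type of each hit shows which request actually delivered the data, which is the same question the manual search in Vim was answering.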
So what I'm going to do now is grep for that address again, oops, I grabbed a little bit more than I wanted there, there we go, from our output.har. Now I'll pipe that into wc -l, and it tells me one. That means that's all on one line. So let's go ahead and dump that into a new file. I'm just going to call that file temp or something; it doesn't matter. Again, I can go into this temp file, grab a different address, or just different words from the page, and search for that, and I can see that that information is in there. So it appears that the server is serving up some form of HTML, all on one line, for all the addresses. I can assume that at this point; we can confirm it by looking further.

So now I'm just going to do some cleanup. We know that it's some HTML. Again, let's look in here and go to the top of the file: you can see that the JSON part of it says this is text, but we can see that it has an HTML doctype. We could use grep and cut and all that to clean it up, but we can also use a text-based web browser. So I'm going to cat out our temp file and pipe that into, one of our options is w3m. I can do -dump; let's see if that's enough. I might need to tell it that it's a text/html file. Yeah, let's go ahead and say capital -T, and I think it's just text/html as the type. And there we go, you can see it's now started formatting it in a somewhat readable fashion.

We have a lot of these newline and carriage return characters, so let's go ahead and remove those. We'll take that and pipe it into sed, with forward slash, forward slash, d in here, which means delete any lines that match. We're going to say any lines that have this, but we also have to backslash the \r in there so that it's not read as one of the special
characters here. There we go, so we've cleaned it up a bit, and you can start seeing all our addresses here. So there you go, you have a readable format, but we might want to clean it up a little bit more, so we'll play around with it some more.

We can see here that we have a business name, then their address a few lines down, then "member since," and then "learn more." After that comes a new business name with the same information, and then "learn more" again. So what we can do is put everything on one line and then use this "learn more" line as the newline character. Let me show you what I mean.

I'm going to take that same command we just ran, make it a little bit bigger here, and pipe it into tr. I was going to say -d on newline to delete all the newlines, but actually, instead of deleting them, let's turn them into pipe characters, because I'm going to make a file that divides everything up by pipes anyway. So now everything is on one line with pipes; all the businesses are on one line.

Now we want to put each business on its own line. So I can pipe that into sed, and really I didn't need to use tr; I could have used sed for that step as well. What I'm going to do here is substitute all the text that says "learn more" with a newline character instead. So now I have each business on its own line with its information.

We can clean that up a little bit more. Some of them say "show on map"; not all of them do, I think. So I'm going to find all of this pipe "visit site," and I'll add to our sed command: substitute this, oops, with nothing. That cleans it up a little bit, and then I'll do the same for "show on map." Now, if every line had both "visit site" and "show on map," I
could have done this all as one command, but I'm pretty sure some of them do and some of them don't. So now I'm just going to say sed and remove all of this "show on map." And now we've got a fairly clean, not perfect, file with records divided by these newline characters. It's not perfectly formatted, but it's doable.

So now I can take all this and pipe it into output.csv, and then I can just xdg-open it. On a Linux system that's running Xorg, xdg-open is going to use whatever default program is set for this file type. So I'm going to open up this CSV file, and as you can see, I now have a spreadsheet. Again, it's probably not perfect, but for the most part I have, oh, but it is using commas. It should have asked me what character I wanted to divide that up by, but you're getting the general idea here. Yeah, let me use LibreOffice and not Gnumeric. So I'm going to discard that and open it with LibreOffice instead. There you go, LibreOffice asks me what character I want. Instead of commas, I'm going to say Other and give it the pipe symbol, since that's what I used. Open that, and there we go, now it's better formatted.

So we have the business names. We could have removed this web content; you see some lines have that and some don't, so we could definitely have cleaned it up more in the shell. But for the most part, most of these are formatted fairly well: contact information, contact person. I can scroll all the way down, and you can see that I have over a thousand of these business names with contact information. Again, it still needs a little cleanup, but I thought that would give you an idea of my thought process going through this.

Again, one of the things about this is that they're passing us HTML, and I hate when web services do this. It would be much better for us and them if they used something like JSON, because then if they wanted
to change the interface on their website, they could still use that same output from the server and just modify the way the page looks. They could have multiple different formats and use that same JSON output in different sections of their website. But some people still insist on generating HTML on the server and pushing that out to you, and yeah, I don't like that. Some people do; I do not.

Anyway, thanks for watching. filmsbykris.com, that's Kris with a K. There's a link in the description, and I hope that you have a great day.
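As a recap of the cleanup steps in the video (newlines turned into pipes, "learn more" used as the record separator, "visit site" and "show on map" dropped), here is a rough Python sketch of the same idea. It is a swap-in for the tr and sed pipeline, not the commands typed on screen, and it differs slightly: it also discards blank lines and removes the unwanted labels only when they fill a whole field. The exact label spellings ("Learn More", "Visit Site", "Show on Map"), the "Member Since" format, and the sample businesses are assumptions made up for illustration.

```python
def to_records(dump_text, end_marker="Learn More",
               noise=("Visit Site", "Show on Map")):
    """Turn a text dump into one pipe-delimited line per business."""
    # Step 1 (the tr step): put everything on one line, separated by pipes.
    lines = [ln.strip() for ln in dump_text.splitlines() if ln.strip()]
    one_line = "|".join(lines)

    # Step 2 (the first sed step): treat the end marker as the record separator.
    records = []
    for chunk in one_line.split(end_marker):
        # Step 3 (the remaining sed steps): drop empty fields and unwanted labels.
        fields = [f for f in chunk.split("|") if f and f not in noise]
        if fields:
            records.append("|".join(fields))
    return records

# Hypothetical sample shaped like the dump described in the video.
sample = """Acme Plumbing
123 Example St, Naples, FL
Member Since: 2010
Visit Site
Show on Map
Learn More
Bayside Cafe
456 Sample Ave, Naples, FL
Member Since: 2015
Learn More
"""

records = to_records(sample)
for record in records:
    print(record)
```

Writing these lines to a file gives the same kind of pipe-delimited output that was opened in LibreOffice with the pipe symbol chosen as the separator.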