Hello, and welcome to this video by FilmsByKris.com. I am Kris, and today we're going to be looking at pulling some information from a website. So the other day, I got a text from a friend with a video in it, and he says, this is the project I'm working on. It's a script he wrote that, as you can see, scrolls through this page of realtors and grabs some of their information, looking at like 50 or 100 at a time. And I said, that's great, can I see your code? So he sent me a link to his code, which is right here. It's 64 lines of Python; there are a few empty lines, so we'll say 60 lines of Python code. And he said he wished he could do it in Bash, but it's a dynamic page. If you watched my last video, we talked about that a little bit, so you see where we're going with this. This will be a real-life scenario. I said, let me check it out; that was that night, and we got together the next day. He walks in the door and I said, I can do it in one line of Bash. It's a long line, but not really that long. So let's go ahead and look at it.

I asked him for the URL, which was this right here. It's case-sensitive, so you have to have FL, and then Naples is the city. So how do we grab this information? I'd take this, as we looked at in the last video, and just curl the page, which gives me back the HTML. For example, this guy's name is David Adams, so let's grep -i (case-insensitive, I always do that) for adams. Nothing came up, because we didn't get that information; it's dynamically loaded. At first that might seem like a bad thing, but really it's a good thing, because instead of having to scrape the page, we can just get JSON.

So I'm going to open up my developer console. Here I hit Ctrl-Shift-I or F12; it may be different in your browser. Let's go ahead and load that up. I go to Network, and if I click All and reload the page, it shows me everything that loads on the page, which is a lot: little icons for their pictures and stuff like that. But we can narrow it down. If I filter to XHR, you can see we have two things here. Click on the first one and look at the preview; it says Search Support Filters Query, and if we look through the tabs there's some JSON in there. But if we look at the next one, Search Agents Query, click on that, go down a couple of levels, and we can start to see that we have profile information in here. Awesome. We can see names, we can see initials, we can see everything that we need.

So all I have to do is look at the headers here, scroll down, and see what information we're posting, and I could figure it out and create my own command. But real quick, I can just right-click that request and say Copy, Copy as cURL. Let me bring that down just a little bit. There's a lot of information in there that we may or may not need, but it's everything we need. Also, something I forgot to mention in the last video: it's passing this user-agent information. So even though I'm using curl, the server shouldn't realize that; it should think I'm still using whatever browser it is, a Chrome browser, or in this case my Brave browser. It doesn't even realize you're using curl. So running that is going to hand me back the JSON information, and I can easily pipe that.
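Just so you can picture it, those two steps look roughly like this. The page URL, the API endpoint, and the JSON body below are stand-ins, not the site's real addresses or query; in practice you just paste whatever Copy as cURL hands you.

    # Curling the page itself turns up nothing, because the agent list loads dynamically:
    curl -s 'https://www.example.com/realestateagents/Naples_FL' | grep -i adams    # no output

    # Roughly the shape of what "Copy as cURL" gives you for that agents request.
    # Endpoint, headers, and body here are placeholders; copy your own from the browser.
    curl 'https://www.example.com/api/search-agents' \
      -H 'content-type: application/json' \
      -H 'user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36' \
      --data-binary '{"query":"...","variables":{"state":"FL","city":"Naples","first":50}}'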
Again, this is a long command. Let me clear the screen here. At the end, I'm going to pipe that into jq. If you don't have it installed, just sudo apt install jq. So it's jq, space, dot. Actually, if you don't do the dot, you'll still get formatted JSON, but then you can't pipe it on to anything else; to pipe it, you have to have the dot there. We do that and look, I have some nicely formatted JSON information. And if I were to look through that, I know I can now grep it.

Before we do that, so we don't constantly ping their website, let's just dump the information once. But before we dump it, let's quickly look through the stuff we're passing. Again, there's a lot of information here, but you can see the --data-binary part: it's passing JSON as a query to the website. And right through here, if we want it, we have the state, which is Florida. And look right here, it says "first": 50. Let's go to that. Again, it's a long command, but you didn't have to type it out, you just copied and pasted it. Here, I'm going to change that 50 to 5,000, hit enter, and it seems to still work. Let's run the first command and dump that into a file; I'll call it data1.json. Then I'll run it again with the 5,000 instead of 50 and put that into data2.json. You see, it took a little bit longer that time. If I list out, we can see that we have two files now.

I can cat out data1.json and pipe that into jq, and we can look at this real quick and see that the phone numbers are called "number", the city is called "city", and if I scroll up here I can see their address there. For their name, we have "given", "initials", and "full". So that's how the names are stored. So theoretically, I can come in here and pipe this through jq to format it. If you don't have jq, you can always use things like sed and awk, but I'm going to go ahead and just use jq, because it formats it nicely. And I'm going to grep for "full", and we get a list of names. I can now pipe that into wc -l, and we can see that we have 50 names; that was our first request. Let's change this and look at data2: 120. Now, we asked for 5,000, but apparently there are only about 120 agents in Naples. We'll look at that more in a moment.

So we got that. Now, like I said, I've got a list of their names. But if I do grep -e full -e number -e email, I now have a list of names, numbers, and emails. And we can now format this a little bit better, and there are lots of different ways you can do this. What I'm going to do is use tr and say delete all newline characters, and that puts everything on one line. The next thing I'm going to do is add a newline character back in before each name. I'm going to use sed now; of course, we could do this all in one sed command, but I like using tr for deleting newline characters, it's a little bit cleaner, I think. So with sed I'm going to find "full" in quotations, actually, let's do "full" colon space, with the quotations, and replace that with a newline character. Now I have each realtor on its own line with its name, number, and email address. But let's go a little bit further and format this; there's a lot of spacing in there. There are different ways you can do this, but I'm just going to use sed again, actually, the same sed command here.
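Before we get deeper into sed, here's roughly where we are so far, assuming the two responses got saved as data1.json and data2.json like I just did, and that the fields really are called full, number, and email in your JSON:

    jq . data1.json | grep '"full"' | wc -l    # how many agents came back in the first grab
    jq . data2.json | grep '"full"' | wc -l    # 120 here, apparently all of Naples
    jq . data2.json | grep -e '"full"' -e '"number"' -e '"email"'    # just the name, number, and email lines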
And I'm getting into sed a little bit here. If you're not familiar with sed, I have lots of videos on it, but I'll explain what I'm doing. I'm going to say space, space, and then remove that: I'm removing any double spaces. There we go, we're getting a nicely formatted line here. What I'm going to do next is use sed again and say look for and substitute anything that says "number" colon space with nothing, and then, after a semicolon, do the same thing for "email" colon space. Look at what we have here: nicely formatted CSV. Now there are some issues, because not everyone has a phone number, but I can dump this into a file; I'll just call it 1.csv. Then I can say xdg-open. xdg-open uses whatever default program you have for a given file type, so I say open that and it opens up in LibreOffice, or actually Gnumeric in this case, but whatever your default spreadsheet application is. And you can see we did a pretty good job in just a few commands. I dumped it to a file and then ran the commands on it; we could have done it all in one go. You can see we have our 120, there's just a blank line at the top here. And you can see there's one example here where they don't have a phone number, so the email moved over into the number column. I actually did write more code for my friend that adjusted that: it looked at that second column, and if it had an @ symbol, it added a blank where the number should be, moving the email over. We could also remove all the dashes. Apparently double-clicking on one of these boxes pastes it over the next few rows; I don't know, I don't use Gnumeric very often. Or however you pronounce it. But it's that simple.

And we can change this again. We got 120 here; yeah, just discard that. Let's go back to our curl command again, the one we copied from the page. I had set 5,000, which I've found is the max on this page, and we'll talk about that in a moment. But let me go to where it says Naples, and, again, it is case-sensitive on this particular website, change it to Miami. And I can dump that into data3.json. If I cat that out... oh, I accidentally put it back into data2.json, I overwrote data2. Anyway, I pipe that into jq, grep for "full" for the names, and do a word count on that, and you can see that apparently Miami has 687 realtors. And if I go back in here, let me change the file name to data3, and where it says city with the value Miami, again, don't get scared, this is a long command, it's just a lot of information in there, I'm going to erase the city part and just say give me the state of Florida. I'm going to dump that into data3.json. Notice how long it's taking? That's because it's pulling more information. And I've been using sed a lot here; really, jq can be used to search through JSON information and pull that information out, I was just using sed because I'm more familiar with it. But if I come in here and change this to data3, you can see that we have 5,003, so there might be a few extra lines in there, bits of information. From what I've looked at, this particular website cuts you off at 5,000 results. So even though I'm asking for the state of Florida, it's cutting me off at 5,000. But there's a way around that if we look at the page again. I'm going to click here to clear out what we've captured, and I'm going to scroll down a little bit.
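Putting all of those pieces together, the whole pipeline looks something like this. It's a rough sketch of what I was doing, assuming GNU sed and the field names we saw in the JSON (full, number, email); adjust it to whatever your data actually looks like.

    # 1. pretty-print and keep only the name, number, and email lines
    # 2. delete every newline so it's all one long line
    # 3. put a newline back in front of each "full": so it's one agent per line
    # 4. squeeze out the double spaces and drop the leftover "number"/"email" labels
    jq . data2.json \
      | grep -e '"full"' -e '"number"' -e '"email"' \
      | tr -d '\n' \
      | sed 's/"full": /\n/g' \
      | sed 's/  //g' \
      | sed 's/"number": //g; s/"email": //g' \
      > 1.csv

    xdg-open 1.csv    # opens in whatever your default spreadsheet application is

And like I said, jq can really do the searching on its own; as a quick sanity check on the names, wherever they happen to sit in the tree:

    jq -r '.. | .full? // empty' data2.json | wc -l    # count every "full" field in the file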
Oh, look at that. We got another query here under XHR. Look at that, and you can see the response here. I'm going to copy that as cURL and look at that command, and it is very, very similar to the last one. We have the city of Naples right here, and you can see it says "first": 50, which we can change, but look, it's added this "after": 49. Okay. So if I wanted to do the state of Florida and grab more than 5,000, I can say grab the first 5,000, and then I can say after 4,999, and that should give me the next batch.

For my buddy, I was able to pull down around 10,000 agents and then go through and find the ones with emails, which was something like 8,700. So I was able to get him a list of realtor information; we grabbed the proper cities and so on that he wanted. But it's a lot better than what he was doing. It's a lot less code, and it was a lot quicker, because we're not hitting up their page 100 times. I just said, hey, give me 5,000; hey, give me 5,000 more. Where he was going, give me 50, give me 50 more, give me 50 more, and it was taking forever, and he was getting some errors. And his script wasn't just pulling down this information, it was loading the entire page, so it kept scrolling and loading up all those images.

And if we list out here... you don't want to bog down somebody's website, but look, when I grabbed those 5,000 records, it was only 3 megabytes. I do that two or three times and I've pulled down under 10 megabytes, and I can get the whole state of Florida from them without bogging down their site. Because you don't want to bog down someone's site: if you do a lot of queries and they're monitoring it at all, they're going to notice and they might block you, and you don't want to do that to somebody's site. But it was very simple for me to pull down this information, and I didn't have to pull more than 10 megabytes to get him everything he needed. And that's pulling information from a dynamic page. If it were a static page, I would have had to pull down the HTML every time and scrape through it, and I would have had to do a whole bunch of requests, because you don't want to load that much information into one web page; it would have to load page by page, a new page each time. So again, in many cases a dynamic page is as simple to deal with, if not simpler, than scraping a static page. You're pulling down properly formatted information, and you're not bogging down their website like you would if you were actually scraping, because this isn't even really scraping; you're just requesting information from the page. I'm just requesting 100 times what the page originally requested, and I can do it all in one swoop if I want to get them all.

Anyway, again, this looks like a lot, but don't worry about it, because all I had to do was right-click, Copy as cURL, change the bit of information I wanted from 50 to 5,000, and pull down the information. If I wanted more, I added that "after", and I can always change the city or state if I want. And that is how you do that.
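If it helps to see that paging idea written out, a minimal sketch looks like this. The endpoint and JSON body are stand-ins again (you'd start from your own Copy as cURL command and keep all of its headers); the only parts you actually touch are "first" and "after".

    URL='https://www.example.com/api/search-agents'    # placeholder endpoint

    # First 5,000 agents for the state...
    curl -s "$URL" -H 'content-type: application/json' \
      --data-binary '{"variables":{"state":"FL","first":5000}}' > batch1.json

    sleep 5    # a short pause between requests so you're not hammering their server

    # ...then the next batch, picking up after record 4,999.
    curl -s "$URL" -H 'content-type: application/json' \
      --data-binary '{"variables":{"state":"FL","first":5000,"after":4999}}' > batch2.json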
I do thank you for watching. Please visit FilmsByKris.com, that's Kris with a K; there's a link in the description to my page. If you enjoyed this video, think about subscribing, liking, and sharing, and if you want to support me, on my website you can go to the support section and make a donation through PayPal, but you can also support me on Patreon, and that's fun and useful. So, yeah, think about that. There should be links to that in the description as well, and I hope that you have a great day.