Hey guys, and welcome back to another Python tutorial. This is going to be part one of a series of videos about creating a library system using simple libraries within Python. What we're essentially going to do in this tutorial is take a dummy website that someone has kindly created and scrape all of the books hosted on it, along with all the details related to those books. For example, if you look at a book, we've got the image of the book, the name of the book, the price, and the availability as well. What we're doing in today's tutorial is collecting all the data that's going to go into the library system we'll be coding in the next few tutorials.

First things first, you want to head over to books.toscrape.com, which is the dummy site I was talking about, and have a brief look around. You'll notice it's a pretty simple layout: all the books are laid out, and you can click on individual books to show more details. We're not going to do that, though, because all we need for each book is the image, the title, the price, and the availability, i.e. whether it's in stock or not.

So what we're going to do first is open up VS Code, and in VS Code we're going to import the libraries we'll be using. First off, we're going to use the requests library, which you may well have installed already (if not, a quick pip install will get it). Then we want the Beautiful Soup library, which you will need to install using pip or conda, whichever you prefer. Beautiful Soup is going to help us parse the text that comes in from the page we scrape and turn it into a navigable object, so that we can find specific HTML elements like headers, links, etc. So I've loaded in the two libraries.
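The imports described here (plus pandas, which comes up in a moment for saving the data) can be sketched as follows; note that requests is itself a third-party package, so it may also need a pip install:

```python
# Libraries used throughout this tutorial:
# requests fetches the pages, BeautifulSoup parses the HTML,
# and pandas saves the scraped data to a CSV at the end.
# All three are third-party: pip install requests beautifulsoup4 pandas
import requests
from bs4 import BeautifulSoup
import pandas as pd
```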
We're also going to import pandas, because we'll want to save the data we collect from scraping into a CSV file, so that we can use it later on in the library system.

What we're going to do next is set up a dictionary here, because a common practice in my scraping tutorials is making use of proxies. Using proxies not only keeps you anonymous online, but also lets you switch between different countries and IP addresses, which will prevent you from being rate limited by websites when you scrape data from them. Now, for the sake of this tutorial you may not actually get banned or rate limited by the website we're scraping, but to be sure we don't, we're going to be using IPRoyal, who have kindly sponsored today's tutorial.

IPRoyal is a proxy service provider offering safe, private, and unrestricted access to online information. With a pool of over 2 million reliable IPs, IPRoyal allows clients to use a proxy server as an intermediary between their devices and the web, which lets them maintain their privacy and use resources they can't access directly due to geo-restrictions, etc. IPRoyal's datacenter proxies can serve as a great product for businesses or users looking for premium, high-speed, anonymous private proxies, which usually have unlimited bandwidth and no extra charges for data scraping. That said, I would recommend IPRoyal's residential proxies, as they not only provide anonymity when scraping websites, which can help you avoid getting rate limited, but also let you select the geolocation of the proxy, either through the dashboard or by making minor changes in your code. Lastly, IPRoyal handles automatic IP rotation, which allows easy integration within your code, but they also provide the option to use static IPs in case you decide to keep an IP for a longer duration of time. In this video, IPRoyal has given me access
to their discount codes, which will give you a straight 30% discount on their royal residential proxies. The discount code is your hand 30, and it can be used to buy their royal residential proxies. So guys, please make sure to grab your discount code and make your purchase of royal residential proxies from IPRoyal today, or you'll be missing out on a great deal.

Now that we've looked at the benefits of using IPRoyal, what we want to do is head over to the IPRoyal dashboard, which I've already logged into. As you can see, I have my remaining traffic shown in the dashboard. Here you can select all the different parameters for your proxy. I want mine to be in the United Kingdom, in the London region, and I want my IP to be on rotation, which means I don't want a static IP address; I want my IP to change constantly. What you want from here, once you've set up all of these settings, is the example URL down at the bottom. You're going to want this link, which starts at the http and ends at the port number right here. We're going to copy that, and then we need to paste it into the proxies dictionary we created a second ago.

So let's fill in that dictionary. First things first, we need to create an "http" key, and next we need to create an "https" key. For us, both the HTTP and HTTPS proxies run on the same URL, so no issues there. Let's run both of those entries, and now we're ready to scrape from our site, securely, with proxies.

So let's go back to the page we want to scrape from, and we'll inspect a few things. Let's inspect the image first, because we obviously want to scrape the image. Usually all of these details are stored within a single container element in the HTML, and if we manage to grab that container, we should easily be able to grab all of the details inside it as well. Let's see. If we look carefully, when I hover over product_pod, which is the
class product_pod right here, on an article-type element, what we'll see is that all of the elements we're looking to scrape are inside that class called product_pod. And if we go further down, we see the class is repeated across all of the books on the page. So what we're going to do is copy that class name, and then we're going to head over and scrape with it.

Before that, though, you obviously want to copy the page URL and make a request to it. So we'll do response = requests.get(...), which means we're using the requests library to get the content of this page, and we're going to provide the proxies argument because we're using proxies, and then we'll use .text to extract the text from the response that comes back. Let's run that. It'll take a bit of time, I believe... three seconds, not bad.

When we look at this, it's just a bunch of HTML text, which is essentially all the content shown on the page, but in text format. Now we have to convert it into a browsable, navigable object, which is going to be our soup; Beautiful Soup is going to help us do that. We create a new variable called soup, call BeautifulSoup, and pass in response.text as the text that needs to be converted to soup. So we run that (and we get rid of this extra .text, because I forgot we've already applied it up here), and we should have a navigable object. It already looks a lot cleaner as well, because it's been prettified.

Now we can use the .find method to find the article element, which had the class name product_pod. We're going to copy that and paste it into our class argument. When you run this, what you'll notice is that it's gathered the first instance of the product_pod class. And it has all the details we need: the image link, which is right here; it has the title of the book right here; it has
the price right here; and then the availability right here.

What we want to do first is, instead of grabbing just the first element from the page, grab all of the product elements, so that we're grabbing all of the books on the page and not just the first one. So instead of find, we're going to use find_all. What this does is return a list instead of a single element, because it's returned all of the product_pod classes on the first page. What we need to do now is store this in a variable. Let's see, what should we call it? Let's call it book_container.

Next we're going to write for book, because we need to iterate through each of the books in the list. So, for book in book_container, we're going to do book.find(...). What we want to find here is obviously this link right here. If we look closely, the link is in an HTML element called img, which is the image tag, and that image tag has a class of thumbnail, so we can use that as well: we'll do class_="thumbnail". Then essentially what you want to do is access the src attribute of this image tag that has the thumbnail class. Let's see if that works. I'm going to put a break statement in here so that we only iterate over the first element and don't go through the whole thing, because that would take some time.

Now, as you can see, we've got the URL for the first image. Perfect. But what you'll notice is that if we try to paste this URL into the browser, it doesn't come up, because it's broken: it's only a relative path. What we need to do is prepend the first part of the site's URL onto the remaining part that we just scraped; when we combine the two, we have a working image link. Perfect. So let's do that really quickly. We can do book_image_url = that string we just copied, so up until media... actually, get rid of media as well... up until that,
plus whatever we find in here. Now when we print book_image_url, we should see the exact URL, and when we click on it, voilà, the image shows up perfectly fine.

Okay, now for the next thing we want to get from the book container: the book title. We have to see where we can find that, and if you look carefully, it's within the h3 tag in the HTML. So we'll do book_title = book.find("h3"). It doesn't have a class name, and there's only one h3 in that container, so it should be fine to use h3 without a class. Then we want to grab the title attribute, so I'll print out the book title as well. There we go... ah, sorry, I wrote book_container.find; it should have been book.find. book_container is the entire list, and we don't want to search the entire list, just the current book. So we run that now, and here's a title. Let's see. Oh, we don't want to grab the title attribute after all, sorry: we want the text from inside, so we just do .text. Apologies for the errors. When we do .text, it grabs the text that's inside that h3 element right here. Perfect. So we've got the title of the book.
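Before moving on to the price, the setup steps from earlier (the proxies dictionary, the request, and the conversion to soup) can be recapped in one sketch. The proxy URL below is a placeholder, not real credentials, and the helper name fetch_soup is my own; the live network call is left in a comment so the snippet runs offline:

```python
import requests
from bs4 import BeautifulSoup

# Placeholder proxy string: substitute the URL copied from your provider's
# dashboard (it starts with http:// and ends with the port number).
PROXY_URL = "http://username:password@proxy.example.com:12321"

# Both schemes use the same endpoint, since our HTTP and HTTPS
# proxies run on the same URL.
proxies = {
    "http": PROXY_URL,
    "https": PROXY_URL,
}

def fetch_soup(url: str) -> BeautifulSoup:
    """Fetch a page through the proxies and return it as a navigable soup."""
    response = requests.get(url, proxies=proxies)
    return BeautifulSoup(response.text, "html.parser")

# Usage (a live request, so not executed here):
#   soup = fetch_soup("https://books.toscrape.com/")
#   first_book = soup.find("article", class_="product_pod")
```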
We've got the image URL. The next thing we want to get is the price of the book. We can see that it's in a paragraph (p) element in the HTML, and it's got a class of price_color. So we do the same thing: book_price = book.find(...), with p because it's a paragraph element, and since we also have a class we might as well mention it: class_="price_color". Then we want .text to keep just the text, which is this bit here. Let's see what we get for book_price. Here we are: we get the book price, but we have this unwanted character right here, and we need to get rid of the pound sign as well, because we're going to store the price as a float later on. So we're going to use .replace, which replaces an instance in the text with whatever you want. We're replacing this pound sign, and the weird accented "Â" character, with nothing. When I run this again, voilà, it only comes up with the number, which is what we needed for the price variable.

Last but not least, we're going to scrape the availability of the item. We're going to use the p tag again, and the class is going to be "instock availability". Let's just copy that class name so I don't make a mistake typing it. Now we can store the availability: book_availability = book.find(...), with the paragraph tag again, and the class set to what we just copied. Let's print that: print(book_availability). And we see we get the whole element, well, the whole content inside the paragraph tag, but we don't want that; we only want the text inside it. So we add .text up here. Okay, cool.
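Putting the four extraction steps together, the per-book parsing loop can be sketched like this. To keep it runnable without a network connection, it parses a small hand-written sample that mimics the site's markup (including the stray "Â" next to the pound sign); on the real page, the soup would come from the fetched response instead:

```python
from bs4 import BeautifulSoup

# Hand-written two-book sample assumed to mirror books.toscrape.com's markup.
html = """
<article class="product_pod">
  <img src="media/cache/fe/72/one.jpg" class="thumbnail">
  <h3><a>A Light in the Attic</a></h3>
  <p class="price_color">Â£51.77</p>
  <p class="instock availability">   In stock </p>
</article>
<article class="product_pod">
  <img src="media/cache/08/e9/two.jpg" class="thumbnail">
  <h3><a>Tipping the Velvet</a></h3>
  <p class="price_color">Â£53.74</p>
  <p class="instock availability">   In stock </p>
</article>
"""
soup = BeautifulSoup(html, "html.parser")

# find_all returns a list of every product_pod on the page, not just the first.
book_container = soup.find_all("article", class_="product_pod")

BASE = "https://books.toscrape.com/"
books = []
for book in book_container:
    # The src attribute is a relative path, so prepend the site root.
    book_image_url = BASE + book.find("img", class_="thumbnail")["src"]
    # Only one h3 per container, so no class is needed.
    book_title = book.find("h3").text
    # Strip the pound sign and the stray accented character so the
    # price can later be stored as a float.
    book_price = book.find("p", class_="price_color").text.replace("Â£", "")
    # .strip() removes the surrounding whitespace.
    book_availability = book.find("p", class_="instock availability").text.strip()
    books.append((book_image_url, book_title, float(book_price), book_availability))
```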
Now we've got just the text, but we can see there's a lot of whitespace around it, so we use the .strip method to get rid of any whitespace. And nice, we have everything we need now. We can essentially get rid of the break statement here, and if we run this, it should go through all of the books we just scraped on the first page and print out their details, like so. Perfect. So we've essentially built a scraper that takes the content from this page right here and stores it in our code.

But we have around a thousand results to go through, and we've only gone through one page so far. There are 50 pages overall that we can scrape. Let's see if there's a common pattern in the URL bar that we can use for these 50 pages. If I click next, it goes to catalogue/page-2.html; click next again, and it goes to catalogue/page-3.html. So the only thing that's essentially changing is the number right here. Let's try just a 1, and as you can see, it takes us to the first page. If I change it to 51, I'm guessing it will crash, because we said there are only 50 pages; if I change it to 50, it should be fine.

So what we're going to do is copy this URL, because we know that changing that one number in it will let us switch to the next page. I'm going to replace the static URL with this one, and then move all of this code into a single cell: copy this (I mean, cut that) and paste it in here, same cell as the loop; paste that in here as well; get rid of this. Now what we want to do is write a loop.
So we know there are 50 pages, which means we want to write a loop that goes through 50 numbers: for i in range(1, 51). The reason we use 51 is that we're not starting from zero; we're starting from one, so we need to account for one extra number at the end. If we just print out i here and comment out the rest of the code, it'll print the numbers from 1 through 50, which is what we need.

Now that we have that working, we can bring the rest of the code back in and change the static page number. We change the URL string to an f-string, and we change the static number to the variable i, which goes from 1 all the way to 50 and will switch us through all of the different pages. So now we've got the pages working as well.

There's only one bit missing, which is storing all of our data. I'm going to create a variable called data and set it to an empty list. Instead of printing the scraped values, we're going to append them to our list called data. So we'll call append, and what we're appending is a dictionary with different keys and values: "book_image_url" is set to book_image_url; then we move on to the next one, "book_title", which is set to book_title; "book_price" is set to book_price; and lastly "book_availability" is set to book_availability. Cool.
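The loop described above can be sketched like this. The per-page scraping is left as a comment so the snippet runs without a network connection, and the break mirrors the one used in the video to stop after a single page:

```python
data = []  # every scraped book from every page ends up in this list

for i in range(1, 51):  # pages 1..50; range's stop value is exclusive
    url = f"https://books.toscrape.com/catalogue/page-{i}.html"
    # In the full scraper, each page is fetched and parsed here, e.g.:
    #   soup = BeautifulSoup(requests.get(url, proxies=proxies).text, "html.parser")
    #   for book in soup.find_all("article", class_="product_pod"):
    #       data.append({"book_image_url": ..., "book_title": ...,
    #                    "book_price": ..., "book_availability": ...})
    print(f"Scraped {i} out of 50 pages")
    break  # remove this to go through all 50 pages
```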
So we've got the four things we scraped (one, two, three, four), and all four of them are being saved into our list called data. Now that we're done with that, I'm also going to add a print up here so that as soon as we scrape a page, we write out a message saying "Scraped i out of 50 pages", where i is the current page. For the sake of this tutorial I'm only going to scrape, let's say, one page worth of data, which is why I'm putting a break statement right here. If you want all of the data (and you should get all of the data), don't put this break statement in; let the code run through all of the pages.

So let me run the code, and hopefully we should have one page worth of data in our data list. Let's look once it's done. Okay, it says "Scraped 1 out of 50 pages". Let's look at the data. Beautiful, it's got one page worth of data: the first book is "A Light in the Attic" and the last book is "It's Only the Himalayas". Let's check that against page one... yep, it's correct. Perfect.

The last thing we want to do in this tutorial is convert this data list into a pandas DataFrame, which is nothing but a formatted table that you can then save to a CSV file on your computer. We imported pandas as pd up here, so run that again, and then we do pd.DataFrame, which converts our data list into a DataFrame. We can view the DataFrame, and as I told you, it looks like a nicely formatted table. What we can do next is just use the .to_csv method and save it as, let's say, library_database.csv. And we set the index to False, because we obviously don't want this index column to be saved; it's just a useless column, and we're not going to need it.
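The DataFrame and CSV step can be sketched as follows, with a couple of hard-coded example rows standing in for the scraped data list (the column names and the library_database.csv filename follow the video):

```python
import pandas as pd

# In the real scraper this list is filled inside the page loop;
# two example rows here keep the snippet self-contained.
data = [
    {"book_image_url": "https://books.toscrape.com/media/cache/fe/72/one.jpg",
     "book_title": "A Light in the Attic",
     "book_price": "51.77",
     "book_availability": "In stock"},
    {"book_image_url": "https://books.toscrape.com/media/cache/08/e9/two.jpg",
     "book_title": "Tipping the Velvet",
     "book_price": "53.74",
     "book_availability": "In stock"},
]

df = pd.DataFrame(data)  # a list of dicts becomes a nicely formatted table

# index=False drops the automatic row-number column from the saved file.
df.to_csv("library_database.csv", index=False)
```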
I mean, usually you could use it, but we're not going to be using it here. Once we run that, it should create a new file right here, as you can see, and it's a CSV file that can be opened in Excel as well, though I'm obviously not going to do that here.

But yeah, that's basically it for today's tutorial, guys. If you found this tutorial interesting and would like me to continue it into creating the full-fledged library system, please let me know in the comment section below. If you'd like to request any other types of videos as well, please let me know too, and I'll see you beautiful faces in the next tutorial. Peace.