Guys, welcome back to another Python tutorial. In today's video we're going to be looking at scraping a Shopify website using Python. First of all, we're going to open up the command prompt to make sure we have all the required modules installed. First off we need a module called pandas, which is going to help us take the data we scrape from the website, put it into a nice table, and save it to a CSV, so we do pip install pandas. Once you're done with that, you want to do pip install requests. Requests is going to let us make GET requests and grab data from the sites we want to scrape. Lastly, you want to do pip install bs4. bs4 stands for Beautiful Soup 4, and it basically lets you convert the code from a website, which comes in as a string, into parsed HTML that can be searched through. Once you've got the three packages installed, which are Beautiful Soup 4, requests and pandas, we are good to start coding.
The website we're going to be scraping today is one of the top 10 websites that show up on Shopify's sort of inspiring list, and since most websites created on Shopify are e-commerce based, what we're going to be doing is scraping the items from the website and their prices and then saving them as a CSV file. To begin with, I'm going to grab the URL right here, which is to a website that was created in Shopify called partakefoods. I've got it open already, but I'm going to reload it, and what you'll notice is that the website has a nice little layout and it's about cookies and stuff. What we essentially want to do is go to the page that has all the items we want to scrape. I've done some research beforehand and found out that this is the URL, and this is the page that has all the items they sell.
Now what we want to do in here is scrape the title of each item and the price of that item, and then later on, once we've scraped all that data, save it to a CSV. The most important thing to do right here is understand how the website was created, or what the layout of the website is. For that we're going to use the developer tools, or Inspect. You want to select just one of these little items, because those are the items we want to scrape. This whole thing is the container that we want to scrape, and inside it are the title and the price, so we only want to keep that data. I'm going to click on Inspect right here, and what you notice is that when you hover over the different elements, it shows you what you're hovering over on the screen as well.
We want to look for the entire div. A div is a container, that's just what they call it in HTML, and we want to look for what the class of that div is. A class is like a reference, sort of like giving a div a variable name. So we only want to grab divs that have the class col-sm-6 and so on. I'm going to copy the class name I have right here and do a Ctrl+F to look for all the divs on the page with that class name. If I run this, we'll notice that it highlights essentially all the items we want to scrape on this website. So that's the div we want to scrape, because it covers all the items on the page, and all of those divs share the same class name. That's why we're going to copy the class name right here, but before we copy it, we want to make a GET request to this website to grab its HTML code.
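The Ctrl+F check described above can also be sketched in code. This is only an illustration on made-up markup: the class name "product-card" and the sample HTML are hypothetical stand-ins, not the real page, and it uses Beautiful Soup, which is installed earlier and explained further on in the video.

```python
from collections import Counter

from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page.
# "product-card" is a made-up class name, not the one used on the site.
SAMPLE_HTML = """
<div class="header">Menu</div>
<div class="product-card"><h4>Item A</h4></div>
<div class="product-card"><h4>Item B</h4></div>
<div class="product-card"><h4>Item C</h4></div>
<div class="footer">Contact</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Count how many divs carry each class attribute value.
class_counts = Counter(
    " ".join(div.get("class", [])) for div in soup.find_all("div")
)

# The class that repeats once per product is the container worth scraping.
container_class, occurrences = class_counts.most_common(1)[0]
print(container_class, occurrences)
```

On a real page you would still confirm the result in the developer tools, since layout classes can repeat just as often as product containers.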
So we're going to copy the URL, partakefoods.com/collections/all, and go to our Jupyter notebook, and first off we're going to do all our imports. We do import pandas as pd, we do import requests as r, and then we do from bs4, which is Beautiful Soup 4, import BeautifulSoup, with a capital B and a capital S. Now that we're done with the imports, we're going to create a new variable, which we'll call request1, and that's going to be equal to r, which stands for the requests module we just imported, and we use the .get method to get the HTML code as a string from partakefoods.com/collections/all, because that returns the code for this page right here, which has all the items and prices we're looking for. I'm going to go ahead and run that, and if I print my variable request1, we'll notice it says Response 200, which means OK, so the website returned the data we asked for in our GET request. Perfect. If I do request1.text, we can also see all the HTML code for the page we just fetched, as a string. If I run that, you'll see what looks like a bunch of gibberish, but that's only because it's in string format. You'll see things like "Follow partakefoods on Pinterest", so it really is the content from this webpage right here, because we did a GET request for that URL.
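A minimal sketch of the request step just described. The https scheme, the timeout value and the raise_for_status call are assumptions added here for safety and are not something the video does; fetch_html is a hypothetical helper name.

```python
import requests

# Page that lists every product, as named in the video.
# The "https://" scheme is an assumption.
URL = "https://partakefoods.com/collections/all"


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Return the page's HTML as a string, raising on an error status."""
    response = requests.get(url, timeout=timeout)
    # A 200 passes silently; 4xx and 5xx statuses raise requests.HTTPError.
    response.raise_for_status()
    return response.text
```

In a notebook cell you would call it as html = fetch_html(URL) and then look at the first few hundred characters with html[:500] rather than printing the whole string.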
Now, we can't really go through this text by hand and look for the price and the title of each product individually, so we're going to use Beautiful Soup to convert this text into navigable HTML that we can move through to find specific tags with specific class names or IDs. To do that we create a new variable called soup and set it equal to BeautifulSoup, which is the class we imported, and we pass in the request, which has all the code we need as a string, so request1.text, and then, after a comma, "html.parser", which is the parser we're going to use to convert this string into navigable HTML. If we run this, the soup variable will now let us use methods such as find and find_all. find and find_all basically let us find elements within the HTML code based on the type of tag they are and their class name.
So let's go ahead and look for the class name and the tag we're actually after. First off, we know that the tag we're looking for is div, because the div contains all the information we need, and we also know the class name for that div, because it's shared by all those divs. So we copy the class name right here, and we remember that the tag type is div. Let's go back to our code. Instead of soup.find we're going to do soup.find_all, because we want to find all the divs on the page with that class name, not just one. find only returns the first item on the page with that tag and that class name, but we want all of them. So we pass in div, because we want to find all the divs on the page, and then the class name, so class_ equals, and you paste the class name you copied. Now, by default it comes in with a space for some
reason, so you want to get rid of that. What you're going to notice now is that if I run this, it returns a list of all the divs we have extracted from our webpage. The first div, for example, seems kind of cut off, maybe because I'm printing a lot of stuff at the same time, so let's just assign this to a variable. Since it's a list of all the divs that contain the product information, let's call it products. Run this again, and of course we won't see anything, because we're not printing. Since products is a list, we can use an index to print only the first item, so let's print only the first item from the products list, and that will be the first div we scraped.
So that's the first div showing up right now, and what we're looking for is the title of the item. If you look closely, you'll notice that the title of the item is right here, it says crunchy variety pack five boxes, and the next line is actually the price. What we want to do is find out which tag the title is located in and whether it has a class or not. I've noticed that the tag being used is h4, and within it they also have an anchor tag, but we don't want the anchor tag. We'll use the h4 tag because that's where our title is. We can do a .find because there's only one h4 in here, and we know there's only really going to be one title we want to scrape from each div. So we do products[0].find and, as I said, we find the h4. Run that, and you notice that it only returns the h4 from products[0], so it's only looking at the first div we scraped in this products list. In here we notice that it has also got the anchor tag, which we don't want. We only
essentially want the readable text, which is the title, which is why we can use the .text attribute and run again to get only the human-readable text as a string. Now that we've done this once, we can repeat the process in a loop, so let's quickly go ahead and write a loop to do just that. We'll do for product in products, which means product is going to refer to each item of the products list, which is each div. We create a new variable called product_name, and that's going to be equal to product, which is each individual div from the list, then .find h4, then .text. So we've done the same thing we did down here, where we did it just once, but now we're doing it with all the divs. If I print product_name for each product, voila, we get all the product titles from the divs we had in the products list. All the titles are here, perfect.
Now we also want to store the price, so we need a way to find the price. It's the same methodology again: we look at the first item in the list and look for the price. We found it earlier on, it's right here below the title, and we notice that it's inside a span tag which also has a class, so we can use exactly that to do a find. Remember that the tag we're looking for is a span and the class it has is called regular price. Let's copy the class name, and before we create a new variable let's just experiment here. On the first item we do .find with span, because that's the tag, and with class_ equal to regular price. Run that and we get exactly the span we're looking for, but we also get the HTML code, which we don't want, so we do .text to keep only the text out of it. Perfect, we get the text right here. You notice that we have a
dollar sign in there, which we don't really want. We only want the raw number, so we'll do something to get rid of the dollar sign in a moment. In the loop, we create a new variable called product_price and assign it, for each product in the products list, to .find span where the class is equal to regular price, then .text. Basically we're just copying and pasting what we did for one product so that it runs for all the products, and we're assigning it to a variable called product_price. As I said before, it has a dollar sign in it, so we want to get rid of that. I'm going to do product_price equals product_price.replace. What do we want to replace? We want to replace the string "$" with essentially nothing, an empty string. Now if I go ahead and print my product prices, I should get them without the dollar sign. Let's run this, and we do get them without the dollar sign, but we notice that some of the prices have the text "From" inside them. We don't really want that, so we're also going to replace the "From" text with nothing. Up here where we did the first replace, we chain another .replace with "From", with a capital F, and replace that with an empty string. We also notice that there might be an additional leading or trailing space, so we add .strip to get rid of any leading or trailing spaces. Now when I run this, you'll notice that it has overwritten the initial product_price, got rid of the dollar sign, the "From" and any additional spaces, and it's only printing the plain number for the price (still as a string, not yet a float), which is perfect.
What we're going to do next is save the product name and product price for each product inside a list, and we're going to save each one in a dictionary format. So let's create the list where it's going to be saved: product_list is equal to an empty list. The reason why we're not doing it
inside the loop is because each time the loop runs it would reset it to an empty list, and we'd end up with just one product. That's why we initialize it outside the loop. Then we're going to write the product price and the product name onto this product list, so we do product_list.append. .append basically just adds the item onto the end of the list. So we do product_list.append, and as a dictionary we pass "name" as a key and product_name as its value, then a comma to add another key-value pair, so "price" as a key and product_price as its value. If you guys are not familiar with how dictionaries work or need a quick revision, I'll link a tutorial in the description, so make sure to go watch that and then come back to this.
Once we're done appending everything to the list, let's run this first. Nothing will show up because I'm not printing anything. We'll get rid of the earlier print, and here, outside the loop, I'm going to print product_list just to show you what it looks like. And here we are: the product list has dictionaries inside it, each dictionary has a name key and a price key, and the values for those keys are different in each one, because we're going through each item and saving the price and name for each item separately. So we essentially have what we need right here.
The last step is taking this list of dictionaries, converting it into a nice pandas table, or pandas data frame as they call it, and then saving it as a CSV file. So let's do that. Create a new variable called df, which stands for data frame, and assign it to pd, which is pandas, the module we imported up here with import pandas as pd, then .DataFrame, because that's the constructor we're going to use. Make sure the D and the F are capital, otherwise you're going to get errors. The first argument, and the only argument we need in here, is the list which has all the data that
we want to convert to a table, or a pandas data frame, and of course we know it's called product_list. Now I'm going to run this real quick, and let's print out df to see what it looks like. Voila. As I said, a data frame is nothing more than a nice-looking table with columns and rows where the data is organized nicely. We can see that row zero is crunchy variety pack five boxes for the price of 24.99. Cool.
Now that we have everything nicely inside a data frame, we may want to save it onto the hard drive as a file, say a CSV file, so we can reference it in the future. pandas allows us to do that too, because when we convert something to a data frame, we get certain functionality that is available to us. We can do df.to_csv, and that will save whatever is in the data frame as a CSV. We need to give it a name, let's say tutorial scrape.csv. Make sure you add the .csv extension at the end, otherwise the file format won't be recognized by the computer. Then you pass index equals False, because otherwise you will also end up with an extra column in your CSV file with all the index numbers, which is a bit annoying. I'm going to run this cell, which saves the file, and then look on my computer for a file called tutorial scrape.csv, and it has actually shown up right on time. Let's open it up, give it a second, and voila. As you can see, we have a column called name and a column called price. name has literally all the items on the website, and price has the prices linked to those items as well. Let's close this file.
I hope you guys have enjoyed today's tutorial. As promised last time, I'm trying to push out at least one video per month, so I hope this tutorial has been useful. You can use the same methodology to scrape other stuff from other websites, because most of them have similar structures and
you'll be using this technique quite often if you're trying to scrape stuff from websites. If you guys would like to see more stuff related to scraping or using pandas, please leave that in the comment section. Other than that, please make sure to like, comment, subscribe and share, because sharing really helps the channel out. Guys, I'll see you beautiful faces in the next tutorial. Peace.
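For reference, here is a minimal sketch of the parsing, cleaning and saving steps from the video, put together in one place. It runs on a small inline HTML sample so it works offline. The class names "product-card" and "regular-price", the second product, and the helper names clean_price and scrape_products are all hypothetical; swap in the class names you actually see in the developer tools.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Stand-in HTML so the sketch runs without a network call.
# "product-card" and "regular-price" are hypothetical class names.
SAMPLE_HTML = """
<div class="product-card">
  <h4><a href="/products/a">Crunchy Variety Pack 5 Boxes</a></h4>
  <span class="regular-price">$24.99</span>
</div>
<div class="product-card">
  <h4><a href="/products/b">Soft Baked Cookies</a></h4>
  <span class="regular-price"> From $4.99 </span>
</div>
"""


def clean_price(raw: str) -> str:
    """Drop the dollar sign, the 'From' prefix and surrounding spaces."""
    return raw.replace("$", "").replace("From", "").strip()


def scrape_products(html, card_class="product-card", price_class="regular-price"):
    """Return one {'name': ..., 'price': ...} dictionary per product div."""
    soup = BeautifulSoup(html, "html.parser")
    product_list = []  # created once, outside the loop, so items accumulate
    for product in soup.find_all("div", class_=card_class):
        product_name = product.find("h4").text.strip()
        price_text = product.find("span", class_=price_class).text
        product_list.append({"name": product_name, "price": clean_price(price_text)})
    return product_list


df = pd.DataFrame(scrape_products(SAMPLE_HTML))

# With no filename, to_csv returns the CSV text instead of writing a file.
# index=False leaves out the extra column of row numbers.
csv_text = df.to_csv(index=False)
print(csv_text)
```

To write a file as in the video, pass a filename as the first argument to df.to_csv. The prices stay strings here, as they do in the video; converting them with float would be a separate step.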