Right, hey guys, and welcome back to another Python tutorial. I've been putting this one off for quite a while, but here we are: today we're finally going to learn how to web scrape using Python. In this tutorial we're going to grab a website and then scrape the content on it. So what does web scraping really mean? Simply put, web scraping is when you grab the content of a website and then filter out only the content that you need. For example, if I open up weather.com here, let's say I wanted to make a weather app. I would need a web scraper so that I can grab information from a third-party website. I would ask my web scraper to access this URL, parse the whole page, and then grab this element right here, which gives me the temperature. Then I would use that value in my program to show the real-time temperature each time my program is opened. Cool. We're not going to be creating a weather app today; we'll probably build that down the line in a few tutorials. Today we're going through how to web scrape using Python. That's enough of an introduction, so now let's install a few modules that we need for this program to work properly. First of all, type in pip install requests, because we need to make requests to a website so that we can grab the page. Mine already says "Requirement already satisfied" because I've got it installed. Then, to parse the pages and filter the information in them, we need something called Beautiful Soup, so type in pip install beautifulsoup4. This module gives us a nice interface for parsing and filtering information.
This might not make sense yet, but stick with me and it'll become clearer further into the tutorial. I'm going to zoom in a bit, create a new file, and save it as web_scraping.py, since it's a Python file. Cool. First of all, we type from bs4 import BeautifulSoup (bs4 stands for Beautiful Soup 4). We import BeautifulSoup because that's what we're going to use to filter our information, and then we import requests. Cool. requests is what lets us grab the page. Next, let's create a variable called URL, which holds the address of the website we want to scrape. I'm going to use my portfolio website, so I type in its full address, starting with http:// and ending with a forward slash. Next we create a new variable called page, which stores the response of a GET request to that URL: page = requests.get(URL). Cool. If we printed page now, we'd only see the response object, something like <Response [200]>. Printing page.content would dump the raw HTML in one big unformatted block, so we're not going to do that. Instead, we're going to hand the page over to Beautiful Soup, following the pattern from its documentation.
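For reference, here's the download step from above wrapped in a small function. This is a minimal sketch, not the tutorial's exact code. The name fetch_page and the 10-second timeout are my own assumptions. The timeout and the raise_for_status() call make a slow server or an error page (like a 404) fail loudly instead of being scraped as if it were the real content.

```python
import requests


def fetch_page(url: str, timeout: float = 10.0) -> requests.Response:
    """Download a page with a GET request and fail loudly on HTTP errors."""
    # Without a timeout, requests can wait indefinitely on a server that never answers
    response = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx status codes instead of scraping an error page
    response.raise_for_status()
    return response
```

You'd call it as page = fetch_page(URL). After that, page.content holds the raw HTML bytes that get passed to Beautiful Soup.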
We create a new variable called soup and assign it to BeautifulSoup(...). What we pass in is the page, but specifically its content, so we type page.content, then a comma, and then "html.parser". We need to provide a parser to parse this information, and since the content of pretty much every website is HTML, we're using Python's built-in HTML parser: soup = BeautifulSoup(page.content, "html.parser"). Next I'm going to print out the content of my website. If you want to print it nicely formatted, the soup object has a method called prettify(). Since soup holds all the content of our website, already parsed by the HTML parser, we use print(soup.prettify()) to format all of that information and print it out. Before I do that, I'm going to open my website in a Chrome window so that I can show you what it looks like. This is what my website looks like at the moment, and if I scroll through, this is all the content on it. When I run my Python script now, it should return an HTML version of this whole website. So let's run this and give it a second. As you can see in my window, if I scroll all the way up, it's a lot of information. It shows my entire website as HTML, nicely formatted because I've used the prettify method on it. There's the title of my site, I've got my font imports, I've got all the content, and I've got the drop-down that I created.
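To try parsing and prettify() without hitting a real site, here's a minimal sketch that feeds Beautiful Soup a small made-up HTML document instead of page.content. The markup is hypothetical. Only the BeautifulSoup(..., "html.parser") call and prettify() mirror the tutorial.

```python
from bs4 import BeautifulSoup

# A tiny stand-in for page.content (bytes); on a real site this comes from requests
html = b"""<html><head><title>My portfolio</title></head>
<body><div><h1>Hello</h1><p>Welcome to my site</p></div></body></html>"""

# The second argument picks the parser; "html.parser" ships with Python itself
soup = BeautifulSoup(html, "html.parser")

# prettify() returns a str with one tag or string per line, indented by nesting depth
pretty = soup.prettify()
print(pretty)

# The parsed tree can also be navigated directly, e.g. the text of the <title> tag
print(soup.title.string)
```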
Further down I've got the footer and the copyright for 2020, and if I scroll to the bottom, I can prove that because it's right here. So yes, it does parse all the information. Now you might say, hey, that's a bit pointless, because we might only want specific information from a website, such as the weather. That's completely possible, and it's what we're going to learn next. Just remember that the soup object holds the entire website, parsed as an HTML document by the HTML parser. We're going to use that object to filter out the specific information we want. The first thing we're going to learn is extracting all the info from a class. I'm going to go to my website and inspect it. You need to inspect the website you want to parse, because you need to know the class names and other attributes if you want to grab specific items. So what is a class? A class is a label you give to HTML elements, and any number of elements can share the same class. Here the class is on a div, which is a container element, so it groups everything inside it. For example, I could have the moon image, the text, and the button right here all inside that one div. What's inside can be anything: images, text, videos, really any type of HTML element. So the thing to remember is that a class lets you target a group of elements. If I click on this div right here, its class name is home container, and it contains my entire home page, which is why the whole thing is highlighted. Now I'm going to look for my hobbies and experience section.
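Here's a small sketch of how classes look from Beautiful Soup's side, using made-up markup. The class names moon, hero, and home-container are hypothetical. An element can carry several space-separated classes, so Beautiful Soup stores class as a list. Searching with class_ matches an element if any one of its classes equals the value.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one class can label several elements, and one element can have several classes
html = """
<div class="home-container">
  <img class="moon hero" src="moon.png" alt="Moon">
  <p class="hero">Hi, welcome to my site</p>
  <a class="hero button" href="#about">About me</a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# class is stored as a list because HTML allows several space-separated classes
img = soup.find("img")
print(img["class"])  # ['moon', 'hero']

# class_ matches an element if ANY of its classes equals the value
hero_elements = soup.find_all(class_="hero")
print([tag.name for tag in hero_elements])  # ['img', 'p', 'a']

# The wrapping div contains every tag nested inside it; find_all(True) matches any tag
container = soup.find("div", class_="home-container")
print(len(container.find_all(True)))  # 3
```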
It should be in the about me section, if I'm not wrong. OK, I'm opening up the about me section, and as you can see I have a lot of classes, because I've grouped my elements into divs with classes. If I scroll down, what I've got right here is a div. If I click on that div, it shows me the hobbies and experience section, which is this section here. When I hover over it, it selects all of this, because it's all grouped inside a div element whose class is hobbies and experience. So now I'm going to use my soup object to filter the page down to just this div. You can do this with other classes as well, but I'm using this one as an example. Let's go to my code and remember two things: the class name is hobbies and experience, and the element is a div. Cool. Now I create a new variable called results and assign it to soup.find_all(...). We use find_all when we want every matching element, and find when we only want the first match. I'm using find_all here, so it would also pick up any other divs with this class. I pass in "div", because the element I'm looking for is a div. Then I pass class_, with an underscore because class is a reserved word in Python, set to the name of my class, hobbies and experience, since that's the specific class I'm targeting. find_all returns a list of matching elements (a ResultSet), not just the text we want. Printing results directly would dump that whole list at once, so we loop over it with a for loop: for result in results: print(result). When I run this, you'll see that I'm not getting the entire website anymore; I'm only getting my div with the hobbies and experience class. It starts at the hobbies and experience heading and ends at something that says Regal College International. Let me verify that by going back to my website: the section starts with Hobbies and Experience and ends at Regal College International, so it's working fine. Now you might say, hey, I just want to grab the text. I don't want all this HTML in there, I just want the values. For example, in this h2 element I just want the words Hobbies and Experience, and in the list items I just want the text. That's very possible using the .text property, so instead of printing result, we print result.text. With .text, Beautiful Soup gives you the text inside the element without any of the HTML tags. If I maximize this, we have just the text of that section, displaying perfectly fine: it starts at Hobbies and Experience and ends at Regal College International. Cool. That's how you extract information from a class and display only the text. Next we're going to learn how to extract specific information using an ID: instead of targeting elements by class, we can target a single element using its ID.
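Here's a self-contained version of the class lookup above, run against hypothetical markup. The class name hobbies-and-experience is an assumption, since the exact name on the live site isn't shown. It uses get_text() with a separator and strip=True, a variant of .text that drops the indentation and blank lines left over from the HTML source.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the about-me section; the class name is an assumption
html = """
<div class="about">
  <div class="hobbies-and-experience">
    <h2>Hobbies and Experience</h2>
    <ul>
      <li>Coding</li>
      <li>Regal College International</li>
    </ul>
  </div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# class_ (with the underscore) because "class" is a reserved word in Python
results = soup.find_all("div", class_="hobbies-and-experience")

section_texts = []
for result in results:
    # result.text would also work, but it keeps the source's newlines and indentation;
    # get_text("\n", strip=True) strips each string and joins the non-empty ones
    text = result.get_text("\n", strip=True)
    section_texts.append(text)
    print(text)
```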
First I'm going to comment out the previous code. IDs in HTML are used when you want to uniquely name an element, so this part is extracting info using an ID. I'm going to open up my site, inspect it again, and find an ID called projects. If I scroll down a bit, this is the div for my projects section. It has a class, but it also has an ID, and I'm going to use the ID to point to this div and see what the results look like. So you can point to a div using either a class or an ID, depending on what's available on the website. We've just gone through how to access a div using its class, and now we'll access one using its ID. The ID is projects and the element type is div; that's all we need to remember. I assign results again, and this time I'm only looking for one element, so I use soup.find instead of find_all. We could pass in "div" as the element type as well. Since an ID should be unique on the page, though, we only need to pass the ID: results = soup.find(id="projects"). Lastly, we print results.text. We don't need a loop this time, because find returns a single element rather than a list.
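Here's a minimal sketch of the ID lookup on made-up markup; the ids projects and contact are hypothetical stand-ins. It also checks for None. find() returns None rather than raising an error when nothing matches, so calling .text on the result would then crash.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the div has both a class and a unique id, like the projects section
html = """
<div class="section" id="projects">
  <h2>Projects</h2>
  <p>Discover some of my projects</p>
</div>
<div class="section" id="contact"><h2>Contact</h2></div>
"""
soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching element, or None if nothing matches
projects = soup.find(id="projects")

if projects is None:
    # Happens if the id is renamed or removed on the live site
    print("No element with id='projects' found")
else:
    projects_text = projects.get_text(" ", strip=True)
    print(projects_text)
```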
Let's run this now to see if it worked. As you can see, it says Projects, Discover some of my projects, and so on, and it shows everything in my projects section. Using an ID can be a lot more convenient than using a class, because you get back one element and don't need to loop over the results. It's the same thing we did with a class, but this time with an ID: we've grabbed the entire div for my projects section and we're showing it as plain text. Cool. That's how to get specific information using an ID. Now I'm going to comment this out as well so that it doesn't interfere with the next part, which is extracting info using an element type and a string. This doesn't apply very well to my website, but let's say you were job hunting and wanted to create a custom app that job hunts for you. You want to look on a website and find the different job postings, but you only want postings that match your keywords, say software dev, junior software dev, or web dev. So now we're going to learn how to extract results based on keywords that appear on the website. When you do this, you need to know the type of element you're going for, and this usually only works with text elements like headings or paragraphs. Cool.
So we type results again and assign it to soup.find_all, since we might find multiple matches. I'm going to target my h1 element right here, so I pass in "h1", then a comma, and then the string argument. We can't just pass in a plain string like "software dev" here. Beautiful Soup would only match an element whose text is exactly that string, character for character, including capital letters and whitespace. Instead we pass a function, written as a lambda. The function receives the element's text as text, and it checks whether the key phrase "who am" appears in it. What should be returned is the element that says "Who am I?", because that text includes "who am". We also call .lower() on the text so the comparison is case-insensitive. So the whole line is results = soup.find_all("h1", string=lambda text: "who am" in text.lower()). If the phrase is in the text, the element is included in the results. Then we loop over them: for result in results: print(result.text). Every result will be an element whose text includes "who am". In my case that's only one, because it says "Who am I?".
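Here's a sketch of the keyword idea from the job-hunting example, using made-up headings and hypothetical keywords. When you combine a tag name with string=, Beautiful Soup compares against each tag's .string. That value is None when the tag has more than one child, for example text plus a nested tag. With some Beautiful Soup versions the lambda above can then receive None and raise an AttributeError, so this version checks for None first. It's the same idea as the lambda, just written as a named function. As a side effect, the heading with a nested span is skipped, which is why this approach works best on plain text elements.

```python
from bs4 import BeautifulSoup

# Hypothetical headings standing in for job postings
html = """
<h1>Who am I?</h1>
<h1>Junior Software Dev <span>(remote)</span></h1>
<h1>Web Dev wanted</h1>
<h1>Senior Accountant</h1>
"""
soup = BeautifulSoup(html, "html.parser")

# Hypothetical keywords; matching is case-insensitive and substring-based
keywords = ["who am", "software dev", "web dev"]


def matches_keywords(text):
    # text is the tag's .string, which is None when the tag has more than one child
    return text is not None and any(keyword in text.lower() for keyword in keywords)


results = soup.find_all("h1", string=matches_keywords)
for result in results:
    print(result.text)
```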
Cool, it looks like there's a problem on the string= line. Let's see... I've spelled lambda wrong. Let's fix that and run it again. As you can see, it says "Who am I?", so it's returning the element that matches. The next bit is grabbing the attributes of elements. You might ask, what attributes? Let's say you have images or hyperlinks on a website and you want to grab the link. In HTML, a hyperlink is written as an anchor tag, which is an a tag, and it has an href attribute that holds the link, something like http://... you get the idea. When we scrape the website, we don't want the a tag itself; we want the value stored in its href, so that we have the link. Let me show you how to do that. As always, we start with results = soup.find_all(...), since we're looking through divs. The element is a div and the class is project-footer. Let me show you where that is on my website: if I go to projects, the project footer is right here, this entire bit. There's a hyperlink behind the button called Visit GitHub, and there's also an image in there, which is right here.
We're going to grab the attributes of the hyperlink first, to find the link behind that button. We have our class, project-footer, and then we loop: for result in results. Inside the loop we create a new variable called hyperlink by calling the find method on result. In find we specify the element we're looking for, which is the a tag (the anchor tag) for the hyperlink. As I said, the attribute we want is href, so the line is hyperlink = result.find("a")["href"]. Lastly, I print an f-string, f"Found the following links: {hyperlink}", followed by a blank line. Let's run this, and hopefully it just works. OK, it didn't return anything for some reason, so am I targeting the right thing? I've got div here, then class_ equal to project... I've misspelled it. I typed project_footer with an underscore, but it's project-footer with a hyphen. Let's fix that and run it again. Now it says "Found the following links" and shows my GitHub profile link. Let's verify on my website that the link is correct. If I inspect it, we have an a tag whose href is my GitHub profile URL, so it's working completely fine and it's returned the link for this section. Now, what if we wanted the image links for this section? All we need to do is go back to VS Code and make two changes. Instead of the a tag, we look for the img tag, and instead of href, we look for the src attribute, because images use src, not href. Let's run this. As you can see, it finds the image link, which points to a file on my server: an images folder containing github.png.
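Here's a minimal sketch of the attribute step, using hypothetical markup and placeholder URLs: https://example.com/ and the GitHub URL are stand-ins, not real links from the site. It uses .get() and a None check so a footer without a link or image is skipped instead of raising an error. It also resolves relative src paths like images/github.png into full URLs with urljoin from the standard library.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical base URL standing in for the site being scraped
BASE_URL = "https://example.com/"

html = """
<div class="project-footer">
  <a href="https://github.com/example-user">Visit GitHub</a>
  <img src="images/github.png" alt="GitHub logo">
</div>
<div class="project-footer">
  <img src="images/other.png" alt="No link in this footer">
</div>
"""
soup = BeautifulSoup(html, "html.parser")

links, images = [], []
for result in soup.find_all("div", class_="project-footer"):
    anchor = result.find("a")
    # find() returns None when the tag is missing, so check before reading attributes
    if anchor is not None and anchor.get("href"):
        links.append(anchor["href"])
    image = result.find("img")
    if image is not None and image.get("src"):
        # src is often a relative path, so resolve it against the page URL
        images.append(urljoin(BASE_URL, image["src"]))

print(f"Found the following links: {links}\n")
print(f"Found the following images: {images}\n")
```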
So as you can see, it shows us the link. That's it for today's tutorial, guys. It ran a bit long because I wanted to explain how this whole thing works under the hood. If you enjoyed it, please make sure to like, comment, subscribe, and share, because sharing really helps. If you'd like to support the channel directly, you can become a patron using the Patreon link in the description, or buy a Super Chat emoji or highlighted message when this video premieres. Once again, guys, I appreciate all the support you've been showing me on the recent videos; I'm really grateful for all of it. If you'd like, you can follow my socials or join the Discord server to have some fun and share some ideas. And guys, I'll see your beautiful faces with an interesting project in the next tutorial. Peace.