Hey, what is going on everybody? My name is Roddy, and today we're going to explore how to scrape data using Cheerio and Node.js. This is going to be a slightly bigger intro, because there are a couple of things that I want to talk about, starting with: what is web scraping? Web scraping is a technique used to collect content and data from the internet. Most of that data is unstructured data in an HTML format, which is then converted into structured data in a spreadsheet or a database so that it can be used in various applications. Web scraping can be used for pretty much everything: e-commerce, data science, job boards, marketing, sales, finance, data journalism and so much more. To give you some examples of apps that you can build, let's start with the most obvious one, which is a news aggregator, and then a job search portal, a specific search engine, a competitor analysis tool, a best price finder and so much more. And the last point I want to talk about is: is web scraping illegal? Well, I can't give you legal advice, but web scraping isn't illegal in itself. The problems arise when people disregard a website's terms of service and scrape data without permission. It is legal if you scrape data from a website for public consumption and then use it for your own analysis. However, it is not legal if you scrape confidential data for profit. And with that said, let's get started. To start with, let's create a new folder. So I'm going to right-click, New folder, and I'm going to call this folder F1, as in Formula One, and then tutorial. Inside this folder is where we're going to initialize our project and make everything work. In order to do that, we need to have Node.js installed, and to check whether we have Node.js installed, I can simply open PowerShell or the command line. So I'm going to hold left Shift, right-click and choose Open PowerShell window here.
This basically just does a cd into the folder, but you can use the cd command to go backwards and forwards. Just make sure that you are in your project folder. Now, to check whether you have Node.js installed, you can simply run node -v and this will give you your current version. If you do not have Node.js installed, just go to the official website, which is nodejs.org, and download the latest stable version. The process is very simple. Now that we have PowerShell open here, we need to initialize a new project. To do this, we can run npm init, and adding -y will basically skip all of the questions, which are usually the name, version, description and so on. So this creates the package.json file for us. And if we look inside the folder, you will see the file in here, which is great. Now we can start installing the dependencies that we need for this tutorial, and these are Cheerio and node-fetch. So let's do npm i, for install, and then cheerio. We can list packages one by one separated by a space, so we also add node-fetch. At some point soon you won't have to install fetch anymore, as it will be bundled with Node.js. So let's press Enter, and this should take a couple of seconds. As you can see, the packages are now installed, and we can open the project in Visual Studio Code. I'm going to close this one here and reopen it. So I'm going to type code, then a period, and press Enter. This should open Visual Studio Code with the project here on the left side. Let me zoom in a little bit so you can see. So we have the node_modules folder here and we have the package.json file. I'm using Visual Studio Code, but of course feel free to use whatever code editor you wish. I'm going to be toggling the Explorer from time to time, just so it doesn't get in the way. The first thing that I want to do is open the Explorer and have a look at package.json.
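As a small aside on the node -v check and the remark that fetch is being bundled with Node.js: the same information can be read from inside a script. This is only a minimal sketch. The variable names are my own, and it relies on the fact that Node.js 18 and later expose fetch as a global.

```javascript
// Version of the running Node.js binary, the same number `node -v` prints (without the leading "v").
const nodeVersion = process.versions.node;
const nodeMajor = Number(nodeVersion.split('.')[0]);

// Node.js 18+ ships a global fetch; on older versions node-fetch is still needed.
const hasBuiltInFetch = typeof globalThis.fetch === 'function';

console.log(`Running Node.js ${nodeVersion} (major version ${nodeMajor})`);
console.log(
  hasBuiltInFetch
    ? 'Global fetch is available, so node-fetch is optional.'
    : 'No global fetch found, so install node-fetch as described above.'
);
```

If the second message says there is no global fetch, the npm i cheerio node-fetch step from the video is required as shown.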
Inside here, you will see the dependencies that we installed. You need to note that the versions of those dependencies might change in the future, which means that the code might change slightly. Just have that in mind. Usually you can just Google the problem and you will probably find the solution for it. Having said this, the next thing that I want to do is allow our project to use ES modules. In my older tutorials, you might have seen me use require for the packages. But today we're going to be using import, which is the ES6 way of doing it. In order to be able to do that, we need to add one more line in here. You can add it pretty much anywhere in here; I'm going to add it under main. So let's add another line. The key is going to be type, and the value will be set to module, like so. Make sure that you have the comma. Now we can use ES6 imports for the dependencies that we installed. I will show you. And if you want to use require instead, just remove this line and use require instead of import. That's all. So let's save this, close it, and let's create our application. To do this, I'm going to create a new file inside here, and I'm going to call it app.js, like so. That's perfect. Now, before I start importing the modules, one thing I want to mention is that the documentation for Cheerio is at cheerio.js.org. It's really clean and very easy to understand. As you can see, up here is how you install Cheerio. They give you a few examples. They give you the ES6 example here, which we're going to use in a second, but you can also use require here if you wish to. So this is very helpful. They give you a lot of different methods. For example, here you can select things from the DOM using the tag name, parent node, previous sibling, next sibling, and so on.
Maybe we'll use one or two of these, and they'll make a lot of sense by the end. The next thing that I'm going to show you super quickly is node-fetch. This is at npmjs.com/package/node-fetch. And this is the same: it has a lot of examples, and it gives you clear instructions on how to install it, how to import it, which we're going to do in a second, and how to use it. So keep this in mind and have a look at the documentation. It's probably the most up-to-date source that you can use for that kind of stuff. And that's it, so let's start with our project. I'm going to close both of these and come back to the browser in a second. All right. Let's start by minimizing the Explorer so we can focus on the code, and let's start by importing Cheerio. So I'm going to do import, star, as cheerio, and then from, and in single quotes, cheerio, like so. Then let's import fetch: import fetch from node-fetch. Visual Studio Code completed this for me, which is great. Now that we have both of them imported, there are a couple of ways that we can handle this. We could either create a normal asynchronous function, or we can create an IIFE, which is an immediately invoked function expression. I'm going to go with the first one, which is just creating a basic function, and we're going to run that function at the bottom of the application. So let's create an asynchronous function: async, then function, like so. And let's give it a name. I'm going to call it getFormulaOneDrivers, then open and close parentheses, and open and close curly brackets, like so. Then we want to be able to run this function. To run it, you simply grab the name, paste it here at the bottom, and add open and close parentheses. Now when we run this file, this function will run, but of course we don't have anything in it yet.
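The two options mentioned here, a named async function called at the bottom of the file versus an IIFE, can be put side by side roughly like this. It is a sketch rather than the exact code from the video; the return value and the log labels are placeholders of mine.

```javascript
// Option 1: a named async function, defined first and called explicitly.
async function getFormulaOneDrivers() {
  return 'ok...';
}

getFormulaOneDrivers().then((message) => console.log('named function:', message));

// Option 2: an IIFE (immediately invoked function expression),
// defined and called in a single expression.
(async () => {
  console.log('IIFE:', 'ok...');
})();
```

The named version is easier to reuse or export later, which may be one reason to prefer it, as the video does.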
So you could test it by doing a console.log if you wish, and put something like ok... in it. Let's save this, try to run the application and see whether we get this console.log. To do this, you can either use the terminal that we were using earlier, or you can use the terminal inside Visual Studio Code by clicking on Terminal, then New Terminal. That should open the terminal. It's a little bit hard to see with everything open like that, but hopefully that should be fine. Now, in order to run this app.js file, all we need to do is tell Node.js to do that. So we can type node, then the file name, app.js, and press Enter. As you can see, this ran the app.js file and it came back with ok, which means that our asynchronous function is working, because we've called it here. And that's great. All right, let's minimize this a little bit and remove this console.log, as we won't need it. Since this is an asynchronous function, we can wrap everything in a try...catch statement. So let's do that. I'm going to type trycatch in here, and it comes up with the autofill from Visual Studio Code. So it's try, curly brackets, then catch, where we catch the error, and curly brackets to close the catch. To see the errors, what we can do is console.log the error, like so. That's all good. From now on we are mainly going to be working inside the try statement. What we want to focus on first is fetching some data from the official Formula One website, as this is what we'll be using today. If you go to the browser, you can go to formula1.com/en, for English, and then /drivers.html. This is what we're going to be scraping today. So we need to be able to access the data from this page, and to do this, I'm going to be using node-fetch. We can get the source code from this page.
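To see why the try...catch wrapper is worth having inside an async function, here is a minimal sketch. The failingRequest helper is hypothetical and simply stands in for a network call that goes wrong.

```javascript
// Hypothetical stand-in for a request that fails.
async function failingRequest() {
  throw new Error('network unreachable');
}

async function getFormulaOneDrivers() {
  try {
    await failingRequest(); // a rejected promise is thrown at the await...
    return 'done';
  } catch (error) {
    console.log(error.message); // ...and ends up here instead of crashing the script
    return 'failed';
  }
}

getFormulaOneDrivers();
```

Without the try...catch, the rejected promise would surface as an unhandled rejection and stop the script.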
So if you right-click and choose View page source, we'll be able to get all of this and then select some of the data. What I'm going to do is copy this URL and go back to the project. Inside here, we can save the result of the fetch into a response const. So we do const response, and this is going to be equal to await, which we can use because we have an asynchronous function, and then fetch. All we need to do is put the URL in here, in single quotes. And that's it. That's pretty much how you fetch data. It's super easy. Obviously, if you go to the official documentation, there are a lot of options that you can use, but this is the very, very basic way. And if we want to, we can console.log this straight away and see what we get. So let's console.log the response. If I run the project one more time, you'll see that we're getting this strange object with a lot of data, and we don't actually want this. What we want is the response converted to text; we basically want the source code. To do this, it's actually fairly simple. I'm going to create another const called body. This is going to be equal to await, then the response, and then we convert it into text, like so. Now let's keep that console.log and log the body that we just got. And let's have a look at what we get now. I'm going to rerun the project by pressing Up and then Enter. As you can see, it's a little bit messy, but we're getting some HTML. You can see scripts, inputs, a lot of divs, a lot of links and so on. So this is the actual source code of the page. If I was to inspect it or view the page source, hopefully it should be exactly the same. Obviously, it's a little bit hard to read.
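The fetch-then-text step described above can be sketched as a small helper. A few assumptions of mine: the getPageSource name, the response.ok check, and the use of the global fetch from Node.js 18+ instead of the node-fetch import used in the video (with node-fetch, the same calls apply). The https://www. prefix on the URL is also assumed.

```javascript
// Assumes Node.js 18+, where fetch is global. On older versions, add
// `import fetch from 'node-fetch';` at the top, as in the video.
async function getPageSource(url) {
  const response = await fetch(url); // Response object: status, headers, body stream
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.text(); // resolves to the raw HTML as a string
}

getPageSource('https://www.formula1.com/en/drivers.html')
  .then((body) => console.log(body.slice(0, 200))) // print only the first 200 characters
  .catch((error) => console.log(error.message));
```

Logging only a slice of the body keeps the terminal readable, since the full page source is very long.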
That's why we're going to be using the inspector tool, by right-clicking, choosing Inspect, and then inspecting the element from there. Now that we have the source code, we can start using Cheerio to select the elements that we want, such as the position, points, name, and maybe the image. To do this, we need to load this data into Cheerio. Let me scroll down just so you can see. We definitely don't need the console.log now. What we can do is load the data by doing const, dollar sign, equals cheerio, then dot load, and then we pass in the body, like so. Job done. Now we can use the dollar sign to start selecting elements, which is very easy if you're familiar with jQuery. I'm going to show you how we can do that right now. Let's say, for example, we want to find the wrapper of this list. If you inspect the elements, I'm going to click on the first one and then just go up the tree. As you can see, all of the drivers are here. They are inside col-12 and col-6 columns, which don't have very specific names, but that should be okay. If you go up the tree, we have a row, which is also not so specific. It looks like they've used Bootstrap or something similar. And obviously, row is not unique enough for us to select and use. But if you go up the tree one more, we're now starting to get unique class names. This is probably a perfect example, because the data is structured so well, but you're not always going to have that on every single website. So we're kind of lucky here, although it's not perfect either. One thing that you need to know is that if any of the class names that we're using change, then obviously you're going to have to update your code. That's just the way it is. So what I want to do is select a class name that is unique to the items that we want to select.
For this, we could use listing-items--wrapper. We could also combine it with drivers, but listing-items--wrapper might be just enough. So what we can do is grab this class name and see how many of them we have. I'm going to go back, and inside here we can use the dollar sign, and inside the parentheses, in single quotes, a dot, as this is a class name, and then we paste listing-items--wrapper. Then we can get the length of this. Let's put it in a const just to make it easier: const wrapper, and this equals the dollar sign with the listing-items--wrapper selector. Now I'm going to console.log the wrapper and then dot length. Save that. If I run the project, Up, Enter, you will see that we're getting one. But if I was to change this to row, which is not unique enough, there will probably be a lot of rows and columns on this website. Let's change it to row. As you can see, we're getting 43, which is not ideal. So we need to try to be as specific as possible and grab something unique, so we can select elements more easily. Anyway, let's go back to listing-items--wrapper. This was just an example that I can comment out for you if you wish to mess around with it. Now, first of all, let's create an empty array that is going to hold all the data. I'm going to call it items. So let's do const items, and this is going to be equal to an empty array, with the square brackets here. And save. Now let's make a little bit more space. Here is where we can select the listing-items--wrapper and go down the tree. I'm going to grab this and paste it in here: dollar sign, and in single quotes, listing-items--wrapper. Then we want to go down the tree, because we want to iterate through every single column.
To make it a little bit more specific, we can go to row and then col-12. Let's use that. So we add the greater-than sign and then .row, and then the greater-than sign again and then .col-12, to go down the tree. And we should be good to go. Now we can iterate through the columns by using map. So let's do map. Inside map, we need parentheses, and two more parentheses, and inside those we have the index and the element. Sometimes you might see these written as el, short for element, and i for index. Let's leave it like this. This is going to be an arrow function, like so. So let's open and close curly brackets, and make sure that you close everything. I think that's looking good. No highlights, which is great. Now we can start selecting the data that we want. For example, let's start by selecting the rank. If I inspect the page, and let's just go to this one here, you can see that this has a div with listing-standing, which is great, and it has rank and points. What I want to do is select the rank, and by the looks of it there are no other ranks inside this card, so we should be good to go using the class name rank. But I don't want to select just the div. I want to select the content of this rank div. So let me show you how we can do that. Let's grab the class name rank and select it. This will be fairly easy to do. I'm going to make some space. This is going to be const rank, and this is going to be equal to the dollar sign wrapping the element that we're iterating through. Then we want to use the find method, which is a Cheerio method, just like in jQuery, to find the class name rank that we just selected. And we only want the text inside, so we can add dot text, using the text method, like so.
Sometimes, if there is spacing around the text, you might want to add dot trim, like so. I'm not going to use trim here, but you'll know when you need it. Now that we have selected the rank, we can console.log it just to make sure that it's working. So console.log, and let's pass rank, like so. If we go down to the terminal and press Up, you can see we're getting all the ranks here: 1, 2, 3, 4, 5, up to 20, and then NC. And if you inspect the page, it's 1, 2, 3, 4, 5, and then at the end we have the NC. This is how easy it is to select data. But let's have a look at a few more examples, just so you have a few different ones. The points here might be very similar. Let's have a look by inspecting the points. Okay, for the points we have a little bit of a problem: the value doesn't have a unique class name. So I could go up the tree and grab points, and then make it more specific by grabbing this class name here. Let's do that. I'm going to go back and copy this line by pressing Alt+Shift+Down. Then we can rename it to points, and this equals dollar sign, element, dot find. We want to find points, and then inside points we have the class name f1-wide--s. Hopefully this should bring back the points. Let's copy this, change the console.log super quickly, and press Up. And we get the points. That was super easy to do, and the selector is specific enough that it works. Let's have a look at what else we can get. Maybe we can get the name from here. This could be a good one. Let's have a look. For the name, we have listing-item--name, which can be used. But inside it, things are a little bit more tricky: we have a span for the first name, and we have a span for the last name.
There are a couple of approaches that we can use here. We can either use the Cheerio methods, which I think let us select the first element or the last element, or we can use CSS to select the first element or the last. Let me show you how we can use CSS to do that. So what I'm going to do is grab this class name here and go back to the editor. Let's copy this line and call this one firstName. The selector is going to be a dot, because it's a class, then listing-item--name. And then, because inside there we have spans, we want to select the first span. So you can definitely do this, and there are also other methods that you can use from Cheerio. I think it was dot first and dot last. But I think this is a little bit easier, so I'm going to do it this way. And then we grab the text. Okay, that's looking good. Let's console.log the first name. And as you can see here, we get all the first names. Perfect. Oops, I closed the terminal. Now we can do exactly the same thing to grab the last element here, which is the last name. So what I'm going to do is copy this line and change it to lastName. Then, instead of the first span, we can use some CSS magic and select the last one instead. If I console.log this, it should work. Let's press Up, and we get the last names as well, which is perfect. Great. Now let's have a look at something else that we can get that is slightly different. Maybe we can get the team name here: Ferrari, Mercedes, Red Bull Racing and so on. For this, I'm going to hover over the team. That's going to be another easy one; we can just use listing-item--team. So if I grab this, let's go down and change it. This is going to be team, and then we just need to put in the class name listing-item--team, and then text. Okay, this was an easy one.
I'm not even going to test it. I'm pretty sure that's going to work. Cool. Next, we could get the photo, just to make it a little bit more interesting. For the photo, we can inspect it here and have a look at what we get. For the photo, they've used a picture element, and inside this picture they have a source and an img. We could use pretty much whichever we wish. What I'm going to do is select this class name here, listing-item--photo, and I'm going to use the data source attribute from this image to grab the URL. Let me show you how we can do that. I'm going to copy this line and call it photo. Then we put the class name here, listing-item--photo, and then we want to select the img. This time we want to read an attribute, so instead of text we're going to change the method here to attr, like so. And the attribute that we want to get is data dash source, like so. Let me show you one more time: it's this attribute here, which holds this URL. That should give us the image. If I go back, we can console.log this just to see what we get. If I press Up and Enter, you will see that we're getting the image URLs here. And if I Ctrl+click on one, this opens the image, which is perfect. Right. I think that should be enough for this tutorial. So let me finish by pushing all this data into the empty array that we created earlier. To be fair, you can do pretty much anything with this. You can save it to a local file, send it to a server, or save it to an Excel file, whatever you wish. Let's just save it to a local file in this case. Instead of console logging this, I'm going to do items.push.
Inside here, we want to push the data. You can definitely rename everything, but I'm just going to list everything as we have it. So I'm going to do rank, comma, points, comma, then firstName, comma, lastName, comma. Let me scroll down a little bit. What else do we have? team, comma, and then photo, just like so. Now that we're pushing the data into the items array, we can either console.log it or save it. Let's console.log it first, and I'm going to do that after the map. So after here, I'm going to do console.log and pass items, like so. If we run this, we should get this nice array of objects here. As you can see, we have the rank, the points, the first name, the last name, the team, the photo and so on. You can use this data to do whatever you wish. Maybe we can bring in the fs module from Node.js so we can save a file. To do this, we can go up to the top of our app.js and do import fs from fs. This should allow us to write files inside our directory here. As you can see, we have only three files at the moment. So instead of console logging, what I'm going to do is fs.writeFile. Inside here, we first want to specify the name of the file, so this is going to be, let's say, formula1.json. Then we want JSON.stringify, and we pass it the items. This is the items array that we created and pushed the data into. Then comes a callback function, and this function is going to receive an error, so we can catch the error. After the parentheses here, we open and close curly brackets. Inside here, if we get an error, we return console.log of the error. And if we save the file successfully, we can console.log something like Formula One drivers saved as, and then formula1.json.
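The push and save steps can be sketched as follows. The driver values are placeholders rather than scraped data, and the file name formula1.json is my guess at the spelling of the name spoken in the video.

```javascript
import fs from 'fs';

const items = [];

// Placeholder values standing in for what the selectors return for one driver.
const rank = '1';
const points = '258';
const firstName = 'Jane';
const lastName = 'Doe';
const team = 'Example Racing';
const photo = 'https://example.com/driver.png';

// Shorthand properties: { rank } is the same as { rank: rank }.
items.push({ rank, points, firstName, lastName, team, photo });

// fs.writeFile(path, data, callback): the callback receives an error, or null on success.
fs.writeFile('formula1.json', JSON.stringify(items), (error) => {
  if (error) return console.log(error);
  console.log('Formula One drivers saved as formula1.json');
});
```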
I think that's correct. Let's close this. Now, if we run the application one more time with the Explorer open, look what happens. I'm going to press Up and Enter. As you can see, it says Formula One drivers saved as formula1.json. And here we have the formula1.json file. If we click on it, this gives us a nice JSON file that we can use. So I'm going to close the terminal here. If I right-click and choose Format Document (I think I am using Prettier for that, which is an extension, but your editor may do it by default), you will see that this prettifies it for us, and we have a beautiful JSON file that we can work with. And this is how easy it is to scrape data with Cheerio and Node.js. If you found this tutorial useful, please consider subscribing to my channel, liking the video and leaving me a comment below. And that's pretty much everything from me. I'll see you in the next one. Thank you very much for watching. Bye.
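As an alternative to formatting the file in the editor afterwards, JSON.stringify can indent its output itself through its third argument. This is not what the video does, just a small sketch of the option, with placeholder data.

```javascript
// Placeholder data in the same shape as the scraped items.
const items = [{ rank: '1', firstName: 'Jane', lastName: 'Doe' }];

const compact = JSON.stringify(items);         // single line, as written in the video
const pretty = JSON.stringify(items, null, 2); // third argument: indent with 2 spaces

console.log(compact); // [{"rank":"1","firstName":"Jane","lastName":"Doe"}]
console.log(pretty);
```

Passing the pretty string to fs.writeFile would produce a file that is already readable without running Format Document.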