What's going on everybody, my name is Alex Freberg and today we are back with another video. In today's video, we're gonna be scraping data off Twitter, then we're gonna be using NLP to transform that data, and then do data visualizations with the data that we scraped. I'm really excited about this one. If you're looking for a project for your portfolio, I think this is a really good one. Everything you see will be on my GitHub, so feel free to download the code and use it. Make it your own, make it unique. I think it'd be a really cool project, especially for people who are just starting out.
So with that being said, let's get started and look at the code. We're gonna walk all the way through to the very bottom, which has my visualizations at the end. At the very beginning, I like to put all of my imports and libraries at the top. I don't like to import my libraries as I go. I like to package them all at the top, so that's why I have that here. I'm using NLTK, as you can see, pandas, NumPy, regular expressions, and spaCy. And this spaCy load is for what's almost like a dictionary of words, and I'll be using that a lot later.
So let's go down here. This is where I actually scrape the data from Twitter. I'm using twitterscraper as well as datetime and pandas. I specify my date range, my begin and end dates, and then I specify which user I'm gonna scrape the data for. For this one, we're gonna be using Donald Trump. Love him or hate him, he tweets a lot, so it's a lot of really good data, and we're gonna be using him today. And then it's a little bit more of specifying what data I'm pulling from this. I only want the text that's actually in the tweets, so I specify that that's what I wanna pull in.
I've already run this, so this is what it looks like when it's running. And this takes a little bit of time, a couple of minutes at least. For this first tweet, you can see that it says Admiral Ronny Jackson, and if we go over to his Twitter, we can see that that's what it actually says there. So we're pulling all the text from his tweets, which is really cool. And if we scroll all the way to the bottom, we can see that there are 638 tweets that we pulled in.
That's a lot of text, so we're gonna need to break it up. I'm just gonna go line by line and split the words so that it looks like this in the end: it still says Admiral Ronny Jackson, but now it's split into individual words instead of all free text.
So now we're gonna go down and start removing all the punctuation, and that's important because we want to clean up this data. This data is very, very messy. We wanna start standardizing and normalizing it, and there are a lot of things we're gonna do for that. One of the first is removing the punctuation, and all I do is use a regular expression to remove it.
When we go down here, the next thing we're gonna do is something called stemming. Stemming is basically taking a lot of different variations of one word and breaking them down to the stem, or root word. Something like run has a lot of different variations, like running, runner, or ran, and we wanna break those down to the root word, run. That is what we're doing right here. Some of these words are gonna look a little bit different, and not all of them work out perfectly, but for the most part it works as intended.
So now we're gonna go down and remove all the stop words. Stop words are just words that are really simple and super common, things like a, the, of.
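The word-splitting and punctuation-removal steps described above can be sketched roughly as follows. This is a minimal illustration, not the code from the video: the function name `split_and_strip` and the sample strings are invented, and only Python's standard `re` module is used. Stemming is left out because it depends on a third-party stemmer such as the ones NLTK provides.

```python
import re

def split_and_strip(text):
    """Split text on whitespace, then drop punctuation from each word."""
    words = text.split()
    # Remove every character that is neither a word character nor whitespace.
    cleaned = [re.sub(r"[^\w\s]", "", word) for word in words]
    # Discard tokens that were pure punctuation and are now empty.
    return [word for word in cleaned if word]

# Invented sample strings standing in for the scraped tweet text.
sample_tweets = [
    "Hello, world! It's -- fine.",
    "Running late again... see you soon!",
]

tokens_per_tweet = [split_and_strip(tweet) for tweet in sample_tweets]
for tokens in tokens_per_tweet:
    print(tokens)
```

Note that the pattern also strips apostrophes, so a word like "It's" becomes "Its". Whether that is acceptable depends on how the tokens are used afterwards.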
These are words that are gonna repeat thousands of times in this free text, and we don't want them because they don't really add meaning to the text, so we're gonna remove those. After that, our basic cleaning process is done. I could go a lot more in depth, but for this project I just wanted to keep it simple and do the bare minimum, to be honest.
So now I have the basic text that I want. I'm gonna take all of that text and do something called value counts. All it's gonna do is group my distinct words and give me a count of each. That's gonna be used for my visualizations in just a little bit. Right down here, it's just gonna give me some of the counts.
Let's go down a little bit further, where we start doing some visualizations. As you can see, I'm using Matplotlib, NumPy, and Seaborn. These are all really good packages for visualization work that I like using, so those are the ones I use here. The first thing I wanna visualize is just which words Trump is using the most. Right here, you can see that his top words are things like great, people, fake news, and others. I just chose the top 20 rather than a hundred of them.
Let's go down a little bit more, because this is where it gets more interesting. I'm gonna use a library called spaCy, and what I'm gonna use it for is to break the words into categories. There are categories like people, places, things, organizations, all types of different things that you can break them out into. All I did in these scripts is break everything out by organization right here. What we're gonna do is see which organizations Donald Trump is talking about the most in this time period.
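The stop-word removal and word-count steps described above can be sketched like this. It is only an illustration under stated assumptions: the stop-word list is a tiny hand-written one (NLTK and spaCy ship much longer lists), the tokens are invented, the name `top_words` is hypothetical, and `collections.Counter` stands in for the pandas `value_counts` call mentioned in the video.

```python
from collections import Counter

# A small hand-written stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "the", "of", "and", "is", "to", "in"}

def top_words(tokens, n=20):
    """Lowercase tokens, drop stop words, return the n most common (word, count) pairs."""
    kept = [token.lower() for token in tokens if token.lower() not in STOP_WORDS]
    return Counter(kept).most_common(n)

# Invented tokens standing in for the cleaned tweet text.
sample_tokens = ["The", "news", "is", "great", "and", "the", "people", "great"]

counts = top_words(sample_tokens)
for word, count in counts:
    print(word, count)
```

The resulting list of pairs is already sorted from most to least frequent, so it could be handed to a bar chart in Matplotlib or Seaborn to get a top-20 plot like the one shown in the video.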
So if we scroll down here, you can see that the top organizations he's talking about right now are the US DOT, which is the United States Department of Transportation, Fox News, Fake News, the Republican Party, CNN, and a lot of others, as you can see. I'm confident this is accurate because I have looked at his tweets. I know what he's saying, and these are a lot of the things I have seen personally. So this is just a really easy visualization of the types of organizations he's talking about right now.
If we scroll down a little further, we're gonna do the same thing, but on person, so we're gonna see who the top people are that he's talking about right now. Let's go down here. I will say that this one is not perfect. Sometimes it confuses regular nouns or regular words for people. But let's look at the ones that actually came up. Some of the top ones are corrupt Joe Biden, Joe Biden, Biden, Trump statue. A lot of things that I know he's talking about and that are in the news right now.
Something to point out is corrupt Joe Biden. How did I get three words when I was breaking everything out into individual words? Well, that's something that's built into spaCy. It's called trigrams. As it's going through the text, if it notices that words are related, like Eiffel Tower or Donald Trump, after seeing them together multiple times it's gonna put them together and keep them together, because it's saying that this is actually one entity. I thought that was super cool when I learned it, so I thought I would share it with you.
As you can see, this is the end of my project. One thing to note is that I made this about seven months ago, and back then I had everything working and everything was perfect.
And then as I came back to make this video, to show you a project that I did a long time ago, I found some errors. I had to do a lot of debugging, because packages and libraries change, and things aren't always as they were seven months ago, especially in Python, where they're always updating. One big issue that I had to solve was this one right here, where it wasn't actually collecting the tweets from Twitter. I'm gonna include this on my GitHub so that you can do the same thing I did to fix that issue. I thought that was something important to note.
That is all I got. Thank you guys so much for watching. I really appreciate it. If you haven't seen it already, I do have a Patreon page where I offer one-on-one coaching and behind-the-scenes content, so if you wanna support me and the channel, go and check out my Patreon page. If you liked this video, be sure to like and subscribe below, and I'll see you in the next video.