Social media is not just for fun; it can also be useful for understanding people's behavior. Twitter in particular offers open access to its content, so it is easy to follow and analyze topics, political opinions, friendship networks, trending hashtags, brand sentiment, and so on. In this video, we'll learn how to retrieve data from Twitter, how to preprocess it, and how to uncover interesting topics in the corpus. Twitter provides a way to retrieve tweets, but you will first have to register and get an API key. The link to the Twitter developer's site is in the description below. Next, you will have to apply for a developer account and create an app. Once you have created an app, go to Keys and Tokens, then copy the key and secret key, and paste them into Orange's Twitter API dialog. Now the Twitter widget is all set up. Say we wish to see what is trending in the machine learning community. I will enter #machinelearning into the query box and set the language to English. I will go with just 100 tweets to keep things simple, but you can retrieve as many as you like. The best way to observe text data is with the Corpus Viewer widget. Here I have all my tweets. For a concise view, I will select Content in the display features to see only the content of each tweet. Then I will use Ctrl+A to select all the data. Now I can read the tweets one by one. But of course we won't do that. If we had thousands of tweets rather than 100, it would be impossible. So instead, let us look at the most frequent words to see what the corpus is about. We will connect Word Cloud to the Twitter widget and see what we get. Oh boy, lots of useless things: our query and some punctuation. Let us remove these with preprocessing. We already have some presets here. We will keep Lowercase and add Remove URLs. The word cloud had "https" ranked at the top, but this is not an actual word, so let us remove it. The preview in the bottom left shows the first few tokens, so I can see how my data is changing.
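Outside the widgets, the two transformations we just chose (lowercasing and URL removal) and the word counts behind a word cloud can be sketched in a few lines of Python. This is only an illustration using the standard library, not what Orange runs internally; the function names, the URL pattern, and the example tweets are made up for the sketch.

```python
import re
from collections import Counter

# Assumed pattern: anything starting with http:// or https:// up to the next space.
URL_PATTERN = re.compile(r"https?://\S+")


def preprocess(text):
    """Lowercase the text and strip URLs (hypothetical helper)."""
    return URL_PATTERN.sub("", text.lower()).strip()


def word_counts(tweets):
    """Count whitespace-separated words across all preprocessed tweets."""
    counts = Counter()
    for tweet in tweets:
        counts.update(preprocess(tweet).split())
    return counts


# Made-up example tweets, not real data.
tweets = [
    "Intro to #MachineLearning https://example.com/a",
    "New #machinelearning course! https://example.com/b",
]
counts = word_counts(tweets)
print(counts.most_common(3))
```

Note that splitting on whitespace still leaves punctuation attached to words (for example "course!"), which is why tokenization matters next.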
The next step is setting the right tokenization. Instead of splitting by word, we will use a pre-trained tweet tokenizer, which can extract hashtags, mentions, and emojis. The downside of this tokenizer is that it also returns punctuation, which we will remove by filtering with a regex. The preset regular expression will remove most punctuation characters. A quick glimpse at the word cloud shows that our data now makes much more sense. The top hashtags used with machine learning are AI, artificial intelligence, data science, and deep learning. Finally, let us uncover what these tweets are about. We will use topic modeling to uncover latent topics in the data. There are three methods for topic modeling. We will use Latent Dirichlet Allocation, which is a generative method based on word co-occurrence. We are asking for 10 topics. In the widget, we see the defining words for each topic. An even nicer way of observing topics is in the Heat Map widget. Select clustering with optimal ordering to cluster topics by how frequently they occur. Wonderful! Now I can select a subset of documents with high topic frequency and observe them in Corpus Viewer. Today we learned how to retrieve data from Twitter, how to preprocess it, how to extract interesting topics, and how to plot them. In the next video, we will learn how to perform sentiment analysis on Twitter data.
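For those who want to see the tokenization and punctuation filtering steps from this video in code, here is a rough sketch. It is a simplified stand-in written with plain regular expressions, not the pre-trained tweet tokenizer or the preset regex from the widget; both patterns and both function names are assumptions made for illustration.

```python
import re
import string

# Assumed token pattern: hashtags and mentions first, then words
# (with an optional apostrophe part), then any single other symbol.
TOKEN_PATTERN = re.compile(r"[#@]\w+|\w+(?:'\w+)?|[^\w\s]")

# Assumed filter: a token made up only of ASCII punctuation characters.
PUNCT_PATTERN = re.compile(r"^[%s]+$" % re.escape(string.punctuation))


def tokenize(text):
    """Split a lowercased tweet into hashtags, mentions, words, and symbols."""
    return TOKEN_PATTERN.findall(text.lower())


def remove_punctuation(tokens):
    """Drop pure-punctuation tokens; hashtags, mentions, and emojis stay."""
    return [token for token in tokens if not PUNCT_PATTERN.match(token)]


# Made-up example tweet, not real data.
tokens = remove_punctuation(tokenize("Loving #MachineLearning, thanks @orange! 🚀"))
print(tokens)
```

Keeping the hashtag and mention symbols attached to their words is what lets the word cloud show hashtags as separate entries rather than mixing them with ordinary words.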