We already talked about spreadsheets and images. How about text? Could we extract any meaningful information from a set of documents? Of course we can. First, we need to install the Text add-on. Go to Options, Add-ons and select Text. Restart Orange for the add-on to appear.

Now, let us load the data. Place the Corpus widget on the canvas and open it. Go to Browse documentation corpora and load grimm-tales-selected. We have 44 Grimm tales on the output of the widget. What are these texts about? Connect Corpus Viewer to Corpus. Corpus Viewer displays the text and enables us to browse it. For example, we can output only those documents that contain the word king.

Another widget for visualizing text is Word Cloud. This widget displays word frequencies in a cloud. The more frequently a word appears in the text, the larger it will be. But our Word Cloud shows silly things such as punctuation and uninformative words. We will use Preprocess Text to get rid of these. This widget will transform all texts to lowercase. Next, it will split the text into individual words and omit the punctuation. Individual words are called tokens. Finally, it will filter out stop words, which are common words such as the and and that carry little meaning on their own.

The effects of preprocessing can be visually explored in the Word Cloud. After preprocessing, this visualization looks much better. We retained only meaningful words and now we can better understand what our corpus is about. Grimm tales talk about kings, fathers and wives. But some words are still a bit annoying, such as could, would and said. We can filter these out as well. Let us write our own custom stop word list. Open a plain text editor and type each word you want to filter on its own line. Then save the file and load it next to the preset stop word list. The changes are now propagated through the workflow, and the words we defined in our stop word list no longer appear in the Word Cloud.

Preprocessing is the first and a very important step in text mining.
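For those who like to see the steps spelled out, here is a minimal sketch of the same preprocessing idea in plain Python. It is not how Orange implements the widget; the sample sentence, the tiny preset stop word list, and the function name `preprocess` are all made up for illustration. It only shows the order of operations: lowercase, tokenize, filter stop words, then count word frequencies as a word cloud would.

```python
import re
from collections import Counter

# Hypothetical miniature preset list; a real preset stop word list is much longer.
PRESET_STOP_WORDS = {"the", "a", "and", "to", "of", "he", "his"}
# Custom additions, as they would be typed one per line in a plain text file.
CUSTOM_STOP_WORDS = {"could", "would", "said"}


def preprocess(text, stop_words):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    lowered = text.lower()
    # Keeping only runs of letters also discards punctuation.
    tokens = re.findall(r"[a-z]+", lowered)
    return [token for token in tokens if token not in stop_words]


# Made-up sample sentence, not taken from the corpus.
sample = "The king said he would give his daughter to the prince. The king could not."

tokens = preprocess(sample, PRESET_STOP_WORDS | CUSTOM_STOP_WORDS)
counts = Counter(tokens)  # word frequencies, the numbers behind a word cloud

print(tokens)
print(counts.most_common(3))
```

The custom words are simply merged with the preset list before filtering, which mirrors loading our own file next to the preset stop words.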
We defined our tokens and filtered out the bits we didn't need. Now our text is ready for the next step. In the following video, we will use the preprocessed data to find interesting groups in Grimm tales.