Hi! In this video, I will show you how to find clusters of frequently occurring words in a corpus. We will use the proposals to the Slovenian government, which are freely available and which I introduced in the previous videos. Again, you will find the links to our document server in the description box below. Notice that while in the previous video we clustered documents, in this video we will find groups of related words.

As in our previous videos, we start with the proposals to the Slovenian government. From there, copy the web address and paste it into the URL entry box of the Import Documents widget. Again, we preprocess the data: convert the text to lowercase, construct the list of lemmatized words, and remove stop words and numbers.

We now display the words in the Word Cloud. The Word Cloud widget weighs the words by their total number of occurrences in the corpus. For instance, the corpus we use mentions the word tax 1,428 times. We will select the words that occur in our corpus over 200 times. Notice that while the Word Cloud outputs the documents that include any of these words, it also outputs the list of selected words. We need to rewire the link between the Word Cloud and Corpus widgets by double-clicking on the link and connecting the Selected Words output of the Word Cloud to the Data input channel of the Corpus widget.

By passing the list of words to a Corpus, we enable several widgets that handle this data type. The Corpus widget defines that the feature named words will be used as a text feature and as the title of each document on its output. This packing of each word into its own corpus item is required by the widgets that follow.

Now we are ready for some semantic embedding. We will use the Document Embedding widget, which takes the words, that is, documents, from the Corpus widget and uses a pretrained deep neural network to represent each word with a vector of numbers. Here they are in the Data Table.
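The frequency weighting and the occurrence threshold used by the Word Cloud can be sketched in a few lines of plain Python. This is a minimal illustration on a tiny made-up corpus, not the widget's actual implementation; the threshold here is 2 instead of the video's 200 because the toy corpus is so small.

```python
from collections import Counter

# Toy stand-in for the preprocessed corpus: in the video the proposals
# are already lowercased, lemmatized, and stripped of stop words and numbers.
documents = [
    "tax income salary pension tax",
    "traffic driving road traffic",
    "tax pension income",
]

# The Word Cloud widget weighs each word by its total number of
# occurrences across the whole corpus; Counter does the same.
counts = Counter(word for doc in documents for word in doc.split())

# Keep only the frequent words (the video uses a threshold of 200
# occurrences; 2 suffices for this toy corpus).
selected = [word for word, count in counts.items() if count > 2]

print(counts["tax"])  # total occurrences of "tax" in the toy corpus
print(selected)
```

The list of selected words is what gets passed downstream, just as the Word Cloud's Selected Words output feeds the Corpus widget in the workflow.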
Once we have words represented with vectors of numbers, we can measure distances between them. We will use cosine distance. From the distance matrix, we can now construct a hierarchical clustering. Use Ward linkage and annotate the dendrogram with words. See the small cluster with the words income, tax, salary, and pension, or the cluster that deals with traffic and driving.

We can also construct semantic word maps with the t-SNE dimensionality reduction widget. We will use an exaggeration of 2 and initialize t-SNE with 20 principal components. We will instruct t-SNE to label the data points with words, but only for selected points or points from the Subset input channel. Here is a t-SNE cluster related to income and taxes.

Orange also allows us to combine visualization widgets like Hierarchical Clustering and t-SNE. For example, we will feed the selected data from Hierarchical Clustering into the Data Subset channel of t-SNE. Here is the group of words related to traffic, and a cluster from Hierarchical Clustering with words related to taxes. Notice how any change in Hierarchical Clustering propagates through the workflow and changes the t-SNE display. To demonstrate this, here are also the words related to the measurement of time.

Orange is great for mixing different types of data visualizations and constructing workflows for visual analytics. You are welcome to experiment with this type of workflow on your own, and perhaps use other widgets, like multi-dimensional scaling, for documents and word embedding.
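The distance-and-clustering steps above can be sketched outside Orange with SciPy. This is a small illustration under assumed inputs: the word vectors below are hand-made toy embeddings standing in for the Document Embedding output, and the cluster count of 2 is chosen just to split the toy data.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

# Toy stand-in for the embeddings: each row is one word's vector.
# In the workflow these vectors come from the Document Embedding widget.
words = ["income", "tax", "salary", "pension", "traffic", "driving"]
vectors = np.array([
    [1.0, 0.1],   # income
    [0.9, 0.2],   # tax
    [1.0, 0.0],   # salary
    [0.8, 0.1],   # pension
    [0.1, 1.0],   # traffic
    [0.0, 0.9],   # driving
])

# Cosine distances between word vectors, as in the Distances widget.
distances = pdist(vectors, metric="cosine")

# Hierarchical clustering with Ward linkage on the distance matrix.
Z = linkage(distances, method="ward")

# Cut the dendrogram into two groups (toy choice for this tiny example).
labels = fcluster(Z, t=2, criterion="maxclust")
print(dict(zip(words, labels)))
```

The money-related words end up in one group and the traffic-related words in the other, mirroring the clusters seen in the dendrogram. A 2-D map of the same vectors could similarly be produced with scikit-learn's TSNE for the t-SNE step.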