By now, we know how to visualize our data and browse through data subsets. But what else can we do with our data instances? Perhaps put them into logical groups. We will use the good old iris data set, as in our previous videos. We've already observed that the flowers differ. But how do we know whether they form just one group, a single species, or belong to different groups of irises? I will show you how to discover groups, and possibly subgroups and sub-subgroups, using a method called hierarchical clustering.

So, how does this clustering work? Naturally, we would like to group the flowers so that those with similar petal and sepal measurements belong to the same group. For two flowers, we can compute the difference for each measurement, square it to make sure it is positive, and then sum the squared differences across all four measurements. At the end, we take the square root of the sum to return to the original measurement units. I have just reinvented the Euclidean distance. Note that the smaller the distance, the greater the similarity.

Flower distances can now be used to construct a hierarchical clustering. Connect the Hierarchical Clustering widget to the Distances widget. Hierarchical Clustering displays a dendrogram, a tree that reveals the structure of the discovered clusters and the distances between them. Let's make this dendrogram more telling and annotate its branches with the species of iris. It looks like the clustering indeed made sense of the data, as flowers of the same species are clustered together. However, there is an area with some mix-up. Let's select it. The selected flowers, I mean data instances, will appear on the output of the Hierarchical Clustering widget. To check these instances, we will send them to a Data Table. And voilà, here they are. Looking at this data table alone, though, I am not much wiser.
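The distance computation described above can be sketched in a few lines of Python. This is not Orange's internal code, just a minimal illustration of the recipe: per-measurement differences, squared, summed, and rooted. The flower values are made-up examples in the usual order of sepal length, sepal width, petal length, petal width.

```python
import math

def euclidean(a, b):
    # difference per measurement, squared to make it positive,
    # summed across all measurements, then rooted to restore units
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# two hypothetical flowers: sepal length, sepal width, petal length, petal width
flower_a = [5.1, 3.5, 1.4, 0.2]
flower_b = [7.0, 3.2, 4.7, 1.4]
print(euclidean(flower_a, flower_b))
```

A smaller result means the two flowers are more alike, which is exactly what the clustering will rely on.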
Wouldn't it be cool to see these selected flowers in some visualization, say, in the context of every other flower in our data set? We've done this before and we'll do it again. We will use the Scatter Plot widget to visualize all 150 flowers. No, not this visualization; I want the interesting one. Here it is. Now connect Hierarchical Clustering to the Scatter Plot. The mix-up lies, naturally, in the border region between Iris virginica and Iris versicolor. We can now browse through the clusters in Hierarchical Clustering and observe their mapping in the scatter plot, just as we did in the previous video. It helps to keep both windows open at the same time to observe the results. This is how Orange becomes a tool for cluster exploration.

There is so much more you can do with clustering in combination with other Orange widgets, but for today, this is it. We've learned that hierarchical clustering requires distances on its input, that it displays a dendrogram in its visualization, and that we can select data instances from the clusters of the dendrogram to send them to other Orange widgets.
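For readers who prefer scripting to widgets, the whole workflow can be approximated outside of Orange. The sketch below, assuming scipy and scikit-learn are available, builds a hierarchical clustering of the iris data from Euclidean distances, cuts the dendrogram into three clusters, and cross-tabulates the clusters against the species, which makes the versicolor/virginica mix-up visible as mixed counts. The choice of average linkage is an illustrative assumption, not what the video necessarily uses.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data                               # 150 flowers, 4 measurements each
species = iris.target_names[iris.target]    # species name per flower

# hierarchical clustering on Euclidean distances, playing the role of
# the Distances + Hierarchical Clustering widgets
Z = linkage(X, method="average", metric="euclidean")

# cut the dendrogram into three clusters, one per expected species
labels = fcluster(Z, t=3, criterion="maxclust")

# cross-tabulate clusters against species to spot any mix-up
for c in sorted(set(labels)):
    members = species[labels == c]
    counts = {s: int(np.sum(members == s)) for s in np.unique(members)}
    print(f"cluster {c}: {counts}")
```

Selecting a branch in Orange's dendrogram corresponds to picking the rows where `labels` equals a given cluster, which you could then plot or inspect just like the Data Table and Scatter Plot widgets do.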