So far, we've been using k-means on some hand-drawn two-dimensional data. Now, I'd like to move on to some slightly more complex datasets that we've actually used before in some of our previous videos. My goal for today is to try and explain the resulting clusters.

First off, the zoo dataset. We can load it in using the Datasets widget, then feed it directly to the k-Means widget. Just as a reminder, the zoo dataset contains 101 animals described by 16 categorical features. This also means that k-means has to convert them to numeric features in order to compute the distances. Based on the silhouette scoring we talked about last time, k-means suggests there are five clusters in our data.

So, let's take some time to investigate what these clusters represent. To do this, I'll use the Box Plot widget. Let's group the data according to our clusters and sort the features. This way, we get the features that are most correlated with the clustering at the very top of our list. As it turns out, the feature most related to our clusters is the type of animal. You might notice here that type is a target feature, and Orange doesn't use target features to compute distances. So, the fact that the clusters are correlated with types at all is pretty cool in and of itself.

Now, one cluster here includes most of the mammals. Another one includes all the insects. And we also see all the birds are in their own cluster. There are a couple of mammals clustered together with the fish, but if we select them to see which ones they are exactly, we find the dolphin and the porpoise, which makes a lot of sense.

Going back to our box plot, we see that the clusters also correlate well with the number of legs. Most of the animals in this cluster, for example, have no legs, and most of the animals in this cluster have two. Remember, these two are the fish and bird clusters, respectively, so again, no real surprise there. Next one up is backbone. Interestingly, all the animals in any given cluster either have a backbone or don't.

I could keep doing this all day, but instead, it might be more productive to consider an entirely new dataset. How about the socioeconomic data from the Human Development Index? This is another one we've worked with before, so let's just load it in to replace our existing data. This time, k-means indicates that there are only two clusters, and I can't help but wonder: how do these clusters manifest on a map? For that, I'll need the Geocoding and Choropleth Map widgets from the Geo add-on. This way, I can see that the dividing line is drawn around Sub-Saharan Africa, the Indian subcontinent, and a few Central American countries. It is interesting, but also not really surprising, that these clusters correspond to well-defined geographical regions.

Opening the Box Plot widget to look at the specifics of these clusters in the same way as we did before, I find that one of the main indicators of which cluster a country belongs to is its total fertility rate.

So, once again, in this video, I used box plots and geo maps to interpret clusters. To conclude this segment of videos, I'd like to point out that when we look at all-encompassing, wide-spanning datasets, especially socioeconomic ones, it's only natural that we find divisions everywhere we look, no matter the method we choose, be it k-means, hierarchical clustering, or any other means of unsupervised group discovery. (For anyone who would rather follow along in code, a few rough Python sketches of today's workflow appear below.) So, it's now time to move on to our next chapter: classification.
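The k-Means widget handles all of this internally, so none of the following code appears in the video. Still, here is a minimal sketch of the first step using scikit-learn, assuming a hypothetical CSV export of the zoo data with columns named name and type: one-hot encode the categorical features, then pick the number of clusters by the best average silhouette, which is the same idea behind the widget's suggestion of five clusters.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Hypothetical CSV export; the video loads the data via the Datasets widget.
zoo = pd.read_csv("zoo.csv")

# 'type' is the target and is excluded from distance computations,
# mirroring how Orange ignores target features when clustering.
# One-hot encoding turns categorical columns into numeric ones;
# columns that are already numeric (such as legs) pass through unchanged.
X = pd.get_dummies(zoo.drop(columns=["name", "type"]))

best_k, best_score = None, -1.0
for k in range(2, 9):  # candidate cluster counts
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    score = silhouette_score(X, labels)  # mean silhouette over all animals
    if score > best_score:
        best_k, best_score = k, score

print(best_k, best_score)  # the widget reports five clusters on this data
```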
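The Box Plot widget's feature sorting uses its own relevance scoring; as a simple stand-in, one can rank each feature by a chi-squared test between its values and the cluster labels. This sketch continues from the one above and reuses zoo and X:

```python
import pandas as pd
from scipy.stats import chi2_contingency
from sklearn.cluster import KMeans

# Reusing `zoo` and `X` from the previous sketch.
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)

p_values = {}
for col in zoo.drop(columns=["name"]).columns:
    table = pd.crosstab(zoo[col], labels)  # feature value vs. cluster counts
    chi2, p, dof, expected = chi2_contingency(table)
    p_values[col] = p  # smaller p-value = stronger association

# Features most associated with the clustering come first; on this data,
# `type` should land at or near the top, matching what we saw in the video.
for col, p in sorted(p_values.items(), key=lambda kv: kv[1])[:5]:
    print(f"{col}: p = {p:.2g}")
```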
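Finally, a sketch of the second experiment. The file name hdi.csv and the column names country and fertility_rate are assumptions for illustration; the idea is to standardize the indicators so no single one dominates the distances, cluster the countries into two groups, and then compare the fertility rate across clusters, much like the box plot in the video does:

```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

hdi = pd.read_csv("hdi.csv")  # hypothetical file name

# Assumes all remaining columns are numeric socioeconomic indicators;
# missing values get a crude mean imputation for this sketch.
features = hdi.drop(columns=["country"])
features = features.fillna(features.mean())
Z = StandardScaler().fit_transform(features)

hdi["cluster"] = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(Z)

# The comparison from the video: total fertility rate split by cluster.
hdi.boxplot(column="fertility_rate", by="cluster")
plt.show()
```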