In data analytics, one often needs to find interesting groups of data instances, be it segmenting customers by their shopping habits, finding similar documents, or grouping tweets by their content. When the data abounds, we can find such clusters with a method called K-means. First, let us paint some data to see if clustering really works. We will make one, two, three groups of data points.

Now let's connect Paint Data with K-means. This widget finds clusters so that data points in the same cluster are close to each other. That is, the distance between them is small. Here we told K-means to find three clusters. Now we can observe the clustering in a Scatter Plot. Wow, this worked fine! K-means really discovered the clusters where we expected them. We can interactively change the number of clusters and observe the changes in the plot. We can ask for two clusters, four, five, and so on. For our data, the choice of three clusters works best.

K-means requires us to specify the number of clusters. But in Orange, we can also ask it to find the right number of clusters on its own. We can tell it to vary the number of clusters, score each clustering, and return the one with the best score. But how do we score the clusterings? With the silhouette. Silhouette scoring reports how well each data point, on average, fits into its designated cluster. The higher the score, the fewer data points there are whose cluster membership is unclear. Let's instruct K-means to use the silhouette score and guess the best number of clusters. It's three, just as we expected. Let us add a few more clusters to our data. Four, five, six. Now this looks simply wonderful. Every time, K-means with silhouette scoring correctly guessed the number of clusters.

Is it even possible for K-means to make a mistake? Let's see. We will draw three clusters in the shape of a smiley face. Let's see what the silhouette suggests. Four clusters? That can't be right. Obviously, there should be three. This is one of the drawbacks of K-means: it works well on compact, spherical clusters, and fails on clusters of other shapes.

Now let's use K-means on a real dataset. Boston housing prices, for example. We will ask for two clusters and then observe the differences between the clusters in a Box Plot. Looks like there are major differences between the clusters in this data with respect to crime rate, pollution, and the age of the houses. We can even check whether the clusters make sense in an MDS projection of the data. Well, they do. This dataset indeed has two distinct clusters.

Today we've learned what K-means does and how to use it on a real dataset. In the next two videos, we will explain how K-means and silhouette scoring work, and how the silhouette can find inliers and outliers.
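For readers who want to reproduce the first part of the workflow in code rather than with Orange's widgets, here is a minimal sketch using scikit-learn (an assumption on my part; the video itself uses only the Paint Data, k-Means, and Scatter Plot widgets). Three Gaussian blobs stand in for the painted data, and the silhouette score is used, as in the video, to pick the best number of clusters.

```python
# Minimal sketch (not from the video, which uses Orange's k-Means widget):
# choose the number of clusters by the silhouette score with scikit-learn.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Three blobs stand in for the three painted groups of points.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

best_k, best_score = None, -1.0
for k in range(2, 9):  # try 2..8 clusters
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    score = silhouette_score(X, labels)  # mean silhouette over all points
    if score > best_score:
        best_k, best_score = k, score

print(f"best k = {best_k}, silhouette = {best_score:.3f}")
```

On data like this, the loop should settle on three clusters, mirroring what the k-Means widget reports in the video.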
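The smiley-face drawback can also be illustrated in code. The sketch below does not reproduce the smiley face from the video; it uses scikit-learn's two interleaved half-moons (my substitution) to show the same point, namely that K-means assumes compact, roughly spherical clusters.

```python
# Rough illustration (assumed, not from the video) of the same drawback:
# k-means splits two interleaved half-moons incorrectly because it expects
# compact, spherical clusters.
from sklearn.datasets import make_moons
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

X, y_true = make_moons(n_samples=300, noise=0.05, random_state=0)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Perfect recovery of the two moons would score 1.0; k-means falls well short.
print("agreement with true clusters:", round(adjusted_rand_score(y_true, labels), 3))
```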