In the previous videos, we talked about k-means and how to find a good number of clusters in our data. We mentioned the silhouette, a score of cluster quality that helps us find the k to our means. Understanding how the silhouette score works is quite simple.

Here we have three clusters: green, blue, and orange. We would like to know how well this data point belongs to the blue cluster. First, we measure the average distance between our data point and the other points in its own blue cluster. Let's call this distance A. Second, we measure the average distance between our data point and the points in the closest other cluster, here the green one. Let's call this distance B. If our data point is well grounded in its cluster, B needs to be large and A small, so that the difference between them, B minus A, is as large as possible. To normalize this score, we divide it by the maximum of A and B, which keeps it between minus one and one. The silhouette score for our data point will be quite high, since it lies close to the center of its cluster. The silhouette score for a point that lies in between two clusters will be close to zero.

Let me now paint some data. I'll pass it through k-means clustering and visualize the clusters in the scatter plot. I will use the silhouette widget to find points that are close to the center of the red cluster. To observe where the selected data instances lie, I will connect the silhouette widget to the scatter plot. Let me select a few top-scored data instances in the red cluster. Wow! They are indeed in the center of the cluster. And those with the lowest scores? They are the borderline data points.

I can use silhouette plots on any data that includes a discrete class or discrete attributes, say the Iris dataset. The biggest outliers are in the overlapping region between Iris versicolor and Iris virginica. And the inliers? Most of them lie in the well-separated Iris setosa class.

Today we've learned about the mechanics of the silhouette score. In the previous videos, we used it to score the clusters, but it also makes a great tool for finding the inliers and the outliers.
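
For those who would like to try the calculation by hand, here is a minimal Python sketch of the score for a single data point, following the A and B steps described above. It assumes Euclidean distance and that the point itself is left out of its own cluster when computing A. The function names and the toy coordinates are made up for illustration; they are not taken from the video or from any particular library.

```python
from math import dist


def mean_distance(point, others):
    """Average Euclidean distance from point to every point in others."""
    return sum(dist(point, other) for other in others) / len(others)


def silhouette(point, own_cluster, other_clusters):
    """Silhouette score of one point.

    own_cluster holds the other members of the point's cluster
    (the point itself is assumed to be excluded).
    """
    # A: average distance to the other members of the point's own cluster.
    a = mean_distance(point, own_cluster)
    # B: average distance to the members of the closest other cluster.
    b = min(mean_distance(point, cluster) for cluster in other_clusters)
    # Normalize the difference by the larger of the two distances.
    return (b - a) / max(a, b)


# Hypothetical 2D toy data: two small square-shaped clusters.
blue = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
green = [(6.0, 0.0), (6.0, 1.0), (5.0, 0.0), (5.0, 1.0)]

# A point in the middle of the blue cluster, scored as a blue point.
central = silhouette((0.5, 0.5), blue, [green])
# A point halfway between the two clusters, also scored as a blue point.
borderline = silhouette((3.0, 0.5), blue, [green])

print(f"central point:    {central:.2f}")
print(f"borderline point: {borderline:.2f}")
```

The central point should get a score close to one, and the point halfway between the clusters a score of about zero, which matches what we saw in the scatter plot: high scores mark the inliers, low scores mark the borderline points.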