Unsupervised learning is a data mining technique that generates clusters of datapoints that share some similarities. In the real world applications there is no "right" label or classification for each observation. If there was, a supervised learning (classification algorithm) would be used. However, for the purposes of getting a feel for unsupervised learning it is useful to start with a dataset where you do have a sense for how you'd like it to be classified. Thus for this exercise we'll create our dataset from scratch.
In this part 1 of 2, we'll set up a fictitious dataset consisting of 3 groups, with 100 observations in each group (300 total). The dataset will have 3 attributes. One is generated using random deviates from the Gaussian distribution, one from the binomial distribution, and one from the Chi Square distributions.
In part 2 we'll apply kmeans clustering to this dataset.
Sd is the value of the Standard Deviation and Not the standard distribution... But anyway: nice video!!!
ricckli 1 year ago