During our last couple of videos, we've introduced three dimensionality reduction techniques: principal component analysis (PCA), multidimensional scaling (MDS), and t-distributed stochastic neighbor embedding (t-SNE). Our main objective was data visualization, which means reducing the number of dimensions to two. And on smaller datasets, we've found that these techniques tend to yield similar results. But this should come as a bit of a surprise, because the three approaches really are very different in nature.

PCA is a projection: given some data (let me just quickly create some here), PCA projects the points onto the axes defined by the principal components. Here I can show what could be the axis for the first principal component and what some of the projections could look like. This way, I can see that each point's value for the first principal component is the distance from the projected point to the center. These principal component axes are computed directly from the data, using the eigenvectors of the covariance matrix.

MDS, on the other hand, considers some multidimensional data. Imagine the data we've just drawn, but scattered in three dimensions, not just the two you can see on your screen. Now, I'll call the distance between each pair of points i and j capital D_ij. MDS aims to embed our data into a two-dimensional space in such a way that the resulting distances are as close to the original distances as possible. If I call the distances in the new two-dimensional space (also called the embedded space) lowercase d_ij, then what MDS tries to do is minimize the sum over all pairs of (D_ij - d_ij)². MDS finds the embedding that minimizes this criterion function iteratively: it starts with a random placement of points and then, in each iteration, moves the points to slightly lower the value of the criterion function. We can take a look at how this happens on the zoo dataset.
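The iterative scheme just described can be sketched in a few lines of Python. This is a minimal illustration on hypothetical toy data, using plain gradient descent on the criterion; production implementations (including the widget shown in the video) use smarter update schemes, but the idea is the same: each step nudges the points to slightly lower the stress.

```python
import numpy as np

# A small synthetic stand-in for "data scattered in three dimensions".
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))

# D[i, j]: pairwise distances in the original space.
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

def stress(Y):
    """The MDS criterion: the sum over pairs of (D_ij - d_ij)^2."""
    d = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
    return ((D - d) ** 2).sum() / 2.0  # halve: each pair appears twice

# Start from a random placement in two dimensions...
Y = rng.normal(size=(30, 2))
before = stress(Y)

# ...and repeatedly move the points to slightly lower the criterion.
lr = 1e-3
for _ in range(300):
    diff = Y[:, None, :] - Y[None, :, :]
    d = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(d, 1.0)  # avoid dividing by zero when i == j
    grad = 2.0 * ((d - D) / d)[:, :, None] * diff
    Y -= lr * grad.sum(axis=1)

after = stress(Y)  # noticeably smaller than `before`
```

Running this, the stress drops steadily from its random-start value, which is exactly what we'll watch happen live in the MDS widget.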
First, let me load the zoo dataset, add the MDS widget, randomize the positions of the data points, set refresh to every iteration, and press Start. Great, let's do it again just for fun: randomize, set refresh to every iteration, and start. Okay, something to be aware of here is that the result of the optimization procedure MDS uses may depend on the random initialization. To make things more deterministic, MDS usually starts from the placement of points obtained by PCA instead of a completely random placement.

Now onto our last algorithm, t-SNE. This is also an embedding that uses iterative optimization to find the best placement of its points. It's similar in execution to MDS; they just use different criterion functions. t-SNE's criterion function is a bit more complex and prioritizes preserving the distances between each point and its neighbors. The differences between these three methods can be huge. This time, I'll use another biological dataset, called bone marrow mononuclear cells. Each of its thousand rows contains data on a single bone marrow cell. I won't go into much detail about this dataset; it's enough to know that it contains a thousand features that record the activity of the genes in each cell. Now, there are multiple types of bone marrow cells, and we might expect to identify them by clustering the data. So I'll first pass this data to PCA and look at the results in a scatter plot. The coordinates PC1 and PC2 are at the very end. There. Now, we want to compare this to what we get from t-SNE. This time, it really is completely different: the clustering structure is much more pronounced in the t-SNE visualization. Now, I can select a cluster in the t-SNE visualization and see where those points are in the PCA visualization. To do this, I just pass the selection from t-SNE to the Scatter Plot widget. With both visualizations side by side, I can select any cluster in t-SNE to see how it translates to PCA. Like this, or maybe this, or this.
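Outside of Orange, the same comparison can be sketched in Python. The block below is a minimal illustration on synthetic stand-in data (a hypothetical 100 "cells" by 50 "genes" with three planted clusters, not the real bone marrow dataset): it computes PC1 and PC2 by hand from the eigenvectors of the covariance matrix, and a t-SNE embedding of the same points using scikit-learn.

```python
import numpy as np
from sklearn.manifold import TSNE

# Hypothetical stand-in for the bone marrow data: 100 "cells" x 50 "genes",
# drawn around three synthetic cluster centers.
rng = np.random.default_rng(0)
centers = rng.normal(scale=5.0, size=(3, 50))
X = np.vstack([c + rng.normal(size=(34, 50)) for c in centers])[:100]

# PCA "by hand": the eigenvectors of the covariance matrix give the
# principal-component axes; projecting onto the top two gives PC1 and PC2.
Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]
pca_2d = Xc @ eigvecs[:, order[:2]]

# t-SNE embedding of the same data, initialized from PCA.
tsne_2d = TSNE(n_components=2, init="pca", random_state=0).fit_transform(X)
```

Plotting `pca_2d` next to `tsne_2d` is the code equivalent of the two scatter plots side by side; t-SNE typically pulls the clusters apart much more visibly.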
Notice that from PCA, I would only be able to find some of the clusters visible in t-SNE, not all of them. Now, I can try the same thing with MDS, but I find that it resembles PCA more than t-SNE, with much less structure immediately visible in the data. So, in summary: when I want to find clusters in my data, I use t-SNE. When I care about all the pairwise distances, I use MDS. And when I need a robust dimensionality reduction method that uses a mathematical projection and retains as much variance as possible, I use PCA.