In our previous videos, we used principal component analysis primarily to reduce and display our data in two dimensions. This time, we'll take a closer look at the relation between the data's features and their principal components.

Let's start by painting some two-dimensional data, like this. Next, just to use standard notation, I'll rename the two features to x1 and x2 using the Edit Domain widget. Don't forget to press Apply for the changes to take effect. Now, visualizing our data in a Scatter Plot, notice that the first principal component should probably lie along this line, and a line is just the set of points that satisfy a linear equation. In our case, with two features x1 and x2, the points on the line need to satisfy the equation theta0 + theta1·x1 + theta2·x2 = 0. The thetas are the parameters of the equation that defines the line. A vector that defines the projection line should have theta1 and theta2 of opposite signs, pointing either up and to the left or down and to the right. Theta1 and theta2 are also called the components of the principal axis, and they report the weight, or influence, each feature has on the principal component's values. Since I drew my data diagonally, I expect both features, x1 and x2, to have more or less equal weights. As a side note, the thetas of the line that defines the second component should have the same sign.

Okay, now it's time to check this in Orange and stop doing the analysis by hand. Here's my PCA widget; just for this demonstration, I'll switch off normalization. The scree diagram tells me that x1 and x2 are well correlated and that the first principal component covers most of the variance. I'd like to double-check that my predictions are correct, so I'll take a quick look at the thetas, that is, the weights of the features for each component. For this, I'll use a Data Table widget and rewire the link to pass the PCA components instead of the transformed data. Each row contains information about one principal component: the variance it explains and the feature weights. Just as I expected, for the first component, x1 and x2 are approximately equally important and have opposite signs. And the signs of the weights for the second component are, unsurprisingly, the same.

Now I'll change the data slightly, so that it spans along a single axis. Again, the first component explains almost all of the variance. But looking at the weights, I see that the first feature, x1, has by far the highest weight for PC1, so it is the one that defines the first component. The remaining variance spans along the x2 axis. This may have been a little abstract, but we learned that the weights of the principal axes tell us which features are relevant to each component, and to what degree.

I'll now use the zoo dataset from my previous video. The data I loaded in the Datasets widget contains 100 animals, described by features like having hair or feathers, or whether they lay eggs or produce milk. One thing to note, though, is that all of the features are categorical, meaning they are not numbers. For PCA, we need numbers, so the PCA widget automatically transforms each categorical feature into a numeric one. Here, I'll do this explicitly, using the Continuize widget. Just leave the settings at their defaults and see what we get on the output: all of the categorical features have now turned into numeric ones.
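Just to make the continuization step concrete, here is a minimal sketch in plain Python with pandas, rather than the Orange widget itself; the two rows below are a made-up excerpt standing in for the zoo data.

```python
import pandas as pd

# A tiny made-up excerpt of zoo-like data: every column is categorical.
animals = pd.DataFrame(
    {"hair": ["yes", "no"], "milk": ["yes", "no"], "eggs": ["no", "yes"]},
    index=["antelope", "bass"],
)

# One-hot encoding plays the role of the Continuize widget:
# each categorical feature becomes one or more 0/1 indicator columns.
numeric = pd.get_dummies(animals).astype(int)
print(numeric)
```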
The antelope, for example, has a 1 in the hair=yes column, meaning it does have hair, while bass, the fish, has no hair, hence the 0 in this column. I send this continuized version of the data to PCA and find that the first two dimensions explain over 50% of the variance. Again, I can observe the individual feature weights in a Data Table, changing the link to pass the weights of the features. There, I see a bunch of numbers in rows with no obvious pattern. So I'll first transpose the data with the weights using the Transpose widget and rewire the connections. I'd also like to order the features according to the magnitude of their weights. Since I don't care whether a weight is positive or negative, I'll add another widget, the Feature Constructor, and define a new feature, weight for PC1, as the absolute value of PC1, and similarly weight for PC2 as the absolute value of PC2.

Now, let's take a look at the coefficients. I can rank the features by the weights of the first principal component. As it turns out, milk is the feature that most defines this component. It's followed by eggs, but notice that the PC1 value here is negative, so what actually matters is the absence of laying eggs. The next most important feature is hair. Accordingly, some animals give milk, don't lay eggs, and have hair: this component could thus distinguish mammals from everything else. The second component has large weights for aquatic animals, animals that breathe, and airborne animals, so it should nicely distinguish between fish and birds.

Okay, that's it for principal component analysis. We learned that this technique is useful when we want to visualize data in two dimensions. I also showed that we can gain additional insight by observing which features contribute to a given principal component, and in this way rank the features. PCA is also generally useful for dimensionality reduction and noise removal, but I won't push in that direction just yet. Next time, I'll instead present two other techniques for obtaining two-dimensional data visualizations.
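If you'd like to reproduce the weight-ranking step outside the Orange canvas, here is a rough sketch with pandas and scikit-learn; the animal table is again a small made-up stand-in for the zoo data, so the exact numbers will differ from the ones in the video.

```python
import pandas as pd
from sklearn.decomposition import PCA

# A made-up zoo-like excerpt; the real dataset has many more animals and features.
animals = pd.DataFrame(
    {"hair":    ["yes", "no", "yes", "no", "no"],
     "milk":    ["yes", "no", "yes", "no", "no"],
     "eggs":    ["no", "yes", "no", "yes", "yes"],
     "aquatic": ["no", "yes", "no", "no", "yes"]},
    index=["antelope", "bass", "bear", "chicken", "frog"],
)

# Continuize: turn the categorical columns into 0/1 indicators.
X = pd.get_dummies(animals).astype(int)

# Fit PCA; components_ holds one row of feature weights (the thetas) per component.
pca = PCA(n_components=2).fit(X)
weights = pd.DataFrame(pca.components_, columns=X.columns, index=["PC1", "PC2"]).T

# Rank features by the absolute value of their PC1 weight, mirroring the
# Transpose + Feature Constructor steps in the video.
print(weights.assign(abs_PC1=weights["PC1"].abs())
             .sort_values("abs_PC1", ascending=False))
```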