Data can be more or less complex. Imagine a dataset with many features, say 100. How do you know which features matter most? And how could you possibly project the data onto a 2D plane? One popular technique for answering these questions is principal component analysis, or PCA for short. PCA transforms the data into a new attribute space where the features are uncorrelated and ranked by the amount of variance they explain.

Let me show you this by painting some data. Although this data is in 2D, we could identify the position of each point just by knowing its coordinate on a new, tilted axis. This would be our first principal component. Its direction is defined by the vector PC1. If our data align along a direction in a many-dimensional space, maybe only a couple of principal components are enough to explain it.

Let's see this in action. This time I'll be using the wine dataset with 13 features. A 13-dimensional space is difficult to grasp, so we'll use PCA to transform the data into fewer dimensions. How do you know how many principal components to keep? A common choice is to select the first few principal components that explain, say, 80% of the variability. Orange shows the proportion of explained variance in a scree diagram. In my dataset, five principal components already explain slightly more than 80% of the variability. I can inspect the transformed dataset in the Data Table.

Now let's see what our transformed data looks like in a scatterplot. I will plot the data using just the first two components. The three different wines are really nicely separated. It turns out that chemical compounds called flavonoids define the first component the most, followed by phenols.

Today we've learned how to transform our data into a set of linearly uncorrelated features with principal component analysis. Next time, we'll show you another way to rank features, with the Rank widget.
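The same workflow can be sketched in code. This is a minimal example using scikit-learn rather than Orange's point-and-click widgets (an assumption on my part; the video itself uses Orange). It loads the same 13-feature wine dataset, standardizes it, counts how many principal components are needed to reach 80% of the explained variance, and then projects the data onto the first two components for plotting:

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# The wine dataset: 178 samples, 13 chemical features, 3 wine classes
X, y = load_wine(return_X_y=True)

# PCA is sensitive to feature scales, so standardize first
X_std = StandardScaler().fit_transform(X)

# Fit PCA with all components and look at cumulative explained variance
pca = PCA().fit(X_std)
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Smallest number of components whose cumulative variance reaches 80%
n_components = int(np.argmax(cumulative >= 0.80)) + 1
print(f"{n_components} components explain "
      f"{cumulative[n_components - 1]:.1%} of the variance")

# Project onto the first two principal components for a 2D scatterplot
X_2d = PCA(n_components=2).fit_transform(X_std)
print(X_2d.shape)  # (178, 2)
```

The 2D array `X_2d` is what a scatterplot of the first two components would show; coloring the points by `y` should reveal the three wine classes separating, as in the video.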