When you have a novel data set, either because you created your own questions and gathered your own data, or because you're combining things that may not have been put together before, one of the big questions is: how can you combine variables, and what is the emergent structure of the data? One of the most common ways of assessing this is with something called a principal component analysis. There's also the very closely related exploratory factor analysis, which we'll cover in another video. Principal component analysis works on the principle of covariance, which is closely related to correlation.

Right now we've got a data set on the Big Five personality factors. It's included in the data you downloaded from datalab.cc, and it addresses extraversion, openness, conscientiousness, and so on. The question is: how well do they go together, and what structure shows up empirically in the data? We know how they're supposed to go together, but what actually appears in the data we have?

Let's come up here to Factor and click on Principal Component Analysis. First we need to feed it the variables we want; it needs ordinal or scaled (quantitative, continuous) variables. So let's come here to E1 and go way down to O10. By the way, you'll notice these all have the three little circles that indicate a nominal or categorical variable, but jamovi is smart enough to know they can be treated as scaled variables; they're from a one-to-five response scale. I'm going to click to put them over here, and it starts giving us results immediately.
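Under the hood, a principal component analysis is essentially an eigendecomposition of the covariance matrix, or, when items share a common response scale, the correlation matrix. A minimal sketch in Python with NumPy, using made-up survey responses in place of the real Big Five items:

```python
import numpy as np

# Hypothetical stand-in data: 200 respondents answering 10 items
# on a 1-to-5 scale (invented for illustration, not the real data set).
rng = np.random.default_rng(0)
data = rng.integers(1, 6, size=(200, 10)).astype(float)

# PCA starts from the covariance/correlation structure of the items.
corr = np.corrcoef(data, rowvar=False)            # 10 x 10 correlation matrix
eigenvalues, eigenvectors = np.linalg.eigh(corr)

# eigh returns ascending order; reverse so component 1 explains the most.
eigenvalues = eigenvalues[::-1]
eigenvectors = eigenvectors[:, ::-1]

# With a correlation matrix, each standardized item contributes one "unit"
# of variance, so the eigenvalues partition 10 total units among components.
print(eigenvalues.sum())
```

This is why, later on, a data set with 50 variables has "50 units of variance to account for": the eigenvalues of the correlation matrix always sum to the number of variables.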
It might take a minute to get all the way through. What we have here is similar to correlations: there's an analogy in that we have numbers from negative one to positive one, where the middle value of zero indicates no linear relationship. The analysis breaks the variables down into several components, or factors: ways of grouping the variables based on the data. It decided that seven components seemed like the right answer. We know there are supposed to be five, but this lets us know what shows up according to the settings we have in the analysis.

What's nice is that right now it's organizing the ten extraversion items together. The fact that some of these loadings are negative and some are positive is irrelevant; it's the absolute value of the coefficient that matters. You can see that all ten neuroticism items are together, and the same is true for agreeableness and conscientiousness. Things start to fall apart a little with openness. I can tell you that, based on the research, openness is generally the least coherent or cohesive of the factors, so this is not too surprising.

We have a few options. Number one: if you know about principal component or exploratory factor analysis, you have the option of rotating the solution. That's a complex topic, but what it does is make the results a little easier to interpret. Varimax is a common default, but I actually prefer promax, which allows what are called oblique factors that can be correlated with each other; they don't have to be at right angles in the multi-dimensional space. So I'm going to change that to promax; you can see it says promax down here now.

There are a few other things I'm going to do. One is hiding the low loadings. You see how we have a lot of blank space here? There are actually numbers in all of these cells, but most of them are really small, and what jamovi does is hide the ones below a cutoff.
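Varimax is the classic orthogonal rotation; promax, roughly speaking, starts from a varimax solution and then relaxes the right-angle constraint so factors may correlate. As a sketch of what rotation does, here is the standard varimax algorithm in NumPy, applied to an invented loading matrix (not jamovi's actual output):

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-6):
    """Standard varimax rotation: find an orthogonal rotation of the
    loading matrix that maximizes the variance of squared loadings,
    pushing each item toward one strong loading and many near-zero ones."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # SVD step of the classic varimax iteration
        u, s, vt = np.linalg.svd(
            loadings.T @ (rotated ** 3
                          - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p)
        )
        rotation = u @ vt
        new_criterion = s.sum()
        if new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation

# Made-up two-component loading matrix for six items.
unrotated = np.array([[0.80, 0.10], [0.70, 0.20], [0.75, 0.15],
                      [0.20, 0.90], [0.10, 0.80], [0.15, 0.85]])
rotated = varimax(unrotated)
```

Because the rotation is orthogonal, each item's communality (the sum of its squared loadings across components) is unchanged; only how that variance is split between the components moves around, which is what makes the table easier to read.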
Let me show you: if I change the cutoff to, say, 0.5, it redoes the table and hides a lot of this stuff. You wouldn't usually use a cutoff that high, but hiding the low values makes the table easier to read. You can see, for instance, that it deleted this one, and on the other hand we're losing almost all the values off to the other side. I'm going to put it back to 0.3, where it was.

What I'm also going to do is look at the assumptions. One important assumption in principal component analysis is something called sphericity, and you can think of it as sort of an analog to normality. We just need to run through that and see how we do. We've got a Bartlett's test of sphericity, and it lets us know that our data differ significantly from the null hypothesis; here that's actually good news, because the null hypothesis is that the variables are completely uncorrelated, so a significant result means there is shared variance worth analyzing. You need to take that into consideration when interpreting the results of your principal component analysis.

Let's do a few other things. Let's ask for a component summary, which gives us statistics for those components. I'm also going to ask for component correlations, and for a chart called a scree plot. It will take a moment for all of those to load.

What the component statistics tell us is how much variance in the total data set is accounted for by each component. There are 50 variables, so there are basically 50 units of variance to account for. The first component accounts for the most, and they drop off as you go through. You can also see how the components are correlated with each other; if we used what's called an orthogonal rotation, it would force things into right angles.
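Bartlett's test of sphericity tests whether the correlation matrix differs from an identity matrix (all off-diagonal correlations zero). The standard test statistic is short enough to sketch directly; the simulated items below, which all share one underlying trait, are invented for illustration:

```python
import math
import numpy as np
from scipy import stats

def bartlett_sphericity(data):
    """Bartlett's test of sphericity. H0: the correlation matrix is an
    identity matrix, i.e. the variables are completely uncorrelated.
    A small p-value supports going ahead with PCA/factor analysis."""
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    # Standard statistic: -(n - 1 - (2p + 5)/6) * ln|R|,
    # chi-square distributed with p(p - 1)/2 degrees of freedom.
    chi_square = -(n - 1 - (2 * p + 5) / 6) * math.log(np.linalg.det(corr))
    dof = p * (p - 1) / 2
    p_value = stats.chi2.sf(chi_square, dof)
    return chi_square, dof, p_value

# Simulated data: five items driven by one common trait, plus noise.
rng = np.random.default_rng(1)
trait = rng.normal(size=(300, 1))
items = trait + 0.5 * rng.normal(size=(300, 5))
chi_square, dof, p_value = bartlett_sphericity(items)
print(dof)  # 10 pairwise correlations among 5 items
```

With strongly correlated items like these, the determinant of the correlation matrix is far below one, the chi-square statistic is large, and the test comes out highly significant.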
With an orthogonal rotation we wouldn't have any correlations at all. And there's our test of sphericity. Down here at the bottom are the eigenvalues, the values that correspond roughly to how much variance each of our components accounts for. It's called a scree plot, by the way, because scree is the rubble on the side of a cliff, and it's a little like what we have building up right here. We have 50 potential components because we have 50 variables, and the real question is how many you want to keep. There are several different rules.

One rule is to only keep components with an eigenvalue (again, that's the value describing how much variance each component accounts for) greater than one; you can do that. What we have right here is called a parallel analysis, shown by these yellow dots: it looks at the eigenvalues of random data with the same structure and asks whether our components do better than random. There's also what's called the elbow test, where you look at where the curve bends.

Because I happen to know there are supposed to be five factors in this data, I'm going to come here to the fixed-number option and set it to five. When I do that, we'll get a very different factor structure, one that should correspond very closely with what's intended in the data: ten variables on each of five different factors. We'll wait a second for that to load.
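Both the eigenvalue-greater-than-one rule and parallel analysis are easy to sketch. This is a simplified version of Horn's parallel analysis: keep each component whose eigenvalue beats the average eigenvalue of pure random noise of the same shape. The two-factor data set is made up for illustration:

```python
import numpy as np

def parallel_analysis(data, n_sims=100, seed=0):
    """Horn's parallel analysis (simplified): retain components whose
    eigenvalues exceed the mean eigenvalues of random-noise data
    with the same number of rows and columns."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    real = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]
    sims = np.empty((n_sims, p))
    for i in range(n_sims):
        noise = rng.normal(size=(n, p))
        sims[i] = np.sort(np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False)))[::-1]
    threshold = sims.mean(axis=0)  # plays the role of the comparison line on the scree plot
    below = real < threshold
    return int(np.argmax(below)) if below.any() else p

# Made-up data with two underlying factors, three items each.
rng = np.random.default_rng(2)
f1 = rng.normal(size=(300, 1))
f2 = rng.normal(size=(300, 1))
data = np.hstack([f1 + 0.3 * rng.normal(size=(300, 3)),
                  f2 + 0.3 * rng.normal(size=(300, 3))])
print(parallel_analysis(data))  # 2
```

Note that the random-data eigenvalues hover around one, which is why parallel analysis and the eigenvalue-greater-than-one rule often agree.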
We've lost the one reference line here, so this plot is exactly the same, just without the line we had before. But let's come up to the loadings table, and now you see that the component structure is much clearer, all the way down to openness, where the ten items that are supposed to go together are in fact together.

This is probably one of the most important things you need to do when you're analyzing survey data, or any set of variables that might be correlated with each other. A principal component analysis allows you to determine the underlying structure and see what you can combine, to simplify the data you have to deal with and, hopefully, get more reliable information at the same time.
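Fixing the number of components, as we just did with five, simply means keeping the first k columns of the eigendecomposition. A sketch of how the loadings for a fixed k fall out of the correlation matrix (the data here are random, just to show the shapes involved):

```python
import numpy as np

def pca_loadings(data, n_components):
    """Loadings for a fixed number of components: eigenvectors of the
    correlation matrix, each scaled by the square root of its eigenvalue.
    Each loading is the correlation between an item and a component."""
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order] * np.sqrt(eigvals[order])

rng = np.random.default_rng(3)
data = rng.normal(size=(150, 8))
loadings = pca_loadings(data, 3)
print(loadings.shape)  # (8, 3)
```

Because the loadings are item-component correlations, they stay between negative one and positive one, which is what makes the correlation analogy at the start of this video work.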