Let's take a look at an empirical example of exploratory factor analysis. To do that, we need some data. Our data come from a research paper by Mesquita and Lazzarini from 2008. This is an interesting paper because the authors present the full correlation matrix of all the indicators in the paper. That means we can replicate everything the authors do using the correlation matrix, and we get the same results for all the analyses. So this is a completely transparent paper that we can replicate ourselves. The article uses confirmatory factor analysis and structural regression models, but we can equally well do an exploratory factor analysis to see if we get the same results as the authors did. So this is the data set that we have. It is Table 1, the descriptives and correlations, except on the indicator level instead of the scale level. We will be using all questions that are measured on the one-to-seven scale, to eliminate any scaling issues from the data. So we have five scales, these five here, and the indicators are three indicators for horizontal governance, three for vertical governance, three for collective sourcing, two for export orientation, and three for investment. Whether these indicators measure what the authors claim they measure is a question that we will not address in this video. We will just look at whether, for example, the export orientation indicators can be argued to measure something together that is distinct from the other indicators. So we have 14 variables, and we want to assess whether they measure five distinct things. In an exploratory factor analysis, when we start the analysis, we have to decide how many factors to extract. One way to make that decision is to use a tool called a scree plot. The idea of a scree plot is that we extract components from the data and plot how much of the variance each component explains.
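The scree plot logic can be sketched with plain NumPy: the eigenvalues of a correlation matrix are the variances of its principal components, so plotting them in descending order gives the scree. The correlation matrix below is a small synthetic stand-in, not the paper's actual 14-by-14 matrix.

```python
import numpy as np

# Synthetic stand-in correlation matrix: six indicators in two blocks of
# three, correlated 0.64 within a block and 0 across blocks.
block = np.full((3, 3), 0.64)
np.fill_diagonal(block, 1.0)
R = np.block([[block, np.zeros((3, 3))],
              [np.zeros((3, 3)), block]])

# Eigenvalues of the correlation matrix = variances of the components.
eigenvalues = np.sort(np.linalg.eigvalsh(R))[::-1]
print(np.round(eigenvalues, 2))   # plotted in this order, this is the scree
print((eigenvalues > 1).sum())    # "eigenvalue > 1" rule of thumb -> 2
```

With real data the drop is rarely this clean; the point is that the scree is just the sorted eigenvalue sequence of the correlation matrix.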
There are some rules of thumb for choosing the number of factors. One is to choose based on a pivot point: a clear pivot point, where the curve starts to go flat, indicates the number of factors we should extract, which here gives five factors. Another rule of thumb is to take as many factors as have eigenvalues greater than one, which would give four factors. But here we know that this set of indicators is supposed to measure five distinct things, so we can use the best rule of thumb, which is our theory, and the theory says to take five factors, because we have five different things that we want to measure. So we apply factor analysis: we request five factors using these 14 indicators, and we get the printout from R. What does the printout tell us? There are three sections. The first section is the factor loadings. These statistics tell how strongly the indicators are related to each factor and how much uniqueness there is in the indicators that the factors don't explain. The second section is the variance explained, how much of the variation each factor explains. Finally, in the bottom section, we have various model quality indices. I don't typically interpret these model quality indices myself, because if I really want to know whether the model fits the data well, I will do that with confirmatory factor analysis based techniques, which have far more diagnostic options available. So in practice we interpret the factor loading pattern, how strong the individual loadings are, and how much variance the factors explain. If you want to do more diagnostics, it is better to move to the confirmatory factor analysis family of techniques. The factor loadings provide information about how strongly each indicator is related to each factor. The factor loadings are regressions of items on factors.
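To make the three printout sections concrete, here is a minimal factor extraction sketch in NumPy. Note the hedges: R's `factanal()` uses maximum likelihood, whereas this sketch uses the simpler principal-axis method, and the correlation matrix is synthetic (two blocks of three indicators with known loadings 0.8 and 0.7). It is meant only to show where the loadings, the uniquenesses, and the variance explained come from.

```python
import numpy as np

def principal_axis(R, n_factors, iterations=100):
    """Minimal principal-axis factoring on a correlation matrix."""
    # Start communalities from squared multiple correlations.
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(iterations):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)            # communalities on diagonal
        values, vectors = np.linalg.eigh(reduced)
        top = np.argsort(values)[::-1][:n_factors]
        loadings = vectors[:, top] * np.sqrt(np.clip(values[top], 0, None))
        h2 = (loadings ** 2).sum(axis=1)         # update communalities
    return loadings, 1.0 - h2                    # loadings, uniquenesses

# Synthetic correlation matrix: block 1 implies loadings of 0.8
# (0.8 * 0.8 = 0.64), block 2 implies loadings of 0.7 (0.49).
b1 = np.full((3, 3), 0.64); np.fill_diagonal(b1, 1.0)
b2 = np.full((3, 3), 0.49); np.fill_diagonal(b2, 1.0)
R = np.block([[b1, np.zeros((3, 3))], [np.zeros((3, 3)), b2]])

L, uniq = principal_axis(R, n_factors=2)
print(np.round(L, 2))                            # section 1: loading pattern
print(np.round(uniq, 2))                         # ... and uniquenesses
print(np.round((L ** 2).sum(axis=0) / 6, 2))     # section 2: variance explained
```

Because the synthetic matrix has an exact two-factor structure, the extraction recovers the generating loadings; real data would not be this tidy.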
So a loading is a regression path, a directional path. Because this is a standardized factor analysis solution, and the factors are uncorrelated in this factor solution, which they are by default, the loadings also equal correlations. So this last item correlates 0.75 with the second factor. Then we also have the uniqueness here and the communality, the h²: the communality is how much of the variation of the indicator all the factors explain together, and the uniqueness is how much of the variance of the indicator remains unexplained. Sometimes the uniqueness is interpreted as evidence of, or a measure of, unreliability: if the uniqueness is 30%, we say that the indicator's error variance is 30% and 70% is the reliable variance. The problem with that interpretation is that the uniqueness also captures other sources of unique variation that are not random. For example, there is probably something unique in the total quality management item that is not related to the other investment items but that would be replicated if we asked the same question again. So factor analysis puts the unreliable variance, the random error, and the unique variance into one and the same number, and there is really no way of taking them apart. That is one weakness of factor analysis. The variance explained here shows that the first factor explains the most variation, but this is an unrotated solution, so we don't pay much attention to it except for one thing. We can do Harman's single-factor test, which you sometimes see reported in papers. Harman's test involves checking whether the first factor explains a majority of the variance in the data and whether it dominates the other factors. We can see here that the first factor explains 25% and the second factor 16%. We can't say that the first factor explains most of the variance, and we can't say that it dominates the other factors, because 25% and 16% are still in the same ballpark.
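The quantities just discussed are simple functions of the loading matrix. With orthogonal factors, each communality h² is the row sum of squared loadings, the uniqueness is 1 − h², and the proportion of variance explained per factor is the column sum of squared loadings divided by the number of indicators. A sketch with a made-up loading matrix (the 0.75 loading mirrors the example in the text; the other numbers are invented):

```python
import numpy as np

# Hypothetical unrotated loadings: 4 indicators x 2 orthogonal factors.
loadings = np.array([
    [0.70, 0.30],
    [0.65, 0.40],
    [0.50, 0.55],
    [0.20, 0.75],   # "last item correlates 0.75 with the second factor"
])

h2 = (loadings ** 2).sum(axis=1)    # communality: jointly explained variance
uniqueness = 1.0 - h2               # unexplained (unique + error) variance
prop_var = (loadings ** 2).sum(axis=0) / loadings.shape[0]  # per factor

print(np.round(h2, 2), np.round(uniqueness, 2), np.round(prop_var, 2))

# Harman-style check: share of explained variance taken by the first factor.
print(prop_var[0] / prop_var.sum())
```

Note that the Harman-style ratio compares the first factor against the others, exactly the 25%-versus-16% comparison made above.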
Harman's single-factor test is a bit misleadingly named, because it is not really a statistical test, and it is not even a very good diagnostic, because it will probably detect only very severe method variance problems. Nevertheless, it is something that you can easily check from the results of an exploratory factor analysis. If you want more rigorous tests of method variance, you can apply confirmatory factor analysis techniques, which allow you many more degrees of freedom in what you can do. Let's take a look at the factor loadings. The idea is that the factor loadings should show a pattern: the first three indicators, which are supposed to measure one thing, should load on one factor and one factor only, and the measures of the other constructs should not load on that factor. That is not the case here, and the reason is that this is an unrotated factor solution. Typically in a factor analysis, when we extract the factors, we take first the factor that explains the most variance, and if the constructs that cause the data are correlated, then that first factor contains a little bit of every construct. So all indicators load highly on it, and we can't really interpret it. Therefore we do a factor rotation, and factor rotation simplifies the factor analysis result. Factor rotation has another nice feature as well: it can relax the constraint, imposed when we do the factor analysis, that all the factors are uncorrelated. The zero-correlation constraint is there for a technical reason, and it doesn't make any theoretical sense if we are studying constructs that we think are related. If we think that two constructs are related, causally or otherwise, we cannot assume that the constructs are uncorrelated. Therefore, imposing a constraint that the two factors that are supposed to represent those constructs are uncorrelated doesn't make any sense.
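Mechanically, an orthogonal rotation is just multiplying the loading matrix by a rotation matrix. Below is a minimal sketch of the classic SVD-based varimax algorithm (orthogonal varimax only, so it does not relax the uncorrelated-factors constraint the way an oblique rotation such as promax would); the loading matrix is hypothetical, with the general-plus-bipolar shape typical of unrotated solutions.

```python
import numpy as np

def varimax(loadings, tol=1e-8, max_iter=500):
    """Basic SVD-based varimax rotation (orthogonal)."""
    n, k = loadings.shape
    rotation = np.eye(k)
    var = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient of the varimax criterion; its SVD gives the next rotation.
        target = rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / n
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        if var != 0 and s.sum() < var * (1 + tol):
            break
        var = s.sum()
    return loadings @ rotation

# Hypothetical unrotated loadings: a general first factor plus a bipolar
# second factor, the typical shape of an unrotated solution.
L = np.array([[0.80,  0.30], [0.75,  0.35], [0.70,  0.25],
              [0.40, -0.60], [0.45, -0.65], [0.35, -0.55]])
L_rot = varimax(L)
print(np.round(L_rot, 2))   # loadings pushed toward simple structure

# An orthogonal rotation preserves each indicator's communality.
print(np.allclose((L ** 2).sum(axis=1), (L_rot ** 2).sum(axis=1)))  # True
```

The communality check at the end is the key property: rotation redistributes loadings across factors but changes nothing about how much of each indicator the factors jointly explain.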
That is another reason why we rotate the factors: rotation can relax that constraint. The factor rotation simplifies the result, and after rotation we can see that the first three indicators go to one factor, the second three to another factor, and so on, so we have a nice pattern in which each group of indicators loads on one factor only and there are no cross-loadings. This would be evidence that, for example, these three indicators measure the same thing together and that it is distinct from what the other indicators may measure. So you want to have this kind of pattern, and it is an indication of validity. Of course it doesn't guarantee validity, because it doesn't tell us what these indicators have in common, but it is some kind of indirect evidence that there could be one underlying construct driving the correlations between these indicators. Another thing that we look at in these factor loadings is their magnitude. That is what we do when we assess the results, and this is an example from the Yli-Renko et al. article. They have a table of factor loadings: they have the measurement items, they have labeled the factors, which is where you label the factors with the construct names, and then you look at the loadings. The factor loadings here are interpreted as evidence of reliability: the square of a factor loading is an estimate of the reliability of the indicator. We also have these Z-statistics that are used for testing significance, whether a loading is zero or not. The null hypothesis that a loading is zero is not very relevant, because you really want to know whether the indicators are reliable enough, not whether the reliability differs from zero. So this is not a very useful test, but people still sometimes present it. The first indicator here is not tested. The reason is that this comes from a confirmatory factor analysis, and there is a technical reason why the first loading is fixed and not tested; I explain that in another video.
Then the authors say that the standardized loadings are all above 0.57 and that the cutoff is 0.4. The commonly used cutoff is 0.7. You can probably find somebody who has presented a lower cutoff if you do that kind of cherry-picking, but normally we want the loadings to be at least 0.7. Then again, reliability is a matter of degree, not a matter of yes or no, and you have to assess what the unreliability means for your study's results.
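The arithmetic behind these cutoffs is worth making explicit: squaring a standardized loading gives the share of the indicator's variance attributed to the factor, so the common 0.7 cutoff corresponds to roughly half of the variance being reliable, while a 0.4 loading implies only 16%.

```python
# Reliability implied by a standardized loading is its square.
for loading in (0.4, 0.57, 0.7):
    print(f"loading {loading:.2f} -> reliable variance {loading**2:.0%}")
```

This is why a 0.4 cutoff is hard to defend: it accepts indicators that are mostly noise with respect to the factor.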