In factor analysis we estimate a model in which one or more unobserved factors explain the correlations or covariances between the observed indicators. These factors are latent variables, which means that we do not observe their values directly. However, those values can be estimated, and the estimates are called factor scores. Kline explains that factor scores are a common post-estimation calculation and that they are computed as a weighted sum of the indicators. He also points out that there is a robust way of calculating scores, called unit weighting, in which you simply take the sum of the items belonging to a single scale and use that as your factor score. In fact, my take is that this is pretty much the only way you should ever calculate factor scores. To understand why I think that way, we need to understand what factor scores do.

Let's look at an example. I use a very large sample, 100,000 observations, from a population with two factors and six indicators: x1, x2, and x3 load on the first factor and the first factor only, x4, x5, and x6 load on the second factor and the second factor only, and the factors are correlated at .5 in the population. We then run an exploratory factor analysis, after which we calculate three sets of factor scores. The first set is the Thurstone scores, or regression factor scores; then we have Bartlett scores and ten Berge scores. There are perhaps half a dozen other well-known scoring methods as well, but these are the three most common, or at least the ten Berge scores are a variant of a common idea. The factor analysis results are as expected: there are no cross-loadings in the population and the sample size is large, so there are no cross-loadings in the factor analysis either. The first three indicators load only on the first factor, the second three load only on the second factor, and the factors are correlated at roughly .5. Now to the factor scores.
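As a concrete sketch (my own illustration, not the original analysis), the population described above can be simulated like this in Python. The loadings of .7 are an assumed value, since the text does not state the population loadings:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Assumed population model: two factors correlated at .5, each measured
# by three standardized indicators with loadings of .7 and no cross-loadings.
phi = np.array([[1.0, 0.5],
                [0.5, 1.0]])                              # factor correlations
factors = rng.multivariate_normal([0, 0], phi, size=n)    # true factor values

loadings = np.array([[0.7, 0.7, 0.7, 0.0, 0.0, 0.0],
                     [0.0, 0.0, 0.0, 0.7, 0.7, 0.7]]).T   # 6 x 2 pattern matrix
unique_sd = np.sqrt(1 - 0.7**2)                           # unique variance sd

# Each indicator = its loading times the factor, plus unique error
x = factors @ loadings.T + rng.normal(0, unique_sd, (n, 6))

# With n = 100,000 the sample factor correlation sits very close to .5
print(np.corrcoef(factors.T)[0, 1])
```

In a simulated data set like this we can keep the true factor values, which is exactly what makes the comparison of factor scores against the truth possible.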
So here are the correlations between the scores. The true scores are the scores that were used to generate the data, that is, the actual values of the latent variables. We have no hope of ever observing these in real data, but in a simulated data set we can of course record them. The factors are correlated at .498 in this large sample, and the factor scores are correlated with the factors and with one another to various degrees.

Let's look at the regression scores first. The regression scores maximize reliability: the factor score for f1 calculated with the regression technique correlates with the true scores of the factor more highly than the Bartlett scores or the ten Berge scores do. So they are the most reliable. But this reliability comes at a cost. The regression factor scores are biased, which means that they correlate not only with the factor they are scored for but also with the other factors. The regression factor score for f1 also correlates fairly highly with the second factor. This becomes a problem if you want to use the regression factor scores to test a hypothesis or estimate a correlation between the factors: the correlation between the regression factor scores is actually greater than the correlation between the factors themselves. This is not supposed to happen, because this correlation should be attenuated by measurement error and therefore be lower than the actual factor correlation.

Bartlett scores address this problem. They are reliable like regression scores, but they are also unbiased, so they do not correlate too highly with the other factors. The correlation between a Bartlett score and the other factor is what we would expect under attenuation, and as a consequence the correlation between the factor scores does not overestimate the factor correlation. The final set of scores in the table is the ten Berge scores.
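The inflation and attenuation can be shown directly from the standard weight formulas — regression (Thurstone) weights W = Σ⁻¹ΛΦ and Bartlett weights W = Ψ⁻¹Λ(Λ′Ψ⁻¹Λ)⁻¹. In this sketch I apply them with assumed population parameters (loadings of .7) rather than EFA estimates, which keeps the illustration self-contained:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Assumed population: loadings .7, no cross-loadings, factor correlation .5
phi = np.array([[1.0, 0.5], [0.5, 1.0]])
F = rng.multivariate_normal([0, 0], phi, size=n)   # true factor values
L = np.zeros((6, 2))
L[:3, 0] = 0.7
L[3:, 1] = 0.7
psi = np.diag(np.full(6, 1 - 0.7**2))              # unique variances
X = F @ L.T + rng.normal(0, np.sqrt(1 - 0.7**2), (n, 6))

sigma = L @ phi @ L.T + psi                        # model-implied covariance

# Regression (Thurstone) weights: W = Sigma^-1 L Phi
W_reg = np.linalg.solve(sigma, L @ phi)
# Bartlett weights: W = Psi^-1 L (L' Psi^-1 L)^-1
psi_inv = np.linalg.inv(psi)
W_bart = psi_inv @ L @ np.linalg.inv(L.T @ psi_inv @ L)

# Correlation between the two sets of scores for f1 and f2
r_reg = np.corrcoef((X @ W_reg).T)[0, 1]
r_bart = np.corrcoef((X @ W_bart).T)[0, 1]
print(r_reg, r_bart)
```

With these assumed parameters the regression score correlation comes out above the true .5, while the Bartlett score correlation comes out below it, as attenuation would predict.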
The ten Berge scores are a variant of correlation-preserving factor scores. The idea of correlation-preserving factor scores is that the scores correlate exactly as much as the factors do. If we have a correlation matrix with three factors, and therefore three correlations, then with correlation-preserving factor scores the three correlations between the three sets of factor scores would be exactly the same as the correlations between the actual estimated factors.

So which one should you use? The regression score is typically the default, and this makes sense if the purpose of the scores is to rank-order cases. But if the purpose of the scores is to test whether the factors are correlated, then the regression score is probably the worst choice.

Let's look at how these scores are actually calculated. With scale scores, we assume x1, x2, and x3 form one scale and x4, x5, and x6 form another. A researcher would normally calculate the scale score for the first scale as the mean of x1, x2, and x3, and the scale score for the second scale as the mean of x4, x5, and x6. Factor scores don't work that way: every factor score is a weighted sum of every indicator in the model, both those that belong to the factor and those that belong to other factors. This is the reason why regression scores are biased. If we look at the weights for f1, the scores for f1 are calculated by multiplying the indicators by these weights and taking the sum, and we can see that the indicators of the second factor, x4, x5, and x6, are also used to calculate scores for the first factor.

Why is this the case? The reason is that the purpose of these scores is to maximize reliability. If our indicators are measured with error, as they pretty much always are, and the factors are correlated, then the indicators of f2 carry unique information about the variance of f1 as well. We can therefore get more reliable predictions of f1 by also using the indicators of f2.
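The weight matrices make this visible. In the sketch below (again with assumed loadings of .7, not the original estimates), the regression weights for f1 put nonzero weight on x4 through x6, while the Bartlett weights put exactly zero weight on them when there are no cross-loadings:

```python
import numpy as np

# Assumed population parameters: loadings .7, factor correlation .5
phi = np.array([[1.0, 0.5], [0.5, 1.0]])
L = np.zeros((6, 2))
L[:3, 0] = 0.7
L[3:, 1] = 0.7
psi = np.diag(np.full(6, 1 - 0.7**2))
sigma = L @ phi @ L.T + psi

# Regression weights for f1: every indicator, including x4-x6, contributes
W_reg = np.linalg.solve(sigma, L @ phi)
print(np.round(W_reg[:, 0], 3))

# Bartlett weights for f1: with no cross-loadings, only x1-x3 get weight
psi_inv = np.linalg.inv(psi)
W_bart = psi_inv @ L @ np.linalg.inv(L.T @ psi_inv @ L)
print(np.round(W_bart[:, 0], 3))
```

The nonzero regression weights on the other factor's indicators are exactly the mechanism behind the bias discussed above.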
And that makes sense if you want to rank-order cases, but if you want to test whether the factors f1 and f2 are correlated, or you use those factor scores to represent some constructs, then using regression scores would be a really bad idea because of the bias. The Bartlett scores are unbiased because the influence of the other indicators is constrained to be zero if there are no cross-loadings; if there are cross-loadings, then other indicators might be used.

So what's the deal? Should we use factor scores? The thing is that the reliability differences are pretty trivial. For the regression scores the reliability is 0.883; for the ten Berge and Bartlett scores the reliabilities are 0.882 and 0.881. The difference is in the third decimal, which is not a substantial difference, and our sample size was very large. The factor scores are also potentially biased, particularly the regression scores. As a conclusion, it is probably always best to simply take the sum of the indicators that measure one thing and the sum of the indicators that measure another thing, instead of using factor scores that weight all the indicators.

Another thing about factor scores is that they are commonly misunderstood. Some researchers seem to think that factor scores are the same as the factors. I have reviewed studies that apply an exploratory factor analysis and then claim that they take factor scores, which correct for measurement error, and use those in a regression analysis. Factor scores don't work that way: they are not 100% reliable. Another important limitation of factor scores is factor indeterminacy. Factor scores, particularly the correlation-preserving variants, correlate exactly as the factors in the model correlate, but there is no guarantee that they correlate well, or to the right amount, with anything that is not included in the model.
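How trivial the reliability advantage is can be checked in the same simulated setup (assumed loadings of .7, factor correlation .5; these are my illustration values, not the original ones). Here reliability is simply the correlation of a score with the true factor values:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100_000

# Assumed population: loadings .7, factor correlation .5
phi = np.array([[1.0, 0.5], [0.5, 1.0]])
F = rng.multivariate_normal([0, 0], phi, size=n)
L = np.zeros((6, 2))
L[:3, 0] = 0.7
L[3:, 1] = 0.7
X = F @ L.T + rng.normal(0, np.sqrt(1 - 0.7**2), (n, 6))

# Unit-weighted scale score: just the mean of a factor's own indicators
scale1 = X[:, :3].mean(axis=1)

# Regression factor score for f1, weighting all six indicators
sigma = L @ phi @ L.T + np.diag(np.full(6, 1 - 0.7**2))
reg1 = X @ np.linalg.solve(sigma, L @ phi)[:, 0]

r_reg = np.corrcoef(reg1, F[:, 0])[0, 1]      # reliability of regression score
r_scale = np.corrcoef(scale1, F[:, 0])[0, 1]  # reliability of unit-weighted score
print(round(r_reg, 3), round(r_scale, 3))
```

Under these assumptions the two reliabilities land within about a hundredth of each other, which is the kind of trivial difference the text describes.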
So calculating factor scores and then using them to explain some indicators or variables that are not included in the factor analysis is generally not a good idea because of factor indeterminacy. And if you simply want to maximize reliability, calculate scale scores: as I point out in my article, there are decades of research showing that any advantage of differentially weighting indicators is pretty much trivial compared to simply taking the sum of the indicators. But it is useful to understand what factor scores do, if only to be able to caution researchers against using them, for example when you are a reviewer.