Let's take a look at an empirical example of confirmatory factor analysis. Our data set for the example comes from Mesquita and Lazzarini. This is a nice paper because they present a correlation matrix of all the data at the indicator level, so we can use their Table 1, shown here, to estimate all the confirmatory factor analyses and the structural regression models that the article presents. For the most part, we will also get exactly the same results.

So let's check how the confirmatory factor analysis is estimated in R and what the results look like. Specifying the factor analysis model requires a bit of work; I'll explain the details of this syntax a bit later. Generally, what we do first is specify the model: we list the indicators, and in this particular case every indicator loads on exactly one factor. Then we estimate the model using the covariance matrix, and finally we plot the results as a path diagram. That's the plotting command, and I have added some options to make the plot look a bit nicer.

Let's take a look at the model specification in more detail. I have color-coded this: blue is for factors and green is for indicators. We specify the factors, and then we specify how each indicator loads on its factor. So we have a factor, horizontal, measured with three indicators; a factor, innovation, measured with two indicators; and a factor, competition, measured with a single indicator. So we have three-indicator factors, two-indicator factors, and single-indicator factors, which are the three scenarios explained in the video about factor scale setting and identification.

What parameters do we need to estimate? We need to estimate the factor loadings, and we are going to be scaling each latent variable using the first-indicator fixing technique. We will also estimate the factor variances, factor covariances, and indicator error variances. The model is identified using the following approach. First, we set the scale of each latent variable: we use first-indicator fixing, so we fix the first loading of each factor at one. That's the default setting, so we don't have to specify it here. Then we need to consider how the three-, two-, and one-indicator rules are applied. The three-indicator factors are always identified. The two-indicator factors are identified because they are embedded in a larger system of factors, so we can use information from the other factors to identify those loadings, and we don't have to do anything special. Then, for the one-indicator factors, we fix the error variances to be zero. In other words, we say that these single indicators are perfectly reliable: the error variances are zero for indicators that are the sole indicators of their factors.

As a path diagram, the result looks like this. We have factor covariances, these curved arrows between factors. We have factor variances, these curves that start from a factor and come back to the same factor. We have factor loadings, the straight arrows from factors to indicators. And then we have indicator error variances, these curved arrows on the indicators. The dashed arrows mark parameters that have been fixed: this one is constrained to be one, and this one is constrained to be zero, the single-indicator factor's error variance.
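To make the specification concrete, here is a minimal sketch of what this kind of lavaan model could look like. It is only a fragment: the factor and indicator names, the `table1_cor` matrix, and the sample size `n_obs` are illustrative placeholders, not the full ten-factor model from the paper.

```r
library(lavaan)   # model estimation
library(semPlot)  # path diagram plotting

# Illustrative fragment of the model syntax; names are placeholders.
model <- "
  # Three-indicator factor
  horizontal  =~ hor1 + hor2 + hor3
  # Two-indicator factor
  innovation  =~ inn1 + inn2
  # Single-indicator factor: fix its error variance to zero
  competition =~ comp1
  comp1 ~~ 0*comp1
"

# Estimate from the article's correlation matrix instead of raw data.
# 'table1_cor' stands for the matrix transcribed from Table 1 and
# 'n_obs' for the article's sample size; both must be supplied.
fit <- cfa(model, sample.cov = table1_cor, sample.nobs = n_obs)

# Plot the results as a path diagram, labeled with the estimates.
semPaths(fit, whatLabels = "est")
```

Note that `cfa()` fixes the first loading of each factor to one by default, which is the scaling choice described above.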
So that's what we have, and there are some funny things here. We can see that some of the error variances are negative. This is a Heywood case, and I have another video explaining what a Heywood case is and why it occurs. So we have negative variances, but they are close to zero. We can conclude that maybe these indicators are just highly reliable and the error variance is actually close to zero: positive but close to zero, so because of sampling error we get small negative values. Since these are small negative values, we don't really care about them. We assume the indicators are highly reliable, instead of treating this as a symptom of model misspecification.

I said that these results mostly match what's reported in the paper. There is a small mismatch in some of the factor loadings, but otherwise the factor loadings here match exactly what the article reports.

In text form, the output gives us a couple of things. We have the estimation information first: the degrees of freedom and the chi-square statistic, which I'll explain in the next video. Then we have the actual estimates, and in the estimates list we get, for each parameter, the estimate, standard error, z-value, and p-value. This goes on; it's a very long printout. And then we have some warnings. The warnings here relate to the Heywood case; both of them.

Let's take a look at the estimation information part next. The same kind of information is given by any structural equation modeling software, so it's not exclusive to R: you will get this estimation information and the actual estimates. Let's take a look at the estimation information and the degrees of freedom first. The degrees of freedom are 147, and that's the same as reported in the article. So where does that 147 come from? It is a good exercise to calculate the degrees of freedom by hand, because then you will understand what was estimated. There is a nice paper by Cortina and colleagues where they calculate the degrees of freedom for models in published articles and check whether they actually match the reported degrees of freedom, and they don't always match. That's an indication that there is something funny going on in the analysis.

Let's do the degrees of freedom calculation now. Where does the 147 come from? We start with 231 unique elements of information: the correlation matrix of all 21 indicators has 231 unique elements, so that's the amount of information. Then we subtract the things that we estimate. We estimate 10 factor variances: we have 10 factors, and each factor has an estimated variance. Then we estimate 45 factor covariances, because 10 factors have 45 unique covariances. Then we subtract 11 factor loadings: remember that we always fix the first loading to be 1 to identify each factor, so of the 21 indicators, 10 are used for scaling the factors and we estimate the remaining 11 loadings. Then we have 18 indicator error variances: we have 21 indicators, but 3 are sole indicators of their factors, so their error variances are fixed to 0. That gives 147, and that's the degrees of freedom.

We can check that our analysis actually matches what was done in the paper by comparing the degrees of freedom, and also by comparing the chi-square. The 147 degrees of freedom tell us that we have excess information: we could estimate 147 more parameters if we wanted to. After 147 more parameters, we would have used up all the information and couldn't estimate anything more.
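As a quick sanity check, the same calculation can be written out in a couple of lines of R, using the counts given above:

```r
p <- 21                     # number of indicators
info <- p * (p + 1) / 2     # 231 unique elements in the correlation matrix
estimated <- 10 +           # factor variances
             45 +           # factor covariances: 10 * 9 / 2
             11 +           # free loadings: 21 indicators - 10 used for scaling
             18             # error variances: 21 indicators - 3 fixed to zero
info - estimated            # 147 degrees of freedom
```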
We can also use the excess information to check whether it matches the predictions from our model, and that is the idea of model testing. We can use the redundant information to test the model: we have more information than we need for model estimation, so we can ask whether the additional information is consistent with our estimates. If it is, then we conclude that the model fits the data well.

The idea of model testing is that we have the data correlation matrix here, shown for the first six indicators. Then we have the implied correlation matrix here, and the residual correlation matrix here. Recall that the estimation criterion was to make this residual correlation matrix as close to all zeros as possible, by adjusting the model parameters that produce the implied correlation matrix, and these residuals are indeed pretty close to zero. If our model fits the data perfectly, it reproduces the data exactly and the residuals are all zeros.

We want to know whether the model is correct for the population. So the question we ask now is whether this model would have reproduced the population correlation matrix if we had access to it. In small samples the sample correlations are slightly off, so they are not exactly at the population values, and therefore the residuals are not exactly at zero. We ask: are these differences from zero small enough that we can attribute them to chance? Is it plausible to say that the model is correct, but it doesn't reproduce the data exactly because of small-sample fluctuations in the correlations?

This question, whether these residual correlations could be due to chance only, is what the chi-square statistic quantifies. We have the chi-square statistic here. It's a function of these residuals, and it doesn't really have an interpretation by itself, but it's distributed as chi-square with 147 degrees of freedom, so we can calculate a p-value for it. The p-value here is 0.25. That means that if the residuals were all zeros in the population, we would get residuals this large or larger by chance alone 25% of the time. So we cannot reject the null hypothesis, which is that these residuals are due to chance only. Therefore we say that the model fits the data well. This is the logic of chi-square testing in confirmatory factor analysis and structural regression models: we want to say that these differences are small enough that we can attribute them to chance only, and we accept the null, or actually fail to reject the null. Then we conclude that this evidence does not allow us to conclude that the model is misspecified. So we want the p-value here to be non-significant, because that indicates our model is a plausible representation of the data, and we conclude that the model fits.

Let's take a look at the estimation information again. The estimation information gives us the p-value, the degrees of freedom, and the chi-square statistic. Then we get the estimates, and then we get these warnings. Every time you get warnings, you need to actually look at what they mean. Here the output tells us that we should run inspect(fit, "theta"). The theta matrix is the residual covariance matrix, that is, the covariance matrix of the indicator error terms estimated from the data, and we should investigate it. Recall that we have the Heywood case: these three negative error variances.
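In lavaan, all of these pieces can be pulled out of the fitted model object directly. A small sketch, assuming the fitted object is called fit as before:

```r
lavInspect(fit, "sampstat")                   # observed sample moments (the data)
fitted(fit)                                   # model-implied covariance matrix
residuals(fit)                                # residuals: observed minus implied
fitMeasures(fit, c("chisq", "df", "pvalue"))  # chi-square test of exact fit
```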
When we then inspect the theta matrix, we see that it contains the estimated indicator error term variances on its diagonal. All the covariances between the error terms are constrained to be zero, because we didn't estimate them in this model. And we can see these three negative values here. So what do we do with that? We conclude that they are so close to zero that it's plausible they are actually small positive numbers, and the negative estimates are just an outcome of small sampling fluctuations.
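A sketch of that inspection call, again assuming the fitted lavaan object is called fit:

```r
# Extract the theta matrix: the indicator error term covariance matrix.
theta <- inspect(fit, "theta")

# The estimated error variances are on the diagonal; the off-diagonal
# covariances are all zero because they were not estimated in this model.
diag(theta)   # three of these values come out slightly negative (Heywood cases)
```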