There are two issues that we need to take care of: scale setting and identification. Scale setting means that every latent variable must have a metric, so we have to define the scale of the factors ourselves. Identification means that the data must actually contain enough information to estimate the things that we want the model to estimate. These two are closely related, and in this video we will look at what they mean in practice. So let's first look at a one-factor model with two indicators, A1 and A2. We have two factor loadings, two error variances and the factor variance, so we have five things that we want to estimate. Then we start the estimation by calculating the model-implied correlations. With two indicators, the data give us just one correlation, the correlation between A1 and A2, so we have one unit of information and we are trying to estimate several things from it. That cannot be done: we cannot estimate four or five things from a single correlation. So this model cannot be estimated; the degrees of freedom are negative. Factor analysis, without additional constraints, always requires at least three indicators.
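The counting argument above can be sketched as a quick back-of-the-envelope check. This is a minimal sketch; the function name is mine and is not part of any SEM package:

```python
def degrees_of_freedom(n_indicators: int, n_free_params: int) -> int:
    """Unique elements of the indicator covariance matrix (including the
    diagonal) minus the number of free parameters in the model."""
    units_of_information = n_indicators * (n_indicators + 1) // 2
    return units_of_information - n_free_params

# One factor with two indicators: 2 loadings + 2 error variances
# + 1 factor variance = 5 free parameters, but only 3 units of information.
print(degrees_of_freedom(2, 5))  # negative -> not identified

# With three indicators and the factor variance fixed to one, we estimate
# 3 loadings + 3 error variances = 6 things from 6 units of information.
print(degrees_of_freedom(3, 6))  # zero -> just identified
```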
Factor analysis of two indicators only is not a very meaningful analysis anyway, because while you can make the model identified by constraining, for example, the two factor loadings to be equal, the estimation still wouldn't give you any meaningful information. So let's take another example, or work more with this example. Let's assume that our correlation matrix for this two-factor model, each factor with one indicator, is such that A1 and B1 are correlated at 0.1, and we have three parameters that we want to estimate. So we have one correlation that depends on three parameters; the variances of the indicators do depend on their own parameters, but we don't really care about those in this video. So why is the correlation between A1 and B1 so low? There are basically three different options. It's possible that A1 and B1 are both highly reliable indicators of the factors A and B, and A and B are just weakly correlated. It's also possible that A and B are highly correlated but A1 is unreliable, and therefore we observe only a small correlation. Or it's possible that A and B are highly correlated but B1 is unreliable. The problem is that we cannot know which of these three options is correct, because they all have the same empirical implication, which is that this correlation here is quite small. So that's another example of a non-identification problem. Here we are estimating five things: two error variances, two factor loadings and one factor correlation. We are trying to estimate them from just three elements of information. We can't do that; the model is not identified. We cannot know empirically which one of these three explanations is correct. Of course we can then use theory and rule out some of these alternative explanations, but that goes beyond factor analysis estimation and identification. So this model is not identified.
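To see why the three explanations are empirically indistinguishable, note that with standardized factors the model-implied correlation between A1 and B1 is the product loading × factor correlation × loading. A small sketch with made-up parameter values (the numbers are illustrative, not from the video):

```python
def implied_correlation(loading_a1: float, corr_ab: float, loading_b1: float) -> float:
    """Model-implied correlation between A1 and B1 when both factors are
    standardized: lambda_A1 * phi_AB * lambda_B1."""
    return loading_a1 * corr_ab * loading_b1

# Three very different stories, one empirical implication (r = 0.1):
reliable_weak = implied_correlation(1.0, 0.1, 1.0)    # both reliable, factors weakly correlated
a1_unreliable = implied_correlation(0.125, 0.8, 1.0)  # factors highly correlated, A1 unreliable
b1_unreliable = implied_correlation(1.0, 0.8, 0.125)  # factors highly correlated, B1 unreliable
print(reliable_weak, a1_unreliable, b1_unreliable)    # all equal 0.1
```

Because all three parameter sets imply the same observed correlation, no amount of data on A1 and B1 alone can distinguish them.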
It cannot be estimated meaningfully. Let's take a look at scale setting now. Identification basically means that you have at least as much information as what you estimate: the number of unique elements in the correlation matrix of the indicators must equal or exceed the number of free parameters that you estimate in the model. Okay. So normally in exploratory factor analysis we have standardized factors. The idea is that all the factors have variances of one and means of zero, and that defines the scale of these variables, because every variable must have a variance. In the exploratory analysis the factors are scaled to have unit variance, so they are standardized, and the factor loadings are then standardized regression coefficients for that reason. Then what if we don't standardize the factors? Instead of saying that a factor's variance is one, we estimate the factor's variance. So we add this factor variance here and this factor variance here, and we have 15 free parameters. We still have 21 units of information from which we estimate, but we estimate 15 different things. So the degrees of freedom are six, which means that this model is over-identified. The degrees of freedom are positive, so in principle it is possible to estimate this model meaningfully. We can do the estimation. So let's assume that that's our observed correlation matrix and that's our implied correlation matrix. Then we can find values for the phis and the lambdas so that the implied matrix reproduces the observed correlation matrix perfectly. In this case that's possible because these correlations all have the same value. Generally, in small samples you will never reproduce the data completely, but in this example you do, just to simplify things. So we can estimate, and that's one set of estimates that will give you an exact fit between the observed correlation matrix and the implied correlation matrix. So we're fine, right?
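The parameter counting for this model works out as follows, assuming six indicators across the two factors with the parameter counts stated above (6 loadings, 6 error variances, 2 factor variances, 1 factor correlation):

```python
n_indicators = 6
units_of_information = n_indicators * (n_indicators + 1) // 2  # 21 unique elements

n_loadings = 6
n_error_variances = 6
n_factor_variances = 2
n_factor_correlations = 1
free_params = (n_loadings + n_error_variances
               + n_factor_variances + n_factor_correlations)   # 15 free parameters

df = units_of_information - free_params
print(df)  # 6 -> over-identified
```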
Turns out we have a small problem, because there's another set of estimates that also reproduces the correlation matrix perfectly through the implied correlation matrix. You can plug these values into the equations and see that they produce the exact same implied correlations. So we have here factor A's variance is one versus factor A's variance is two, and they produce the same fit. So what do we do? We can come up with infinitely many examples like this. If factor A's variance is 0.5, then we will have different values for the factor loadings, but still the empirical correlation matrix is reproduced perfectly by the model-implied correlation matrix. So this is the problem of scale setting of latent variables in confirmatory factor analysis models. We need to set the metric of the factors ourselves, because we don't observe the factors; they are just abstract entities. We don't know whether they vary from 0 to 1, or 0 to 1 million, or minus 5 to plus 10, or whatever. We don't know their range, we don't know their variances, we don't know their means. We have to specify the scale of each factor ourselves. In exploratory analysis we typically don't model means, and we fix the variances of the factors to be ones. In confirmatory analysis there are reasons why we don't fix the variances to ones, which I'll explain a bit later. But the problem generally is that we must define whether we are talking about centimeters or inches, Celsius or Fahrenheit. They quantify the exact same thing, and from a statistical perspective they are equally good scales for measuring length or temperature, but we have to agree on what scale we are using. Also, a regression coefficient gives us the effect of a one-unit change in the independent variable on the dependent variable; interpreting regression coefficients only makes sense after we have decided how we define the unit.
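The scale indeterminacy can be verified numerically: multiplying the factor variance by any constant c while dividing the loadings by sqrt(c) leaves the implied covariance matrix Lambda Phi Lambda' + Theta unchanged. A sketch with made-up parameter values:

```python
import numpy as np

def implied_cov(loadings, factor_cov, error_vars):
    """Model-implied covariance matrix: Lambda Phi Lambda' + Theta."""
    return loadings @ factor_cov @ loadings.T + np.diag(error_vars)

loadings = np.array([[0.8], [0.7], [0.6]])  # one factor, three indicators
factor_cov = np.array([[1.0]])              # factor variance fixed at 1
error_vars = np.array([0.36, 0.51, 0.64])

c = 2.0                                     # double the factor variance...
original = implied_cov(loadings, factor_cov, error_vars)
rescaled = implied_cov(loadings / np.sqrt(c),  # ...and shrink the loadings
                       factor_cov * c, error_vars)
print(np.allclose(original, rescaled))      # True: identical implied matrices
```

Since every choice of c fits the data equally well, the data alone cannot pin down the factor variance; we have to pick a scale.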
So what is the unit of A and what is the unit of B? We have to set them manually, so we have to decide on a scale setting approach. In exploratory analysis, as I said, we typically say that factor A and factor B, and all factors, have variances of one. That produces standardized factor loadings, which are standardized regression coefficients of the indicators on the factors, or in the case of uncorrelated factors they equal correlations. We use that in exploratory factor analysis. We cannot use that in structural regression models. A structural regression model is an extension of a factor analysis model where we allow regression relationships between the factors. The reason why we can't use this approach there is that the variance of an endogenous variable, a variable that depends on other variables, is a function of those other variables. So we can't say that a variable's variance is one if that variance depends on other things in the model. But that's beyond this video. Another very common approach is that we fix the first indicator's loading to be one. This is the default scale setting approach in most structural equation modeling or confirmatory factor analysis software. The reason is that this can be used pretty much always, regardless of what kinds of variables we have as A and B and what kind of relationship we specify between A and B. And the idea is that, if we assume that classical test theory holds, so all these errors here are just random noise, then the variance of A is whatever is the variance of the true score of A1. So that's also appealing if we consider that the only source of error is random noise: the variance of factor A is the variance of A1, or what the variance of A1 would be if it wasn't contaminated with this random noise here. It allows us to consider the scale of these indicators without error variance, assuming classical test theory holds for the data.
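Fixing the first loading to one is just a different way of resolving the same indeterminacy: the unit-variance solution and the first-indicator solution are rescalings of each other and imply the same correlations. A sketch with made-up values (continuing the hypothetical three-indicator example):

```python
import numpy as np

def implied_cov(lam, phi, theta):
    """Model-implied covariance matrix: Lambda Phi Lambda' + Theta."""
    return lam @ phi @ lam.T + theta

lam = np.array([[0.8], [0.7], [0.6]])  # unit-variance scaling: phi fixed to 1
phi = np.array([[1.0]])
theta = np.diag([0.36, 0.51, 0.64])

# Same model under first-indicator scaling: fix the first loading to 1
# and let the factor variance carry the scale instead.
lam_fixed = lam / lam[0, 0]            # loadings become 1, 0.875, 0.75
phi_fixed = phi * lam[0, 0] ** 2       # factor variance becomes 0.64

same_fit = np.allclose(implied_cov(lam, phi, theta),
                       implied_cov(lam_fixed, phi_fixed, theta))
print(same_fit)  # True: both scalings imply the same covariances
```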
And this is such a common approach that there's a rule of thumb that I present: always use the first indicator to fix the scale. We can see that the papers that we have used as examples in these videos use this approach. In Mesquita and Lazzarini, you can see that all loadings of first indicators are ones. So they set the scale of each latent variable by fixing this loading to one. And then they have the Z-statistic here, and you can see that the first indicators don't have a Z-statistic. The reason is that they are not estimated from the data; instead, the researcher says that these are ones. And if something is not estimated, it doesn't vary from sample to sample, so it doesn't have a standard error, so we can't calculate a Z-statistic for it. We can see the same in Yli-Renko's paper. In Yli-Renko's paper the first loading is not one, but it doesn't have a standard error and it doesn't have a Z-statistic. So that's an indication that they actually fixed the first loading to identify, or to scale, the latent variables. If you want to have standardized factor loadings, so loadings that are expressed in the scale of the exploratory analysis where the factor variances are ones, then you can rescale the confirmatory factor analysis results afterwards. Your software will produce that for you if you check the standardized estimates option. So these are standardized estimates, but the scaling has been done after estimation: you first estimate an unstandardized confirmatory factor analysis where each factor is scaled by fixing the first indicator's loading, then you rescale the resulting solution. That's the same approach that you use for standardized regression coefficients: you first estimate the regression, then you rescale the parameter estimates. So, the summary of identification of confirmatory factor analysis models.
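The post-estimation standardization can be sketched as follows: a standardized loading is the unstandardized loading rescaled by the factor's standard deviation and the indicator's standard deviation. The numbers below are made up for illustration:

```python
import math

def standardized_loading(lam: float, factor_var: float, error_var: float) -> float:
    """Rescale an unstandardized loading after estimation:
    lambda_std = lambda * sqrt(phi) / sd(indicator),
    where var(indicator) = lambda**2 * phi + theta under the model."""
    indicator_sd = math.sqrt(lam ** 2 * factor_var + error_var)
    return lam * math.sqrt(factor_var) / indicator_sd

# First-indicator scaling: loading fixed to 1, estimated factor variance
# 0.64 and error variance 0.36 (hypothetical values).
print(standardized_loading(1.0, 0.64, 0.36))  # ~0.8
```

This is why the software can report standardized estimates even though the model itself was estimated with the first loading fixed: the rescaling is pure arithmetic applied afterwards.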
A model is identified if every latent variable has a scale, if the degrees of freedom are non-negative, and if every part of the model is identified. In confirmatory factor analysis, after we have established that every latent variable, every factor, has a scale, all factors with three indicators are always identified. So if you have three indicators, you can always run a factor analysis, no matter what. Then if you have two indicators, we can either fix the two factor loadings to be equal, saying that both indicators are equally reliable, or we can embed this factor in a larger system. With just two variables alone, we can't estimate a factor model unless we fix the factor loadings to be the same. If we embed the two-indicator factor into a larger factor analysis, then we can estimate it, because we can use information from the other indicators to estimate these factor loadings. And then there's the single indicator rule: if we have a factor with just a single indicator, then we cannot estimate the reliability of the indicator, because you cannot estimate reliability based on just one measure. We have to assume what the error variance is, and typically we do that by constraining the error variance to be zero. So we say that this factor A, or construct A, is measured without error, because we can't estimate the error. Of course we could constrain the error variance to be something else. If the indicator has typically been shown to be 80% reliable, then we can fix the error variance here to be 20% of the observed variance of the indicator. But that's rarely done. So identification is a requirement for estimation. If our model is not identified, it cannot be meaningfully estimated. Identification basically asks: do you have enough information to estimate the model? If we have one correlation, we can't estimate two different things from it. You need at least one unit of information for everything that you estimate.
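The single-indicator error-variance fix described above is simple arithmetic; the indicator variance and reliability below are hypothetical numbers:

```python
# Single-indicator factor: reliability cannot be estimated from one
# measure, so we fix the error variance ourselves. With an assumed
# reliability of 0.8, the error variance is the unreliable 20% of the
# indicator's observed variance.
observed_variance = 2.5     # hypothetical observed variance of the indicator
assumed_reliability = 0.8   # taken from prior studies, not estimated here

fixed_error_variance = (1 - assumed_reliability) * observed_variance
print(fixed_error_variance)
```

Setting `assumed_reliability = 1.0` recovers the more common choice of fixing the error variance to zero, i.e. assuming the construct is measured without error.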
Ideally you have more information than that, so that you have redundancy. So we need to have a scale for each latent variable, and the degrees of freedom must be non-negative. Ideally they are positive, and the more positive they are, the better our model tests are.