Identification is important because of what it means for the model. If a model is not identified, then there is no unique set of optimal parameter values, so we cannot say what the parameter estimates are, because more than one set of values would fit the data equally well. When we know the identification rules, we can apply them directly. For example, a one-factor model with three indicators is always identified. And there are other rules that you can apply. But there are also scenarios that are not covered by our rules. For example, the identification status of bi-factor models is a bit more complicated than the identification of normal models. In these scenarios, it's important or useful to understand how identification can be proven. If you understand the principle of proving identification, then you may be able to prove the identification or non-identification of certain parts of the model, and that can help you understand whether the model as a whole is identified or not. Proving identification can be a bit tedious, but understanding the principle is nevertheless useful.

I will now show you how to prove the identification of a factor model with three indicators. So here's a model: we have one factor, we have three indicators, and we have a rule that says this is always identified. But why is it always identified, and how do we prove the identification status? Let's first check how much data we have and how many things we want to estimate from that data. We first need to set a scale for the latent variable, because setting the variance of the latent variable, or applying a rule that sets it, is something we must do for every latent variable in the model. If our latent variables don't have scales, then the model cannot be identified. We have six sample covariances, so that's where the six comes from: the three variances of the indicators and the three covariances between them.
These are the x1-x2, x2-x3, and x1-x3 correlations or covariances. So we have six units of information, and we are estimating six different parameters: the factor variance, two loadings (the first loading is constrained to be one for scale setting), and the three error variances. That gives us six, and six minus six is zero, so we have zero degrees of freedom. This model, if it's identified, is going to be just identified, because we don't have any excess information. We know that non-negative degrees of freedom is a necessary condition for identification, but it's not sufficient. So how do we know whether this model is identified or not? And more importantly, how do we prove it?

The idea of identification is that if we know the correct population, the full population data, then we can, from that full population data, calculate unique estimates for all the parameters of the model. So identification concerns the model and the population, and not really the sample. There is also the issue of empirical identification, which I'll talk about a bit later in this video.

So let's start proving the identification. If this model is correct for the data, then the population covariances should correspond to the model-implied covariances. These are the covariances implied by the model. For example, sigma 11, which is the variance of x1, is psi plus theta 1. And the variance of the second indicator, sigma 22, is psi multiplied by lambda 2 squared, plus theta 2 for the error variance. So we have these six equations, and now we can start working on them and try solving for the parameter values. Assume that the population covariances are known. The question of identification is: can we solve the value of every estimated parameter from these covariances? And how do we go about that?
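The six model-implied covariances can be written out numerically. A minimal sketch with hypothetical parameter values (the first loading fixed to one for scale setting):

```python
import numpy as np

# Hypothetical parameters for a one-factor, three-indicator model.
psi = 0.8                          # factor variance
lam = np.array([1.0, 0.7, 1.2])    # loadings: lambda1 fixed to 1, lambda2, lambda3
theta = np.array([0.3, 0.4, 0.5])  # error variances

# Model-implied covariance matrix: Sigma = lambda * psi * lambda' + Theta.
# Its six distinct elements are the six equations from the video, e.g.
#   sigma11 = psi + theta1
#   sigma22 = lambda2^2 * psi + theta2
#   sigma12 = lambda2 * psi
#   sigma23 = lambda2 * lambda3 * psi
Sigma = psi * np.outer(lam, lam) + np.diag(theta)

print(Sigma)
```

Six parameters produce six distinct covariance elements, which is why the degrees of freedom are zero.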
The first thing we should observe is that the first three equations are used simply to solve the error variances. Why is that the case? Because theta 1 occurs only in the first equation, theta 2 only in the second, and theta 3 only in the third. We need those three equations to solve those three parameters, and we can't use them for anything else: if we used the sigma 11 equation for something else, then we wouldn't be able to solve theta 1. So we'll be looking at just the three covariances, sigma 12, sigma 13, and sigma 23, and trying to solve the two factor loadings, lambda 2 and lambda 3, and the factor variance psi from those equations.

Let's take a look at how we solve them. What we do first is try to eliminate one equation and one parameter at a time. We take the third equation and solve it for psi: psi is sigma 23 divided by lambda 2 times lambda 3. Now we can eliminate psi from the first two equations. We take those first two equations and plug in that expression for psi. After simplifying, we have two simpler equations, each with only one unknown. And if an equation has only one unknown, we can solve it. So we just solve, and that gives us lambda 2 and lambda 3. This is high-school math, pretty simple so far. How do we solve psi, then? Well, we just plug the solved values of lambda 2 and lambda 3 into the equation for psi. We get an equation, we simplify, and that gives us psi. So psi is the product of two covariances divided by one covariance. Now we have solved the lambdas and psi, and we just need to solve the thetas to prove that this model is fully identified. Let's take a look at the original set of equations. We have already solved this part: lambda 2, lambda 3, and psi.
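The elimination steps above lead to closed-form solutions. A sketch that generates population covariances from hypothetical parameters and then recovers those parameters with the derived formulas:

```python
import numpy as np

# Hypothetical population parameters (first loading fixed to 1).
psi, lam2, lam3 = 0.8, 0.7, 1.2

# Model-implied covariances between the indicators.
s12 = lam2 * psi
s13 = lam3 * psi
s23 = lam2 * lam3 * psi

# Closed-form solutions from the derivation:
#   lambda2 = sigma23 / sigma13
#   lambda3 = sigma23 / sigma12
#   psi     = sigma12 * sigma13 / sigma23  (two covariances over one)
lam2_hat = s23 / s13
lam3_hat = s23 / s12
psi_hat = s12 * s13 / s23

print(lam2_hat, lam3_hat, psi_hat)
```

Because each parameter is recovered uniquely from the covariances, this part of the model is identified.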
Then we just plug these solved values into the equations for the error variances, solve, and that gives us the error variances. Pretty simple. So proving the identification of the three-indicator factor model is fairly straightforward. As an exercise, it may be useful for you to actually work through this solution yourself from scratch, so that you understand the principles of proving identification and how this covariance algebra works.

If we look at these equations in standardized form, these are the solutions for the standardized coefficients, and the procedure for obtaining them is the same. One interesting thing here is the square roots: a square root has basically two solutions, positive and negative. So when we fix the scale of the factor model by fixing the variance of the factor, the signs of the factor loadings are left indeterminate. A factor model with, let's say, loadings of 1, 1, and 1 would fit the data equally well as one with loadings of minus 1, minus 1, and minus 1. Flipping the signs just switches the direction of the scale of the latent variable without affecting the variance. So the direction of the factor loadings is indeterminate in the standardized solution.

How do we know that these are actually the correct solutions? Well, we can just apply them to empirical data. I'm going to use R and the lavaan package to fit a simple three-indicator factor model of x1, x2, and x3 from the Holzinger and Swineford 1939 dataset. These are the estimates that we got, these are the standardized estimates, and these are the sample covariances. We can apply the equations to these sample covariances and verify that we actually get the same coefficients as the software. Because this model is just identified, we can solve for the maximum likelihood estimates directly from the sample covariances.
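The sign indeterminacy can be checked directly: with the factor variance fixed to one (the standardized scale setting), flipping the signs of all loadings leaves the implied covariance matrix unchanged. A small sketch with hypothetical values:

```python
import numpy as np

# Standardized scale setting: factor variance fixed to 1.
theta = np.diag([0.5, 0.5, 0.5])   # hypothetical error variances

lam_pos = np.array([1.0, 1.0, 1.0])
lam_neg = -lam_pos                 # all loadings sign-flipped

# Implied covariance matrices under the two solutions.
Sigma_pos = np.outer(lam_pos, lam_pos) + theta
Sigma_neg = np.outer(lam_neg, lam_neg) + theta

# The two solutions are empirically indistinguishable.
print(np.allclose(Sigma_pos, Sigma_neg))
```

Both solutions fit any dataset equally well, which is why the direction of the loadings is indeterminate under this scale setting.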
Solving an over-identified model this way would not be possible; it wouldn't produce the maximum likelihood estimates. The reason is that over-identification implies that at least one parameter can be solved in multiple different ways, and we can get multiple different values depending on which covariances we use. But in a just-identified model, each parameter can be solved in exactly one way.

Okay, let's take a look at another interesting thing: what happens if one of the covariances between x1, x2, and x3 is zero? We can see that we would have division by zero in the solutions, and of course you cannot divide by zero. So what happens if we try to estimate a model where one pair of indicators is uncorrelated in the sample? Let's just try it out using R. This produces an empirically under-identified model. What we do here is take the same Holzinger and Swineford dataset, take the sample covariance matrix, and replace one of the covariances, the covariance between x1 and x2, with zero. When we then try to estimate, the software tells us that it cannot find a solution. If it cannot find a solution, that typically means there is no unique solution: it's possible that there are multiple different solutions that are equally good. Another possibility is that one of the parameter values diverges toward plus or minus infinity; that's also a sign of under-identification. If we take a look at the parameter values, we see an extreme variance estimate of minus 68, when the total variance of each variable in the data is roughly one, and a rather extreme factor loading of 120 compared to the others. This is a sign that the model is not identified. Also, we don't get standard errors, which is a good indication of an under-identified model. How would you know why the model is not identified?
Well, you would have to, for example, look at the Hessian matrix and see how the optimization works to understand why it fails. Another thing you can do is try to work through the covariances and understand what the covariance values must satisfy for this model to be identified. Just applying the three-indicator rule and declaring the model identified would not be sufficient here, because those rules don't guarantee identification for every possible set of covariances. It's possible that the model is identified for some covariance values but not for others; this condition is known as empirical under-identification. Generally, when you face these kinds of problems, where your software does not converge, produces extreme estimates, or doesn't produce standard errors, that's a good indication that you should start troubleshooting what is going on and why the model is not identified, instead of just trying different values to see whether you can make the problem disappear. Often you make the problem disappear by hiding it instead of solving it, which is obviously not a good strategy for applied research.
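The division-by-zero problem behind this empirical under-identification can be seen directly in the closed-form solutions derived earlier. A minimal sketch with hypothetical covariance values where x1 and x2 are uncorrelated:

```python
import numpy as np

# Closed-form solutions from the derivation:
#   lambda3 = sigma23 / sigma12
#   psi     = sigma12 * sigma13 / sigma23
# If sigma12 = 0, lambda3 is undefined and psi collapses to zero,
# even though the three-indicator rule would declare the model identified.
s12, s13, s23 = 0.0, 0.4, 0.5   # hypothetical covariances; x1-x2 covariance is zero

with np.errstate(divide="ignore"):
    lam3_hat = np.divide(s23, s12)   # division by zero -> inf

print(lam3_hat)   # not a finite estimate: the model is empirically under-identified
```

This is why the solution, and hence identification, depends on the actual covariance values and not just on the model structure.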