Establishing the identification of non-recursive SEM models can be a challenging task. Non-recursive models are models that have either correlated disturbances or feedback loops between two or more variables. They can be contrasted with recursive models, which have neither of these features. Recursive models are simple to deal with because they are always identified. So when are non-recursive models identified? You can of course try to establish identification by solving the parameter values from the population covariance matrix, but that's very challenging to do, particularly if you have these feedback loops here.

Fortunately there are a couple of rules that we can use to establish identification, or to establish that the model is for sure not identified. The two simplest ones are the null B rule and the t rule. The null B rule states that all models with no paths between endogenous variables are identified, and these can be estimated consistently with OLS regression too. The rule is named null B because in SEM software these models are estimated using parameter matrices, and the B matrix, or beta matrix, contains all the paths between endogenous variables. When there are no such paths it is a null matrix, and null beta comes from that.

Then we have the t rule. The t rule is a counting rule where you count the number of unique variances and covariances in the data and then subtract the number of parameters that you want to estimate. If the result is negative, then your model is not identified. This is the same as checking the degrees of freedom, which you can do after estimation. So if the degrees of freedom is negative you don't need to do any other considerations. You know that the model needs to be fixed somehow, or you need to add more information to allow estimation.

So how about more complex scenarios?
For example, if we have this kind of feedback loop with four variables and then one partial mediation in between, is this identified or not? The rules on the previous slide don't really help. Well, this is identified. We know that, but why do we know that? That's what we are going to talk about next.

So let's start talking about these feedback loops that make it difficult to establish identification. In the simplest case we have a feedback loop between two variables: Y1 influencing Y2 and Y2 influencing Y1. So how do we know whether this is identified or not? We can apply the t rule. We have one covariance here and we are estimating two parameters relating these variables to one another. The degrees of freedom is minus one. This is underidentified and we don't need to consider it any further.

If you have studied instrumental variables then you know that in econometrics this kind of scenario is called simultaneity, and it can be estimated by using instrumental variables. So we need an instrumental variable for Y1 or Y2 and that will identify the model. X1 here serves as an instrument because it affects Y1, does not affect Y2 directly, and is uncorrelated with the error term. That identifies the model: the degrees of freedom is zero and it's just identified.

What if these disturbances are correlated? So we add a correlation, the degrees of freedom is minus one, and this is underidentified. We can again identify the model by adding a second instrument, but before we do that we need to note that these paths from X1 to Y1 and Y2 are actually identified even if the full model is not. This is called local identification. Even if your model is not fully identified, where full identification means that all the parameters are identified, it may be possible to meaningfully estimate some parameters. So here these two parameters are identified, and this path from Y2 to Y1 and this correlation between the error terms are not identified.
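As a rough sketch of the counting in the two-variable examples above, the t rule can be written as a small Python function. The function name and the parameter tallies in the comments are illustrative assumptions: they are one reading of the models described here, not output from any SEM software. The sketch counts variances as well as covariances, which gives the same degrees of freedom as the shortcut of counting covariances against paths.

```python
def t_rule_df(n_observed: int, n_free_parameters: int) -> int:
    """Degrees of freedom under the t rule.

    Unique variances and covariances among the observed variables
    minus the number of freely estimated parameters.
    A negative result means the model is not identified.
    """
    n_moments = n_observed * (n_observed + 1) // 2
    return n_moments - n_free_parameters


# Y1 <-> Y2 loop, no instrument:
# 2 paths + 2 disturbance variances = 4 parameters, 3 moments.
print(t_rule_df(2, 4))

# Add X1 -> Y1 as an instrument:
# var(X1) + 3 paths + 2 disturbance variances = 6 parameters, 6 moments.
print(t_rule_df(3, 6))

# Also free the disturbance covariance: 7 parameters, 6 moments.
print(t_rule_df(3, 7))

# Add a second instrument X2 -> Y2:
# 2 variances and 1 covariance of the Xs, 4 paths,
# 2 disturbance variances and 1 disturbance covariance = 10 parameters.
print(t_rule_df(4, 10))
```

A non-negative result only means that the t rule does not rule the model out; it does not show that the model is identified.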
So we can estimate some parameters from the model, but not all. We can fully identify the model by adding a second instrument. We then have degrees of freedom of zero and this is a just identified model. An important thing in identification, and in many of these identification rules, is that we need to have a sufficient number of instrumental variables to allow estimation. So there is a link between the structural equation modeling literature and the econometrics literature on instrumental variables. If you understand one, then you will understand the other better.

Let's take a look at larger models. We rarely have models with just two endogenous variables. Sometimes we have more, and this is an example with four endogenous variables: Y1, Y2, Y3 and Y4. This is what we call a block recursive model. A block is a group of equations. It's the smallest possible group of equations that fulfills two conditions: its equations have no disturbance correlations with equations from other blocks, and all the effects from one block to another go in one direction only, so there are no feedback loops between blocks.

So we have one block here: Y1 and Y2 form a block. They must be in the same block because there is a feedback loop, and there is also this correlation between the two disturbances. The exogenous variables don't belong to any block; we only block the endogenous variables. The other two endogenous variables, Y3 and Y4, can be assigned to their own separate block. They need to be in the same block because there is a feedback loop and there is also this correlated disturbance. We can split this model into two blocks because there are no correlations between the disturbances of different blocks, and also all the effects go from the first block to the second block and there are no effects the other way. So this is a block recursive model and we can split it into blocks. Now the identification rule for these models is that block recursive models are identified if every block is identified.
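One way to make the block definition above concrete is to treat every disturbance correlation as a link in both directions and then group endogenous variables that can reach each other. The sketch below does that in plain Python. The function name `find_blocks` is hypothetical, and the path from Y2 to Y4 in the example is an assumption added for illustration; only the two loops, the two disturbance correlations and the direction of effects between blocks come from the description above.

```python
def find_blocks(endogenous, paths, correlated_disturbances):
    """Group endogenous variables into blocks.

    Two variables share a block when each can reach the other by
    following directed paths, with every disturbance correlation
    counted as a link in both directions.
    """
    successors = {v: set() for v in endogenous}
    for cause, effect in paths:
        # Paths from exogenous variables are ignored: only
        # endogenous variables are assigned to blocks.
        if cause in successors and effect in successors:
            successors[cause].add(effect)
    for a, b in correlated_disturbances:
        successors[a].add(b)
        successors[b].add(a)

    def reachable(start):
        seen, stack = {start}, [start]
        while stack:
            node = stack.pop()
            for nxt in successors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    reach = {v: reachable(v) for v in endogenous}
    blocks, assigned = [], set()
    for v in endogenous:
        if v in assigned:
            continue
        block = sorted(u for u in reach[v] if v in reach[u])
        blocks.append(block)
        assigned.update(block)
    return blocks


paths = [
    ("X1", "Y1"), ("X2", "Y2"),   # instruments
    ("Y1", "Y2"), ("Y2", "Y1"),   # loop in the first block
    ("Y3", "Y4"), ("Y4", "Y3"),   # loop in the second block
    ("Y1", "Y3"), ("Y2", "Y4"),   # first block -> second block (Y2 -> Y4 assumed)
]
correlations = [("Y1", "Y2"), ("Y3", "Y4")]
print(find_blocks(["Y1", "Y2", "Y3", "Y4"], paths, correlations))
```

If a single path were added from the second block back to the first, all four variables would end up in one block and the model could no longer be split.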
So we can break the big identification problem here, involving four equations for four endogenous variables, into two smaller problems involving two equations each. Rigdon points out that most models that people estimate actually are block recursive and can be broken into blocks that contain one or two equations.

So how do we establish identification for this one? We have Y1 and Y2, we have a bidirectional path and we also have a correlated disturbance here, so we need two instruments. X1 qualifies as an instrument for Y1 and X2 as an instrument for Y2, and that identifies that block, so we move on to the second block. Here Y3 has Y1 as an instrument and Y4 has an instrument of its own as well, and that is sufficient to identify that block. Because both blocks are identified, this model is identified. The block rule is clearly a very useful rule for identification.

So how about more complicated models? This model here we can't break into blocks, because we have a feedback loop between all three variables. The t rule can be applied, though, and this has degrees of freedom of zero. We have three covariances between Y1, Y2 and Y3 and we are estimating three paths between these variables. So we estimate three things from three covariances, and that in principle can be done. This can be proven to be identified, but doing so is quite difficult. You can think of this intuitively as a two variable loop: you have Y1 affecting Y3 and Y3 affecting Y1, but the effect of Y1 on Y3 is fully mediated by Y2. So based on the relationships between Y1 and Y2 and between Y2 and Y3, we can say how strong the effect that goes through Y2 to Y3 is in comparison to the direct effect from Y3 to Y1. So that's one way to understand why this is identified. This paper here provides a proof, but it's fairly difficult to read unless you have done lots of identification problems before.

So what if we make this a bit more complicated? Let's say that we have a correlated error term, a correlated disturbance, between Y1 and Y2.
We can identify this model by adding X1 as an instrument. When these two error terms are correlated, Y1 is endogenous with respect to Y2, and having an instrumental variable here helps to identify the model. What if we let all the correlations be free? Well, we can add three instruments and that is sufficient to identify the model. This is actually identified according to rules that you can find in any good book on structural equation modeling. The rules are called the rank and order conditions, and both are satisfied in this model.

This is how Kline explains this problem. He has the exact same model that we had. It's just drawn sideways instead of horizontally like we did, and we have Y1, Y3, Y2. This is a loop that goes this way, like we had before. So why is that identified?

We can apply the order condition. The order condition is a condition for checking for non-identification. If the order condition fails, then the model is not identified for sure. If the order condition holds, that does not necessarily mean that the model is identified. So this is a necessary but not sufficient condition. The order condition is a counting rule. We start by counting all the endogenous variables in the model. We have Y1, Y2 and Y3, so we have three endogenous variables. Then we subtract one, so we get two, and that's our benchmark. Then for each endogenous variable, we need to exclude at least two of the other variables in the model from its equation to identify that equation.

So Y1 excludes three variables. It excludes Y2, so there's no effect from Y2 to Y1. It excludes X2, so there's no effect from X2 to Y1, and it excludes X3, so there's no effect from X3 to Y1. So we have three variables excluded. For Y2 we have excluded Y3, X1 and X3, and for Y3 we have excluded Y1, X1 and X2. So in all cases the number of excluded variables is equal to or greater than two, which means that this model is possibly identified.
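The counting in the order condition above can be sketched in a few lines of Python. The function name `order_condition` and the dictionary layout are assumptions made for illustration; the first model lists, for each endogenous variable, the predictors implied by the exclusions just described. The second model is a hypothetical variant in which X2 and X3 also point to Y1.

```python
def order_condition(equations, exogenous):
    """Check the order condition equation by equation.

    equations maps each endogenous variable to the list of variables
    that have a direct path to it. An equation passes when it excludes
    at least (number of endogenous variables - 1) of the other variables.
    """
    endogenous = list(equations)
    all_variables = set(endogenous) | set(exogenous)
    benchmark = len(endogenous) - 1
    result = {}
    for outcome, predictors in equations.items():
        excluded = all_variables - {outcome} - set(predictors)
        result[outcome] = (sorted(excluded), len(excluded) >= benchmark)
    return result


exogenous = ["X1", "X2", "X3"]

# The three-variable loop with one instrument per endogenous variable.
loop_model = {
    "Y1": ["Y3", "X1"],
    "Y2": ["Y1", "X2"],
    "Y3": ["Y2", "X3"],
}
for outcome, (excluded, passes) in order_condition(loop_model, exogenous).items():
    print(outcome, excluded, passes)

# Hypothetical variant: every X also affects Y1, so Y1 excludes only Y2.
crowded_model = dict(loop_model, Y1=["Y3", "X1", "X2", "X3"])
print(order_condition(crowded_model, exogenous)["Y1"])
```

Passing this check for every equation only means the model is possibly identified, since the order condition is necessary but not sufficient.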
This is a rule that is necessary but not sufficient. The model is also identified according to another condition. But let's take a look at what it would take for the order condition to fail. For example, if we had paths from X2 to Y1 and from X3 to Y1, then we would have only one excluded variable for Y1, which is Y2. And that would be a problem because we could no longer use X3 as an instrument for Y3 to separate this relationship here from this disturbance correlation. So you can think of these rules through instrumental variables. If we include all possible paths from all the Xs to Y1, then these other X variables can no longer serve as instruments for Y3 because then they are not excluded.

Then we have another rule called the rank condition, and the rank condition is necessary and sufficient for identification. The rank condition is defined using matrices, and I will not go into those matrices in this video, but the idea of the rank condition is basically that each Y variable must have a unique pattern of Xs relating to it. It holds here because Y1 has a unique pattern in that it is affected by X1 and no other variable is affected by X1. Y2 has a unique pattern because it is affected by X2 and no other variables are affected by X2, and so on for Y3 and X3. So that condition holds.

So, a summary of identification. Identification can always be proven using algebra, but doing so can be difficult, particularly if your model is very large. I would think that most researchers don't try to identify their models by proving that each individual parameter can be solved from the population covariance matrix. Fortunately we have a couple of easy to use rules, the null beta rule, the t rule, the order condition and the block recursive rule, that can be applied to check whether a model is unidentified or to establish that it is identified. These actually cover most of the scenarios that applied researchers face. Unfortunately no simple and general rule exists.
So there are always models that are not covered by these simple rules. If you have a model with a feedback loop of four variables and just one instrument and no correlated disturbances, then these rules would not cover that case. Then you will need to use algebra to identify the model, or you need to just test for identification empirically.

Some practical advice for checking identification. Consider identification prior to data collection. If you want to estimate a model that contains feedback loops, and you collect data and only after that realize that you don't have enough variables to identify the model, you can't do much about it. So you need to have a sufficient number of instrumental variables to identify the model, and you need to know how many you need prior to data collection.

Then check for obvious sources of non-identification. Does each endogenous predictor have an instrument? If not, then you are probably in trouble. The order condition is another easy to check rule, and of course checking whether the degrees of freedom is non-negative is yet another one. If you can't establish using these rules that the model is not identified, you can either go for the rank condition, which is more difficult to apply, or you can just check identification empirically. Or if you have a block recursive model, evaluate one block at a time using the rules for the model with just two variables.

If identification cannot be established using these rules, then you need to do it empirically, for example by analyzing simulated data sets to see if you can get correct results, by comparing results from different starting values to see if you get unique estimates, and so on. That requires some work but it's not that difficult to do.
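A minimal sketch of the simulated-data check, using only the Python standard library, might look like the following. It assumes the two-variable loop with one instrument X1 and correlated disturbances, and it uses a simple instrumental variable ratio instead of full SEM estimation. The function names and the population values (0.5, 0.3, 1.0 and 0.4) are arbitrary choices for illustration. The idea is to generate data from known parameter values and see whether the estimate of the path from Y1 to Y2 lands close to the value that was used to generate the data.

```python
import random


def simulate_loop(n, b21, b12, g, rho, seed):
    """Simulate Y1 = b12*Y2 + g*X1 + e1 and Y2 = b21*Y1 + e2,
    where e1 and e2 have unit variance and correlation rho."""
    rng = random.Random(seed)
    x, y1, y2 = [], [], []
    for _ in range(n):
        xi = rng.gauss(0.0, 1.0)
        e1 = rng.gauss(0.0, 1.0)
        e2 = rho * e1 + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        # Solve the two simultaneous equations for Y1, then get Y2.
        y1i = (g * xi + e1 + b12 * e2) / (1.0 - b12 * b21)
        y2i = b21 * y1i + e2
        x.append(xi)
        y1.append(y1i)
        y2.append(y2i)
    return x, y1, y2


def cov(a, b):
    """Sample covariance of two equally long lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (len(a) - 1)


def estimate_b21(x, y1, y2):
    """Return the instrumental variable and OLS estimates of Y1 -> Y2."""
    iv = cov(x, y2) / cov(x, y1)      # uses X1 as the instrument for Y1
    ols = cov(y1, y2) / cov(y1, y1)   # ignores the loop and the correlation
    return iv, ols


# Repeat over a few simulated data sets; the population value of b21 is 0.5.
for seed in (1, 2, 3):
    x, y1, y2 = simulate_loop(20000, b21=0.5, b12=0.3, g=1.0, rho=0.4, seed=seed)
    iv, ols = estimate_b21(x, y1, y2)
    print(seed, round(iv, 3), round(ols, 3))
```

If the instrumental variable estimate stays close to the population value across data sets while the naive regression does not, that is consistent with this one path being estimable even though the full model is not identified. With real SEM software the same logic applies, with the addition of rerunning the estimation from different starting values.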