Formative measurement has several big problems, and some of them are statistical: they concern how these models are specified and, in particular, how they are identified. I will explain two of these issues in this video. There are a couple more, but they are not as important as these two.

The root of the problem is that a formative model, where we specify the latent variable as a function of observed variables (three in this example) and an unobserved error term, is not identified by itself. It is basically like a regression analysis without the dependent variable. It is not identified because the correlations among the three indicators are free, and that consumes all our degrees of freedom. We have no information left for estimating these paths or the variance of the error term, so the degrees of freedom are negative.

There are a couple of ways around this problem. The most commonly recommended way is to add two normal indicators. The literature on formative measurement calls these reflective indicators. So we specify that this latent variable is actually a common factor for these two measures, and these measures are added for identification of the model.

This leads to an interesting problem: the latent variable is now defined by the two normal indicators instead of the three formative, or causal, indicators. Measure one and measure two actually give this latent variable its identity and meaning. I have written a couple of papers about this topic. The problem essentially is that if the causal or formative indicators are not valid measures of this latent variable but the reflective indicators are, then the weights, or regression coefficients, will simply be estimated as zero.
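The degrees-of-freedom argument above can be checked by simple counting. The following is a minimal sketch, not part of the lecture: the function names are hypothetical, and it assumes the usual conventions that the variances and covariances of the causal indicators are all free and that one reflective loading is fixed to 1 to set the scale of the latent variable.

```python
def unique_moments(n_observed: int) -> int:
    """Distinct variances and covariances among n observed variables."""
    return n_observed * (n_observed + 1) // 2


def formative_only_df(n_causal: int) -> int:
    """Degrees of freedom of a formative model with an error term and no reflective indicators."""
    moments = unique_moments(n_causal)
    free = unique_moments(n_causal)  # variances and covariances of the causal indicators
    free += n_causal                 # one weight per causal indicator
    free += 1                        # variance of the error term
    return moments - free


def mimic_df(n_causal: int, n_reflective: int) -> int:
    """Degrees of freedom after adding reflective indicators (assumes first loading fixed to 1)."""
    moments = unique_moments(n_causal + n_reflective)
    free = unique_moments(n_causal)  # variances and covariances of the causal indicators
    free += n_causal                 # weights
    free += 1                        # variance of the error term of the latent variable
    free += n_reflective - 1         # free loadings
    free += n_reflective             # unique variances of the reflective indicators
    return moments - free


print(formative_only_df(3))  # -4: negative, so the model cannot be estimated
print(mimic_df(3, 2))        # 2: two reflective indicators make the count positive
```

A non-negative count is a necessary condition for identification, not a sufficient one, so this only illustrates why the three-indicator formative model fails on its own and why two added measures are the usual remedy.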
So we then have a normal latent variable measured with two indicators, and three unrelated indicators that don't really have any relationship with the latent variable defined by those two measures. That's one problem. Another way of thinking about this is that if we have two indicators that measure the latent variable, then we don't need the three indicators at the bottom. You can just define the model and measure the latent variable normally with the two indicators, and there are no problems with that. That, of course, doesn't sit well with the idea that some concepts must be measured with formative indicators.

What causes this phenomenon, that the meaning of the latent variable comes from the two measures instead of the three causal indicators? It's the error term. Because the error term is unrelated to the three indicators, it guarantees that, whatever those indicators represent, the latent variable will be a common factor of the two reflective indicators. If the three indicators are conceptually unrelated to whatever the two indicators represent, the error term will compensate for that, and we are basically just modeling the error term with the three indicators instead of whatever we think these causal indicators cause.

How do we deal with that problem? We can of course eliminate it by eliminating the error term from the model, but that leads to another problem. So let's consider this kind of model. This is not a latent variable anymore, because the formative latent variable is now just a weighted sum of the indicators. There's no error term; this is like a regression analysis without an error term. So we create an index based on three different indicators, and we have to set the weights. How do we set them?
The normal way of specifying this kind of model is that we have this latent variable without an error term, then another latent variable that we want to explain with it, and a regression relationship between the two. Specifying a model like that defines the weights so that they maximize this path. Is that problematic or not? Well, it is problematic. If we want to test, for example, whether this beta is zero or not, that is, whether the formative variable has an effect on the other latent variable, then setting the weights so that beta is as large as possible is probably the worst possible way to create an index. If you want to test whether something exists, then using any correlation in your data to make your estimate as large as possible is not a good estimation principle. So there is a possible positive bias.

There's also another problem. If we set the weights so that beta is as large as possible, then the weights depend on whatever the other latent variable is, and this leads to a problem called interpretational confounding in this literature. The meaning of the latent variable that is supposed to be caused by the three formative indicators actually depends on what the other latent variable is and what other variables we have in the model, and that's undesirable. Think about a stock index: would it make sense for the stock index to be different depending on who is using it? I don't think so. It should be the same. The meaning of the index should be the same across studies, which means that the weights must also stay the same.

Then there's also the assumption that if the indicators have any effect on the other latent variable, that effect must be fully mediated by the formative latent variable. Let's consider socio-economic status as our formative latent variable.
One of the indicators is your education, and we want to explain a child's education with the parents' socio-economic status. Is it reasonable to assume that the parents' education has no causal effect on the child's education other than the one fully mediated through socio-economic status? That is clearly unreasonable, so the full mediation assumption is also unreasonable here.

So what's the alternative? The solution is to define the weights based on theory. You set the weights based on your understanding of the phenomenon instead of trying to estimate them empirically, and that leads to index construction. Instead of specifying a complicated latent variable model that possibly has an error term, we just take the indicators and take a mean, a sum, or a weighted sum. We do that before estimation, and we define the weights for the index based on existing understanding of the phenomenon, or theory. I have another video on how you can actually do that and how you justify index construction. That's clearly a much better approach than trying to specify these formative latent variable models.
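The contrast between estimated and theory-based weights can be illustrated with a small simulation. This is a sketch under stated assumptions, not the lecture's own analysis: the data are simulated, the outcomes are treated as observed variables rather than latent ones, the equal weights stand in for whatever theory would prescribe, and the helper names are hypothetical. With a single observed outcome, the weights that maximize the relationship between the composite and the outcome are proportional to ordinary least squares coefficients, which is what `estimated_weights` uses.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 3))  # three formative indicators (simulated, hypothetical data)

# Two outcomes that depend on different indicators, and one that depends on none.
y_a = X[:, 0] + rng.normal(size=n)
y_b = X[:, 2] + rng.normal(size=n)
y_null = rng.normal(size=n)


def estimated_weights(X, y):
    """Weights of the composite of X that is most strongly correlated with y (OLS),
    rescaled so that the absolute weights sum to 1."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    w, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
    return w / np.abs(w).sum()


def abs_corr(index, y):
    """Absolute correlation between an index and an outcome."""
    return abs(np.corrcoef(index, y)[0, 1])


# Theory-based weights are fixed before estimation (equal weights are an assumption here).
fixed_w = np.array([1 / 3, 1 / 3, 1 / 3])

w_a = estimated_weights(X, y_a)
w_b = estimated_weights(X, y_b)
w_null = estimated_weights(X, y_null)

# Interpretational confounding: the estimated "index" changes with the outcome.
print("weights estimated against outcome A:", np.round(w_a, 2))
print("weights estimated against outcome B:", np.round(w_b, 2))

# Positive bias: even with no true effect, the estimated weights give an
# in-sample association at least as large as any fixed-weight index.
print("null outcome, estimated weights:", abs_corr(X @ w_null, y_null))
print("null outcome, fixed weights:    ", abs_corr(X @ fixed_w, y_null))
```

The fixed-weight index is the same variable no matter which outcome it is later related to, which is the property the stock index example asks for.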