Model-implied instrumental variables are another technique for estimating latent variable models. It is not really a mainstream technique in management, but it has received some attention recently, and there are a couple of high-profile methodological papers that advocate it. The technique is also useful for diagnostics, and it is useful if you want to understand what instrumental variables do. So let's take a look at what the model-implied instrumental variable estimation technique for latent variable models does. When we have a latent variable model, the problem is that we are interested in the latent variables A and B, but we do not observe those variables directly. Rather, we observe indicators: A1, A2 and A3, and B1, B2 and B3. These indicators contain variance from the latent variables A and B, but they also contain error; each indicator has its own error term, which reflects unreliability and indicator-specific variance. The first technique that we learn when we start working with this kind of model is just to take sums of the indicators and run a regression analysis. The problem with regression analysis of scale scores is that if those scale scores are contaminated by measurement error, which they pretty much always are, then the regression coefficients will be inconsistent and biased. The problem is really the measurement error of the independent variable. The measurement error of the dependent variable is not problematic, because it is assumed to be uncorrelated with the factors, so it simply goes into the error term of the regression. The measurement errors of the independent variable's indicators, in contrast, have no way to escape from the independent variable, so they bias the slope. We could fit a latent variable model with maximum likelihood estimation, but there are other techniques. Model-implied instrumental variable estimation is one of them, and Bollen, the leading advocate of this technique, recommends an analysis procedure of six steps.
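To see the problem concretely, here is a minimal simulation of the attenuation bias just described. This is hypothetical data, not an example from the lecture: the true effect of A on B is assumed to be 0.8, and the indicator A1 is assumed to carry as much error variance as true variance, so the naive slope shrinks by the reliability factor of one half.

```python
# Sketch of attenuation bias from measurement error (hypothetical simulated
# data). True model: B = 0.8 * A + noise, but we only observe the noisy
# indicator A1 = A + e instead of the latent A.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
A = rng.normal(size=n)             # latent independent variable, variance 1
B = 0.8 * A + rng.normal(size=n)   # latent dependent variable
A1 = A + rng.normal(size=n)        # indicator: half its variance is error

# OLS slope of B on the error-free A recovers roughly 0.8.
beta_true = np.cov(A, B)[0, 1] / np.var(A)
# OLS slope of B on the noisy A1 is attenuated toward zero: the reliability
# is var(A) / var(A1) = 1/2, so the slope shrinks to roughly 0.4.
beta_naive = np.cov(A1, B)[0, 1] / np.var(A1)
print(round(beta_true, 2), round(beta_naive, 2))
```

Note that adding noise to the dependent variable B would not change the slope, which is the asymmetry described above: only error in the independent variable biases the coefficient.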
The first two steps, model specification and identification, are the same as with any other latent variable modeling approach. You first specify a model, and then you ensure that the model is identified: each latent variable must have a scale, a sufficient number of indicators, and so on. Then you have three steps that are unique to model-implied instrumental variable estimation. Step number three is the latent-to-observed transformation: you take some of the observed variables and use them as proxies for the latent variables. That produces inconsistent estimates in a plain regression analysis, but you can do instrumental variable estimation instead. If you use the A1 indicator as a proxy for the factor A and the B1 indicator as a proxy for the factor B, you can use the remaining indicators as instrumental variables. Then you do instrumental variable estimation, for example two-stage least squares, which gives you the estimates, and finally you can apply the normal instrumental variable diagnostics, particularly tests of the exclusion restriction, to test the model. Let's take a look at what this technique does. The first thing we do when we start analyzing this model with model-implied instrumental variables is to throw away indicators B2 and B3; this is the base case. We could of course take a sum of B1, B2 and B3 and use that as a proxy, but we need one variable that is the proxy for the dependent variable B and one variable that is the proxy for the independent variable A, and the recommended approach is to take the scaling indicator. So we take the scaling indicator A1, we take the scaling indicator B1, and we proxy the latent variables with these indicators. Now we know, as I explained, that if we simply regress B1 on A1, the estimates will be inconsistent and biased because of the measurement error in A1.
So how do we deal with this problem? Econometricians have figured out that you can use instrumental variables to deal with this problem too. The way we normally learn about instrumental variables is that they can be applied to correct for omitted variable bias or for simultaneity bias, but instrumental variables, importantly, can also be used to deal with measurement error. In econometrics this is sometimes referred to as the multiple indicator solution, because we are using multiple indicators of one variable: one of them is the proxy and the others serve as instruments. How does it work? We want to estimate the regression of B on A, and we know that simply using A1 and B1 is going to make the estimates biased and inconsistent. We can replace B with B1 without problems, because the measurement error of B1 goes into the error term and is uncorrelated with everything else. But if we replace A with A1, the measurement error of A1 goes into the error term as well, and that creates an endogeneity problem, because the indicator A1 correlates with its own measurement error. However, if we take a closer look at the model, we can see that the measurement error of A1 is uncorrelated with the indicators A2 and A3. A2 and A3 are correlated with A1, but only because they share variance from the latent variable, not because of the error terms. So A2 and A3 are relevant, because they correlate with A1, and they are excluded, because they do not correlate with the error term of the equation, and therefore they can be used as instrumental variables.
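To spell the substitution argument out, here is the algebra in a small sketch, using the lecture's notation and assuming the scaling indicators have loadings fixed to one (the epsilon symbols for the measurement errors are my labels, not the lecture's):

```latex
% Structural equation and measurement equations for the scaling indicators:
%   B = \beta A + \zeta, \qquad B_1 = B + \epsilon_{B_1}, \qquad A_1 = A + \epsilon_{A_1}
% Substituting the indicators for the latent variables gives
B_1 = \beta A_1 + \underbrace{\left(\zeta + \epsilon_{B_1} - \beta\,\epsilon_{A_1}\right)}_{\text{composite error}}
```

The composite error contains the term -beta * epsilon_A1, and A1 itself contains epsilon_A1, so Cov(A1, composite error) = -beta * Var(epsilon_A1), which is nonzero: that is the endogeneity. A2 and A3, by contrast, are uncorrelated with epsilon_A1 and epsilon_B1 but correlated with A1 through the shared latent A, which is exactly what a valid instrument requires.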
What happens then is that when we regress A1 on A2 and A3 in the first stage of a two-stage least squares analysis, the measurement errors cannot be predicted from these instruments, so they go into the first-stage residual, and the fitted value of A1 can be used in the second-stage regression to explain B1, which gives us a consistent estimate of beta. So this is the model-implied instrumental variables approach. We need to find instrumental variables that are uncorrelated with the measurement errors; typically they come either from other correlated latent variables or from other indicators of the same latent variable. Then we apply two-stage least squares, GMM, or any other instrumental variable estimation technique. Then we do model testing using the Sargan test or any other standard test that we apply after instrumental variable estimation. Here is a recent example of this technique in action, an article by Benjamin Ture and his co-authors. They explain that they used instrumental variables, they cite Bollen's work, they cite Wooldridge's work that presents this as a multiple indicator solution, and they mention endogeneity, which is misleading, because this is not really an endogeneity correction technique; it is about estimating a model that we assume to be correctly specified and free of endogeneity. The only endogeneity here arises because we have measurement error in the indicators, and that is the only kind of endogeneity this technique can deal with.
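The two stages above can be sketched in a short simulation. Again this is hypothetical data, not the lecture's: three indicators per factor, all loadings assumed to be one, and a true structural coefficient of 0.8. The point estimate from the manual two stages matches proper 2SLS; note that the naive second-stage standard errors would be wrong and dedicated IV software should be used for inference.

```python
# MIIV-style 2SLS sketch (hypothetical simulated data): proxy the latent A
# and B with their scaling indicators A1 and B1, and use the remaining
# indicators A2, A3 as instruments for A1.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
A = rng.normal(size=n)                               # latent variables
B = 0.8 * A + rng.normal(size=n)
A1, A2, A3 = (A + rng.normal(size=n) for _ in range(3))  # indicators of A
B1 = B + rng.normal(size=n)                          # scaling indicator of B

# Stage 1: regress A1 on the instruments A2, A3; keep the fitted values.
Z = np.column_stack([np.ones(n), A2, A3])
A1_hat = Z @ np.linalg.lstsq(Z, A1, rcond=None)[0]

# Stage 2: regress B1 on the fitted values; the slope is consistent for beta.
X = np.column_stack([np.ones(n), A1_hat])
beta_2sls = np.linalg.lstsq(X, B1, rcond=None)[0][1]

# Naive OLS of B1 on A1 for comparison: attenuated by measurement error.
Xn = np.column_stack([np.ones(n), A1])
beta_ols = np.linalg.lstsq(Xn, B1, rcond=None)[0][1]
print(round(beta_2sls, 2), round(beta_ols, 2))  # roughly 0.8 vs roughly 0.4
```

The instruments work because A2 and A3 predict only the part of A1 that comes from the latent A; the measurement error of A1 is left behind in the first-stage residual.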
Then they show the equations, and what is interesting is that their equation is actually exponential, because it is a Poisson regression model. The model-implied instrumental variable literature, at least Bollen's work and Wooldridge's discussion of the multiple indicator solution, only concerns the linear case; they do not talk about nonlinear models. I did some simulations after contacting the authors, and it seems that this approach actually works with nonlinear models as well, although I cannot really explain why that is the case. If I were to use this technique with nonlinear models, I would dig a bit deeper into the literature to find a proof that it actually works in that case too. So why would someone use this technique? There is a bit of selling here, and the stated point is a bit misleading, but there are a couple of genuine advantages to this estimator. One is that it is more robust to model misspecification, with some caveats. This is from Bollen's article: they generate data from a model with latent variables measured with two indicators each, where L1 predicts L2 with a regression coefficient of 0.2, L2 predicts L3 with a coefficient of 0.2, and L1 predicts L3 with a coefficient of 0.8. They then misspecify the model by leaving out the direct L1 to L3 path, so the misspecified model is a full mediation model, and they estimate it with normal maximum likelihood and with model-implied instrumental variables. What we can see is that the factor loading estimates from the model-implied instrumental variable approach do not differ between the misspecified model and the correctly specified model.
So this misspecification in the latent variable model does not affect the model-implied instrumental variable factor loading estimates, but it does affect the maximum likelihood estimates, and this tells us something important. With maximum likelihood or any other full-information estimator, you estimate everything together and use all of the information in the sample, so a misspecification in one part of the model, such as the omitted path, can cause all the other estimates in the model to be biased. But if you estimate the model piece by piece, as you do with the model-implied instrumental variable approach, estimating for example the L2 to Z4 loading equation separately, that bias does not spread. The effects of misspecification are local, and this is the key selling point of the technique. There are also other important advantages. The second is that it always provides estimates: sometimes maximum likelihood estimation fails and you do not know why; your software will not converge, or it gives you nonsense values or missing standard errors, and that can be difficult to troubleshoot. This approach will always give some kind of result, and because you estimate one equation of the system at a time, it is easy to see where the problems are located, so it is good for diagnostics. It is also easy to understand: if you understand instrumental variable estimation and two-stage least squares, you can simply run this model. With maximum likelihood estimation, if you really want to understand why and how it works, you need to understand a bit of the theory behind maximum likelihood, how the likelihood is calculated, a bit of matrix algebra, because the models are expressed in matrix form, something about numerical optimization, and so on. You need to understand quite a lot to really be able to explain what your software does. With this approach you only need to understand what qualifies as an instrumental variable, and you need
to understand two-stage least squares, both of which are pretty simple. But there are also some disadvantages to this technique, which I pointed out in an email to the SEMNET mailing list. The first is that the robustness against misspecification comes with a caveat: model-implied instrumental variables are robust to misspecification only as long as the misspecification does not make any of your instruments invalid. If a misspecification makes any of the variables that you use as instruments invalid, then it will make the model-implied instrumental variable estimates very bad, whereas maximum likelihood estimates are affected to a much lesser degree. I use the metaphor of a steak: if you put too much salt on a steak, then with maximum likelihood estimation the salt is spread all over the steak, so all estimates are biased to some extent, but the steak is still edible, just a bit too salty. With model-implied instrumental variables you put all the salt in one place, and as long as you do not bite that one spot, things are going to be fine, but if you do bite it, that bite is completely inedible. You are getting robustness, but what you trade away is the severity of the problem when it hits you: if one part of the model is misspecified, most of the time you are OK with model-implied instrumental variables, but when you are not OK, the results are really bad. The second problem is that the estimator is less efficient: in the basic case with three indicators of the dependent variable, we simply throw two of them away, and throwing away information is generally not a good principle. But the technique is useful for diagnostics, so it is one tool in the toolbox, and it probably should not be your only tool. Maximum likelihood estimation probably should not be your only tool either; you need a toolbox from which you pick what
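The salt-in-one-place caveat can be demonstrated with a small simulation. This is a hypothetical scenario of my own construction, not from the lecture or from Bollen's article: I assume the error of indicator A2 also leaks directly into B1 (an omitted correlated error), which violates the exclusion restriction for A2.

```python
# Sketch of an invalidated instrument (hypothetical simulated data). The
# analyst's model omits a correlated error between A2 and B1, so A2 is no
# longer a valid instrument for A1, and 2SLS using it becomes badly biased.
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
A = rng.normal(size=n)
B = 0.8 * A + rng.normal(size=n)
e2 = rng.normal(size=n)                  # measurement error of A2
A1 = A + rng.normal(size=n)
A2 = A + e2
A3 = A + rng.normal(size=n)
B1 = B + rng.normal(size=n) + 0.5 * e2   # omitted correlated error A2 <-> B1

def tsls(y, x, instruments):
    """Two-stage least squares slope of y on x, given a list of instruments."""
    Z = np.column_stack([np.ones(len(y)), *instruments])
    x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    X = np.column_stack([np.ones(len(y)), x_hat])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

beta_valid = tsls(B1, A1, [A3])        # only the valid instrument: near 0.8
beta_invalid = tsls(B1, A1, [A2, A3])  # invalid A2 included: biased upward
print(round(beta_valid, 2), round(beta_invalid, 2))
```

This is the all-the-salt-in-one-bite situation: equations whose instruments stay valid are untouched, but the equation that relies on the invalid instrument is estimated badly, and a Sargan-type overidentification test is what would flag it.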
is the right tool for the task. One place where this is useful is, as I said, diagnostics. This is from Stata's user manual: they explain that one potential cause of problems in SEM estimation is that your starting values can be bad, and to understand why starting values could be bad, it is useful to understand where they come from. The starting values in Stata and many other SEM packages are calculated using instrumental variable techniques very similar to the model-implied instrumental variable estimation technique that I just presented, so if you see a weird starting value, now you know where it comes from. Like many other techniques, this one also comes with a couple of misconceptions. As I already mentioned with the Ture paper, researchers seem to think that this technique has some capability to deal with endogeneity. It does not. Here is a very explicit example: the model-implied instrumental variable technique is used for testing for omitted variable bias. It simply cannot do that; there is no evidence that it could, and no reason why it should, but these researchers nevertheless claim that it can, without any evidence. They cite Bollen's work, which is the relevant work to cite when you use this technique, but Bollen makes no such claims. And if you take a look at a recent article in the Journal of Management on how to deal with endogeneity, they also mention model-implied instrumental variables as one potentially useful technique. It is not a useful technique for dealing with endogeneity; it is a useful technique for estimating latent variable models, but if there is endogeneity between the latent variables, this technique does nothing about it. Finally, how do you apply the technique? It is basically an application of two-stage least squares, which is a fairly basic thing to do; the hard part is to identify the instruments. Fortunately, there are two good software packages for
finding the instruments. Bauldry has programmed the miivfind package for Stata, where you input your model and it gives you the instrumental variables, which you can then put into two-stage least squares or whatever estimator you prefer. Fisher has written the MIIVsem package for R, which does the instrument search and also computes the model-implied instrumental variable estimates. I actually programmed the instrument search algorithm for that package, because the original algorithm was pretty slow and a faster one was needed for a paper. So model-implied instrumental variables are certainly worth learning about, because they are useful for diagnostics: if your model does not converge, then running it one equation at a time with this approach, which always gives you results, can provide useful information for troubleshooting. But should this estimation approach completely replace maximum likelihood estimation for most research or most use cases? Probably not. It is another way of estimating these models, with different advantages than maximum likelihood, but I would still go for ML estimation as the default alternative.