The Arellano and Bond approach is a classic approach for modeling and estimating dynamic panels. This technique is very commonly used in management research, where it is referred to as the GMM estimator. That label is a bit misleading, because GMM itself does not deal with endogeneity. The reason the approach is referred to as GMM estimation is that Arellano and Bond applied GMM estimation instead of some other estimation technique; why they did that is something we will look at later in the video. The important thing is that it is not the GMM but a particular modeling approach that deals with endogeneity. Quite often a student, perhaps from a strategy background, comes and asks me to explain GMM estimation. I show them the equations, perhaps have them implement the GMM estimator in Excel, and then they ask me how GMM deals with endogeneity. My answer is: it doesn't. It is the modeling approach, not the estimation approach, that deals with endogeneity. Therefore, we need to ask two questions: what is the model, and why do we use GMM, or could we use some other estimation technique to estimate the same model?

Let's take a look at the Arellano and Bond approach. I use the model that I used before as a starting point: we have a longitudinal panel, we have a measure of ROA and a measure of CEO gender, we want to estimate the causal effect of CEO gender on ROA, and we want to control for unobserved firm-level effects. We want to estimate the within effect in this case, and that is what the Arellano and Bond technique does. This technique differs from the cross-lagged panel model in that there is only one dependent variable, whereas in a cross-lagged model we would have both ROA and CEO gender as dependent variables, but otherwise these are very similar models.

How does an econometrician deal with this problem? Let's write the equation. Instead of ROA and CEO gender, I am going to use y and x for simplicity, and we have the unobserved effect a_i. We have the lag of y, at t minus one, as a predictor of y, and x enters lagged by one year as well; the lag of x does not really matter for this problem. What we need to focus on is the effect of lagged y and how the unobserved effect a_i makes things more complicated.

Let's first take a look at a slightly simpler model without lagged y as a predictor: just the x to y relationship with the fixed effect a_i. One way of dealing with this problem is to apply first differencing: we subtract the past value of y and the past value of x, and this eliminates the unobserved effect. Once the unobserved effect is gone, we can estimate with OLS and we will be fine, we have consistent estimates. When we have the lagged dependent variable as a predictor, things are more complicated. When we look at the equation of the first-differencing estimator, the differenced regressor is y at t minus one minus y at t minus two, and the composite error term contains the error term from t minus one. Because u at t minus one is a part of y at t minus one, the differenced error term is correlated with the differenced lagged dependent variable, and we have an endogeneity problem.
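Written out, the equations just described look like this (a sketch using the y and x notation from the video; the coefficient labels β1 and β2 and the error term u are my additions):

```latex
% Dynamic panel model with unobserved effect a_i
y_{it} = \beta_1 y_{i,t-1} + \beta_2 x_{i,t-1} + a_i + u_{it}

% First differencing eliminates a_i:
\Delta y_{it} = \beta_1 \Delta y_{i,t-1} + \beta_2 \Delta x_{i,t-1} + \Delta u_{it}

% ...but creates a new problem: u_{i,t-1} appears both in the differenced error
% \Delta u_{it} = u_{it} - u_{i,t-1} and, through y_{i,t-1}, in the differenced
% lag \Delta y_{i,t-1} = y_{i,t-1} - y_{i,t-2}, so
\operatorname{Cov}(\Delta y_{i,t-1}, \Delta u_{it}) \neq 0
```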
How do we deal with an endogeneity problem? Using GMM is not the solution; the solution is that we need instrumental variables. So what would qualify as an instrument here? It turns out that lagged values of y qualify. For example, we could use y at t minus two and earlier lags. Why does y at t minus two qualify as an instrument? It is relevant because of the autoregressive path: y at t minus two affects y at t minus one, and it is also part of the difference by definition. It is excluded because of the sequential exogeneity assumption. The idea of sequential exogeneity is that past values are not correlated with future error terms, so the error terms u at t and u at t minus one are assumed to be uncorrelated with y at t minus two. That is an assumption of the estimation approach. So relevance comes from the autoregressive path, and exclusion comes from the model assumptions.

This is referred to as difference GMM in the literature on the Arellano and Bond estimator. It is called difference GMM because we first-difference the model, we then use the so-called levels as instruments, and because we have multiple instruments we estimate with GMM. We could apply other instrumental variable techniques, but for some reason GMM is the one that is applied.

Let's take a look at the assumptions of this technique and how they are tested. The first assumption is the sequential exogeneity that I mentioned before: y at time zero must be uncorrelated with the error term at time one, the error term at time two, and so on. In other words, current values of y are uncorrelated with future error terms of y. The second assumption is no correlated errors: the u terms must not be autocorrelated. If they are, this approach breaks apart. There are a couple of tests that are commonly used. Because this is an instrumental variable technique, the Sargan-Hansen test that we commonly use for checking exclusion in other contexts can be applied here as well. Arellano and Bond also developed a test for autocorrelation of the error term, because we assume that the error term is not autocorrelated. If these assumptions fail (typically the failure is autocorrelation of the error term), we can use more distant lags as instruments. Quite often in practice we apply the technique with the first usable lag as the instrument and check for exclusion. If exclusion does not hold, we increase the lag length until we find lags that are sufficiently distant for the errors to be roughly uncorrelated.

There is also another version of this estimator, called system GMM. The idea of system GMM is that we can make the estimator more efficient by introducing additional assumptions and additional equations. Let's take a look at what system GMM does. It is called "system" because it estimates a system of two equations. We have the levels model here, which is the model that we want to estimate, and we estimate it by specifying two simultaneous equations: we estimate beta one from the difference equation and also from the original levels equation. But the levels equation is endogenous, as stated before: a_i correlates with y at t minus one, indeed with every y, so we have an endogeneity problem. How do we find instruments? The past differences will serve as instruments, because first differencing eliminates the unobserved effect a_i.
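Before going further into system GMM, here is a minimal numerical sketch of the difference GMM idea described above. It is not the full Arellano-Bond estimator: it uses a single instrument, y at t minus two, in a just-identified instrumental variable estimator (essentially the Anderson-Hsiao version), and the sample sizes, coefficient values, and variable names are illustrative assumptions rather than anything from the video.

```python
# Simulated illustration: first-difference a dynamic panel, then instrument the
# lagged difference with the lagged level y_{t-2}. All numbers are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n, T, burn = 2000, 6, 50      # n firms, T years kept after a burn-in period
beta1, beta2 = 0.5, 0.3       # true coefficients on lagged y and lagged x

# Data generating process: the unobserved effect a_i correlates with x,
# so estimators that ignore a_i are inconsistent.
a = rng.normal(size=(n, 1))
x = 0.7 * a + rng.normal(size=(n, T + burn))
y = np.zeros((n, T + burn))
y[:, 0] = a[:, 0] + rng.normal(size=n)
for t in range(1, T + burn):
    y[:, t] = beta1 * y[:, t - 1] + beta2 * x[:, t - 1] + a[:, 0] + rng.normal(size=n)
x, y = x[:, burn:], y[:, burn:]   # drop the burn-in years

# First differencing removes a_i but makes the lagged difference endogenous.
dy  = y[:, 2:] - y[:, 1:-1]       # dependent variable: dy_t for t = 2..T-1
dy1 = y[:, 1:-1] - y[:, :-2]      # regressor: dy_{t-1}
dx1 = x[:, 1:-1] - x[:, :-2]      # regressor: dx_{t-1}
z   = y[:, :-2]                   # instrument: the lagged level y_{t-2}

# Stack the panel and estimate the differenced equation.
Y = dy.ravel()
X = np.column_stack([dy1.ravel(), dx1.ravel()])
Z = np.column_stack([z.ravel(), dx1.ravel()])

b_ols = np.linalg.lstsq(X, Y, rcond=None)[0]   # OLS on differences: biased for beta1
b_iv  = np.linalg.solve(Z.T @ X, Z.T @ Y)      # just-identified IV: consistent

print("OLS on first differences :", b_ols)
print("IV with y_{t-2} instrument:", b_iv)     # should be near (0.5, 0.3) up to noise
```

With more than one lag used as an instrument the model becomes overidentified, and GMM is simply a convenient way to combine the resulting moment conditions, which is where the name difference GMM comes from.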
So we can use earlier differences as instruments for the levels equation and estimate it consistently. Because we now have two equations that we can use for estimating beta one and beta two, this is more efficient than estimating beta one and beta two from the difference equation alone. The relevance of these instruments holds essentially by definition: the difference and the level are related by construction and through the autoregression. Exclusion comes from the sequential exogeneity assumption, and the differencing removes a_i from the instrument. So this is the system GMM; both equations and their instruments are summarized after this passage.

And now the question is: why do we use GMM? This is a general modeling approach, and we could use any instrumental variable estimator to estimate it. Why GMM and not some other estimator? There are basically two reasons that I can think of, and both relate to time. This estimator was introduced in 1991, which at the time of recording is about 30 years ago. At that point, the computational resources that we had were much more limited than what we have today. Nowadays we could estimate this model with maximum likelihood, finding a numerical solution to the likelihood function in seconds. In 1991 that was not feasible. GMM is a lot easier to compute: we just apply matrix algebra, with no iteration and no numerical optimization involved, so it is quick to calculate. That is one reason. The other reason is that GMM happened to be the state of the art of multiple-equation estimation in econometrics at that time. So GMM is used more for historical reasons than because of any superiority of this approach over the alternatives.

The Arellano-Bond technique is not without its problems. These problems are explained, for example, in Allison's article. Allison notes that if the number of cases n, the number of firms or whatever observational units we observe repeatedly over time, is small, then there will be bias. The estimator is also inefficient, so we can get more efficient estimates using maximum likelihood based techniques. We can also use multiple lags, and it is not clear which lags and which variables we should use as instruments; when we have a complicated model, we end up with a large number of instruments, and increasing the number of instruments increases the bias of instrumental variable approaches.

There is also another problem with this approach: it is being used as a black box. The fact that researchers say that they "use GMM estimation to deal with the endogeneity problem" indicates that not everyone understands that it is not the GMM but the instrumental variables that deal with the endogeneity. If you think that it is the GMM that does the trick, then you might not really understand what you are doing, and it is easy to use this as a black box because of Stata's implementation of the technique, xtabond: you just specify the equation and it gives you estimates, without you really needing to understand what you are doing. And this estimator, like others, makes assumptions; if those assumptions are not fulfilled, the estimates can be very misleading.

There is a more modern way of solving the same problem, and it is simply to specify the model, using a path diagram or syntax in your statistical software, as a structural equation model on wide format data, and then specify the constraints that are required. This has a couple of advantages. The first advantage is that this is more efficient than the Arellano and Bond approach.
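To recap the system GMM in equation form before going further: below are the two equations and, roughly, the instruments and moment conditions attached to each, in the notation used earlier. This is a sketch rather than the full derivation; the actual Arellano-Bond and Blundell-Bond estimators use all available lags as instruments and combine the moment conditions with a GMM weighting matrix.

```latex
% Difference equation (difference GMM): instrumented with lagged levels of y
\Delta y_{it} = \beta_1 \Delta y_{i,t-1} + \beta_2 \Delta x_{i,t-1} + \Delta u_{it},
\qquad E[\, y_{i,t-s} \, \Delta u_{it} \,] = 0 \quad \text{for } s \ge 2

% Levels equation (added by system GMM): instrumented with lagged differences
y_{it} = \beta_1 y_{i,t-1} + \beta_2 x_{i,t-1} + a_i + u_{it},
\qquad E[\, \Delta y_{i,t-1} \, (a_i + u_{it}) \,] = 0
```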
On that first advantage: maximum likelihood estimates have been shown to be asymptotically the most efficient possible in large samples. There are other advantages to the maximum likelihood based approach using SEM software. One is that it is easier to understand what is being modeled and what the assumptions are. Here we allow alpha, the unobserved effect, to be correlated with all the predictors, so we do not make the random effects assumption, and we also assume sequential exogeneity: x1 is uncorrelated with the error terms of y2, y3 and y4, because predictors are uncorrelated with all future error terms, whereas x3 is allowed to be correlated with the error term of y2, because that error is in the past. So we are allowing some correlations with the error terms and constraining others, and specifying this as a path diagram makes it more explicit what you are actually assuming.

Then we have modern missing data procedures available. We can estimate this model even with missing data; with the GMM approach that would not be nearly as straightforward, because we would have to set up multiple imputation and other machinery, which is complicated. With a structural equation model we can simply apply full information maximum likelihood, which takes the missing data into account automatically.

And finally, we have better diagnostics. You only need to understand one tool: the same chi-square test, the same modification indices, and the same covariance residuals that you apply with structural equation models can be applied here every time. You do not need to remember that there is an Arellano-Bond test for autocorrelation; you can just apply the more general techniques that you should already know.

There are disadvantages. Specifying this kind of model is cumbersome if there is a large number of time points. We tend to see wide format data in organizational psychology, where the time series are rather short; if you collect data with surveys, then five time points is already quite a lot. If you get data from databases, as economists and management researchers often do, then you might get 30 years of data for each company. Specifying this kind of model for 30 years would be complicated, and it would also be hard for the computer to calculate. Fortunately, there are techniques, for example dynamic structural equation modeling implemented in Mplus, that take care of this problem. That is a special technique for estimating dynamic panel models using maximum likelihood without having to specify the model with a separate dependent variable for every time point.

It is also tedious to specify large models. That is true, but fortunately it can be automated. For example, there is a Stata package called xtdpdml, written by Paul Allison if I remember correctly, and it automates the specification of this kind of dynamic panel model with an unobserved effect using Stata's sem command. You run this command, it prints out the sem syntax, and then you run the sem model.

Then there is the multivariate normality assumption of maximum likelihood estimation, but as I explained in other videos, this is something that we can deal with. We can use alternative estimation approaches, but even better, we can just use ML, because it is consistent, and simply apply robust standard errors to deal with the fact that the standard errors and test statistics from ML may not be trustworthy if the data are severely non-normal.
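To make the wide format specification concrete, here is a sketch of what the model could look like with four time points. The number of waves and the exact set of free covariances are illustrative assumptions; the key ingredients are the equality constraints on the coefficients across waves, the unobserved effect alpha that correlates freely with the predictors, and the sequential exogeneity restrictions on the error terms described above.

```latex
% One regression equation per wave; \beta_1 and \beta_2 constrained equal across waves
y_{i2} = \beta_1 y_{i1} + \beta_2 x_{i1} + \alpha_i + u_{i2}
y_{i3} = \beta_1 y_{i2} + \beta_2 x_{i2} + \alpha_i + u_{i3}
y_{i4} = \beta_1 y_{i3} + \beta_2 x_{i3} + \alpha_i + u_{i4}

% No random effects assumption: \alpha_i correlates freely with the predictors
\operatorname{Cov}(\alpha_i, x_{it}) \ \text{free}, \qquad \operatorname{Cov}(\alpha_i, y_{i1}) \ \text{free}

% Sequential exogeneity: predictors uncorrelated with future errors,
% but allowed to correlate with earlier errors
\operatorname{Cov}(x_{it}, u_{is}) = 0 \ \text{for } s > t \ \text{(fixed)}, \qquad
\operatorname{Cov}(x_{it}, u_{is}) \ \text{free for } s < t
```

Packages such as xtdpdml generate essentially this kind of specification automatically.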
So this is a more modern alternative, and in many ways more recommendable than the Arellano and Bond approach. For historical reasons, the Arellano and Bond approach is still very common, even though it is dated.