Generalised Method of Moments (GMM) is a popular technique for estimating certain kinds of panel models. Some researchers seem to believe that this technique has superpowers for dealing with endogeneity, but it is just an estimation technique like any other, and it can also be applied in contexts other than panel studies. Let's take a look at what this technique actually does and in which kinds of scenarios it might be useful.

Generalised Method of Moments is used for estimating this kind of model. We have exogenous variables, X1 and X3, that we use as instruments, and then we have endogenous variables whose errors may be correlated, and there may also be reciprocal, two-way relationships between these endogenous variables.

To understand what Generalised Method of Moments does, we first need to understand what the Method of Moments does, and to do that we need to understand what a moment is in statistics. Wikipedia is pretty useful here, and if you want to know more, there are some great videos on YouTube; I particularly like this video here. The idea of a moment is that it is basically a distance: the first moment is the mean, which is basically the average distance from zero, and the second moment of a variable is the average squared distance from the mean, which is the variance. And so on: skewness, kurtosis, and other quantities that describe distributions. For Method of Moments estimation, the key point is that in this context, moments refer to means, variances, and covariances. If you want to know more, there is lots of great material on YouTube that you can use to learn.

So how does this technique actually work? What is the idea behind this equation here, and how does it allow us to estimate this model? Let's look at the equation a bit closer: E[z(y − xβ)] = 0. There are a couple of things in it: we need to understand what z is, what y − xβ is, and what the expectation means. The z's are the instrumental variables; in our system model here, the instruments are the x's. Then y − xβ is simply the error term: the actual value minus the fitted value in the population. Finally, when we multiply these two things together and take the expectation, we get roughly the covariance. So the equation sets the covariance between the instrumental variables and the error term to zero. That is an assumption: the assumption in this technique is that the error terms are uncorrelated with the instruments, and the same assumption underlies, in general, every instrumental variable technique.

So how does that allow us to estimate the model? The estimation criterion for the Method of Moments is that we find the betas such that the covariances between the instruments and all residuals are zero, so that every residual is uncorrelated with every instrument. We could use mathematical optimization to find the betas, calculate the residuals, and then calculate those correlations, but in practice there is a closed-form solution, an equation that gives us the betas satisfying these constraints.
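To make the moment conditions concrete, here is a minimal numpy sketch of the just-identified case. The simulated data, variable names, and numbers are my own illustration, not from the slides: with as many instruments as parameters, setting the sample covariances Z'(y − Xβ) exactly to zero gives the familiar instrumental variables estimator β = (Z'X)^(-1) Z'y.

```python
import numpy as np

# Hypothetical simulated data: one endogenous regressor x, one
# instrument z, and a structural error e that is correlated with x.
rng = np.random.default_rng(0)
n = 5_000
z = rng.normal(size=n)                       # instrument, independent of e
e = rng.normal(size=n)                       # structural error
x = 0.8 * z + 0.5 * e + rng.normal(size=n)   # endogenous: correlated with e
y = 2.0 * x + e                              # true coefficient on x is 2.0

Z = np.column_stack([np.ones(n), z])         # instruments (with constant)
X = np.column_stack([np.ones(n), x])         # regressors (with constant)

# Just-identified Method of Moments: solve Z'(y - X beta) = 0 exactly,
# i.e. beta = (Z'X)^{-1} Z'y.
beta_mom = np.linalg.solve(Z.T @ X, Z.T @ y)

# OLS for comparison; it is inconsistent here because x correlates with e.
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

print("MoM/IV estimate:", beta_mom[1])       # close to 2.0
print("OLS estimate:   ", beta_ols[1])       # biased upward
```

Running this, the Method of Moments estimate lands near the true value of 2.0, while OLS is pulled away from it by the correlation between x and the error.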
This Method of Moments has one disadvantage, though: if the model is over-identified, a solution is unlikely to exist. The reason is that we then have more constraints than we need for estimation, and it is unlikely that the redundant constraints can all be satisfied exactly in any finite sample.

To address that, we have the Generalised Method of Moments. The Generalised Method of Moments looks similar: we have the Method of Moments part here, where we multiply the instruments with the residuals, but we also have this W. So what is the meaning of this equation? Roughly, it means that we minimize a weighted sum of squares of these covariances between instruments and residuals. How the weights are defined is not relevant at this point; basically, we try to get every instrument to be uncorrelated with every residual. We generally cannot achieve that exactly for over-identified models in small samples, because there is always some sampling variation that pushes those correlations away from zero. So we try to get the correlations as close to zero as possible, weighting each one a bit differently depending on its importance.

Another equation, the one that actually gives the estimates, is here; there is a small numerical sketch of it at the end of this part. Understanding what this equation does in detail is not that useful, but it is useful to know that we have a closed-form solution for this estimation technique. If we compare the Generalised Method of Moments against another modern method, maximum likelihood, then in maximum likelihood we have to iteratively find the maximum of a likelihood function using numerical optimization, and that can go wrong: there are various things that can fail in numerical optimization. Here we don't need optimization; we just calculate our estimates using that equation, and this can pretty much always be done. So with GMM you are guaranteed to get results, whereas if you use maximum likelihood estimation and your model and data are weird, you may not get any estimates at all.

The optimal choice of W depends on OLS residuals or two-stage least squares residuals. It has been shown that there is an optimal choice for W, and we call that choice the optimal GMM. Most of the time when people use the term GMM, they refer to this optimal version rather than some suboptimal choice of W. The Generalised Method of Moments has also been shown to be a generalization of two-stage least squares and three-stage least squares: if we choose W appropriately, we get the two-stage least squares estimates or the three-stage least squares estimates. So in a way, GMM makes both of those techniques obsolete. However, two-stage least squares is a lot simpler to use than this technique, so maybe it still has some use; we will get to that a bit later in this video.

The optimal GMM is more efficient than two-stage least squares and three-stage least squares. So one might ask why we use three-stage least squares at all, and why econometrics books talk about this technique. The reason is that it was introduced before GMM, and because it has been around, people use it, so you need to understand what it does to be able to read what others have done. There is also another, statistical reason: in small samples, three-stage least squares may actually be more efficient than GMM. But we really cannot say anything general about that; it has to be established for each scenario separately, which is of course not feasible for a normal researcher. This consideration gives us a rule of thumb: when you consider estimating a full system using either of these techniques, always use GMM.
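As a sketch of what that closed form looks like, here is a hypothetical numpy example of two-step GMM for an over-identified model (the simulated data and names are mine, for illustration only). With linear moment conditions, the estimator is β = (X'Z W Z'X)^(-1) X'Z W Z'y; choosing W = (Z'Z)^(-1) reproduces two-stage least squares, and building W from the first-step residuals gives the optimal GMM discussed above.

```python
import numpy as np

# Hypothetical over-identified setup: two instruments, one endogenous
# regressor, so there are more moment conditions than parameters.
rng = np.random.default_rng(1)
n = 5_000
z1, z2 = rng.normal(size=n), rng.normal(size=n)
e = rng.normal(size=n)
x = 0.6 * z1 + 0.4 * z2 + 0.5 * e + rng.normal(size=n)
y = 2.0 * x + e                              # true coefficient on x is 2.0

Z = np.column_stack([np.ones(n), z1, z2])    # 3 instruments
X = np.column_stack([np.ones(n), x])         # 2 parameters -> over-identified

def gmm(W):
    # Closed-form GMM: minimize the weighted quadratic form in the
    # sample moments Z'(y - X beta), which gives
    # beta = (X'Z W Z'X)^{-1} X'Z W Z'y.
    A = X.T @ Z @ W @ Z.T @ X
    b = X.T @ Z @ W @ Z.T @ y
    return np.linalg.solve(A, b)

# Step 1: W = (Z'Z)^{-1} reproduces the 2SLS estimates.
beta_2sls = gmm(np.linalg.inv(Z.T @ Z))

# Step 2: optimal (heteroskedasticity-robust) weights built from the
# first-step residuals: W = (sum_i u_i^2 z_i z_i' / n)^{-1}.
u = y - X @ beta_2sls
S = (Z * (u**2)[:, None]).T @ Z / n
beta_gmm = gmm(np.linalg.inv(S))

print("2SLS estimate:       ", beta_2sls[1])
print("Optimal GMM estimate:", beta_gmm[1])
```

Both estimates should land near 2.0 here; the point of the sketch is just that 2SLS falls out of GMM for one particular choice of W, and that the optimal W is computed from first-step residuals rather than picked by hand.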
So GMM is better than two-stage least squares and three-stage least squares if you know that the system is correctly specified. However, for just-identified models, where the degrees of freedom are zero, all of these techniques produce exactly the same result, in which case you should use the simplest one, which is two-stage least squares. Importantly, if you follow these rules of thumb, you would never need to use three-stage least squares.

But there is something about two-stage least squares that makes people recommend its use. Wooldridge, from whom I borrow heavily here, says that two-stage least squares is actually a pretty good thing to have in your toolbox. The reason is that it is a robust technique: the model does not need to be correctly specified in all its parts, because you can estimate it one equation at a time using two-stage least squares. Then a small misspecification in one part of the model affects only that part of the model, instead of making all estimates inconsistent. So two-stage least squares is a good technique to know for diagnostic purposes, and for when you want to go and study a small part of your larger model.
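To close, here is a minimal sketch of two-stage least squares computed literally as two OLS regressions, again on made-up simulated data. Because everything happens within one equation, a misspecified equation elsewhere in a larger system has no way to contaminate this estimate.

```python
import numpy as np

# Hypothetical data for a single equation out of a larger system:
# one endogenous regressor, two instruments.
rng = np.random.default_rng(2)
n = 5_000
Z = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
e = rng.normal(size=n)
x = Z[:, 1] + 0.5 * Z[:, 2] + 0.5 * e + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 2.0 * x + e                              # true coefficient on x is 2.0

# Stage 1: regress the endogenous regressor on all instruments (OLS)
# and keep the fitted values.
X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)

# Stage 2: regress y on the fitted values from stage 1.
beta_2sls = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ y)
print("Equation-by-equation 2SLS:", beta_2sls[1])   # close to 2.0
```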