Econometrics texts on panel data analysis and multilevel modeling emphasize the importance of the random effects assumption. In this video, I will explain what this assumption is about and how it can be tested and justified.

Let's start by looking at the normal regression model. We have Y, the dependent variable; the Xs, the predictors; the betas, the regression coefficients; and U, the error term, which importantly is assumed to be uncorrelated with all the predictors. If U correlates with any of the predictors, we run into an endogeneity problem where potentially all of the regression coefficients will be biased and inconsistent.

When we move from this basic regression model to the multilevel or panel data regression model, we add more components to the error term. At minimum, we add a random intercept u_j, an unobserved random effect, next to the observation-level error u_ij. The u_j term tells how much clusters vary from one another: for example, how much individuals who are observed over time differ from other individuals observed over the same time points. In the multilevel modeling context, the assumption that the error term is uncorrelated with the predictors applies to both u_j and u_ij, and failure of this assumption leads to an endogeneity problem that makes all the coefficients of the Xs potentially biased and inconsistent.

In the normal regression model, the assumption that the error term is uncorrelated with the Xs cannot be tested directly. In the multilevel context, however, there actually are tests of whether the u_j term is correlated with the Xs. But before we go into the tests, let's go through three different ways of understanding what it means that u_j is uncorrelated with the Xs. The first uses the equation that I just showed: all u_j terms are uncorrelated with all predictors.
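To see why a correlation between u_j and the predictors matters, here is a minimal Python simulation sketch. It assumes one predictor, balanced clusters, and normally distributed terms; the data-generating process and the names `simulate_panel`, `pooled_ols_slope`, and `corr_strength` are hypothetical illustrations, not taken from any particular paper. It fits an ordinary regression that ignores the clustering, once with u_j unrelated to x and once with u_j tied to the cluster-level part of x.

```python
import numpy as np


def simulate_panel(n_clusters=500, n_per=10, beta=1.0, corr_strength=0.0, seed=0):
    """Hypothetical DGP: y_ij = beta * x_ij + u_j + u_ij.

    corr_strength ties the random intercept u_j to the cluster-level part of x;
    corr_strength = 0 means the random effects assumption holds.
    """
    rng = np.random.default_rng(seed)
    cluster = np.repeat(np.arange(n_clusters), n_per)            # cluster index j of each row
    x_level = rng.normal(size=n_clusters)                        # cluster-level component of x
    x = x_level[cluster] + rng.normal(size=cluster.size)         # x varies within and between clusters
    u_j = corr_strength * x_level + rng.normal(size=n_clusters)  # unobserved random intercept
    u_ij = rng.normal(size=cluster.size)                         # observation-level error
    y = beta * x + u_j[cluster] + u_ij
    return y, x, cluster


def pooled_ols_slope(y, x):
    """Slope from an OLS regression of y on x with an intercept, ignoring clustering."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]


y, x, _ = simulate_panel(corr_strength=0.0)
slope_ok = pooled_ols_slope(y, x)
y, x, _ = simulate_panel(corr_strength=1.0)
slope_bad = pooled_ols_slope(y, x)
print(f"u_j uncorrelated with x: slope = {slope_ok:.3f} (true beta = 1)")
print(f"u_j correlated with x:   slope = {slope_bad:.3f} (true beta = 1)")
```

In the correlated case, Cov(x, u_j) = 1 and Var(x) = 2 by construction, so the slope should drift to roughly 1 + 1/2 = 1.5 instead of the true value of 1.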
Under this first interpretation, u_j is uncorrelated with all predictors, and that is an assumption. Because u_j itself is not observed, we cannot calculate its correlation with the predictors. Therefore, this way of presenting the random effects assumption does not give us anything that would be directly testable. It is also difficult to understand what the assumption means in this form, because u_j is fairly abstract: it is all the differences between individuals that are not accounted for by the model.

The second way to understand the random effects assumption, which is the foundation for some of the statistical analyses we can apply when the assumption holds, is that the within effect and the between effect are the same. The within effect is calculated, for example, by subtracting the cluster mean of y from y and the cluster mean of x from x, and then running a regression on this cluster-mean-centered data. The between effect comes from a regression on the cluster means: the cluster mean of y on the cluster mean of x and the cluster means of the other Xs. The random effects assumption in this context is that, for every predictor, the between effect equals the within effect. This can actually be tested: for example, we can estimate the two regressions as a system of equations and run a post-estimation Wald test of whether the two coefficients are equal. But that is somewhat challenging to do because you have to estimate a system of equations.

Then there's a third way, which I think is the superior way of understanding the random effects assumption, because it also allows us to understand it from a more theoretical perspective. The third way is that there are no contextual effects: a variable has only a within effect and no contextual effect.
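Returning to the second interpretation, the within and between effects can be computed directly. The sketch below assumes hypothetical simulated data (one predictor, balanced clusters, with `simulate_panel`, `cluster_means`, and `within_between` invented for illustration) and estimates each effect by ordinary least squares, on cluster-mean-centered data and on cluster means respectively. It does not implement the system-of-equations Wald test.

```python
import numpy as np


def simulate_panel(n_clusters=500, n_per=10, corr_strength=0.0, seed=1):
    """Hypothetical DGP: y_ij = 1.0 * x_ij + u_j + u_ij, with u_j tied to x via corr_strength."""
    rng = np.random.default_rng(seed)
    cluster = np.repeat(np.arange(n_clusters), n_per)
    x_level = rng.normal(size=n_clusters)
    x = x_level[cluster] + rng.normal(size=cluster.size)
    u_j = corr_strength * x_level + rng.normal(size=n_clusters)
    y = x + u_j[cluster] + rng.normal(size=cluster.size)
    return y, x, cluster


def cluster_means(v, cluster):
    """Mean of v within each cluster; cluster must hold integer codes 0..J-1."""
    return np.bincount(cluster, weights=v) / np.bincount(cluster)


def within_between(y, x, cluster):
    y_bar, x_bar = cluster_means(y, cluster), cluster_means(x, cluster)
    # Within effect: regress cluster-mean-centered y on cluster-mean-centered x.
    y_w = y - y_bar[cluster]
    x_w = x - x_bar[cluster]
    within = (x_w @ y_w) / (x_w @ x_w)
    # Between effect: regress the cluster means of y on the cluster means of x.
    X_b = np.column_stack([np.ones_like(x_bar), x_bar])
    between = np.linalg.lstsq(X_b, y_bar, rcond=None)[0][1]
    return within, between


w_ok, b_ok = within_between(*simulate_panel(corr_strength=0.0))
w_bad, b_bad = within_between(*simulate_panel(corr_strength=1.0))
print(f"assumption holds:    within = {w_ok:.3f}, between = {b_ok:.3f}")
print(f"assumption violated: within = {w_bad:.3f}, between = {b_bad:.3f}")
```

When u_j is tied to x, centering removes u_j entirely, so the within estimate stays near 1, while the between estimate absorbs the correlation and lands near 1 + 1/1.1, roughly 1.9, with these settings.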
For example, under this third interpretation, a person's intelligence may influence how that person behaves in a team, but it does not influence how the others behave, and how intelligent the others are does not influence the behavior of the focal individual. So the variable only has a within effect. This is something that you could argue from theory. For example, when we think about team members and the effect of gender, you could say that gender affects individual performance but the gender balance within a team does not affect the individual. We would then say that gender has only a within effect and no contextual effect.

This can also be tested. You specify a regression model that contains the original predictors and the cluster means of the predictors, and the assumption is that all the regression coefficients of the cluster means are zero. For an individual variable, this can be tested with the normal t-test, or with a z-test if you apply maximum likelihood estimation, and for all the cluster means together it can be tested with a Wald test. So the first way of understanding the assumption is not testable, while the second and the third are.

Now let's take a look at the actual tests. This is from a paper that I've written with John Antonakis and Nicolas Bastardoz. We have three different ways. The traditional way of testing the random effects assumption, which econometrics books explain, is the Hausman test. I have another video about the general Hausman test, but briefly, the idea is that you run two different estimators on the same data. For example, we could have a model that makes the random effects assumption and a model that does not, in this case a fixed effects estimator or a correlated random effects estimator. Then we compare them.
The question is whether the estimates are similar enough that we can conclude that the estimator that is more efficient, but makes more assumptions, is also consistent.

Then we have the likelihood ratio test. Here we estimate two models, one that includes the cluster means as predictors and one that does not, and then use a likelihood ratio test to compare the two models and see whether the cluster means explain the data. If the test is non-significant, we conclude that the cluster means do not explain the data, which implies that there is no contextual effect in the model.

Then we have the Wald test, or the F test, which can be considered a special case of the Wald test. Basically, you run one model with the predictors and their cluster means, and then you test whether the coefficients of the cluster means are all zero at the same time. This is fairly simple to calculate, and it can also be calculated using robust standard errors, which the traditional Hausman test, for example, cannot. So this is my favorite of these tests.

To summarize what I just explained: the random effects assumption is important because if the assumption is made but does not hold, the estimates are biased and inconsistent. It can be understood in three different ways. The first is that the unobserved term u_j is uncorrelated with the predictors; this is a rather abstract way and perhaps not so easy to understand. The second is that the within effect and the between effect are the same. The third is that there are no contextual effects in the data, and this third way is something you could actually argue from theory: for example, that intelligence affects only the individual's own performance and not anyone around that individual. Then we have three different tests.
We have the Hausman test, which is the classic way in econometrics of testing this assumption. We have the likelihood ratio test, which can be applied to compare maximum likelihood estimates: you estimate the model with the cluster means and without them. And we have the F or Wald test, which you calculate from a model that contains the predictors and their cluster means, checking whether all the cluster-mean coefficients are zero at the same time.

Finally, the random effects assumption, if made, should be justified based on theory. Why do you think that individual characteristics don't affect others in the context, or that others in the context don't affect an individual? This is important because if you use an estimation technique that makes the random effects assumption and the data violate that assumption, you have an endogeneity problem, which means that potentially all of your estimated coefficients will be biased and inconsistent; basically, simply wrong.
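The contextual-effects regression and the Wald test described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: two predictors, balanced clusters, and a hypothetical data-generating process, with all function names invented for illustration. It substitutes pooled OLS with cluster-robust (CR0, no small-sample correction) standard errors for the maximum likelihood estimation mentioned in the video, reports an individual z-test for each cluster mean plus the joint Wald test, and uses a closed-form chi-square tail that is exact only for an even number of cluster means; for odd counts, a general chi-square survival function such as `scipy.stats.chi2.sf` would be needed. The likelihood ratio and Hausman tests are not shown.

```python
import math

import numpy as np


def simulate_two_predictors(n_clusters=500, n_per=10, ctx2=0.0, seed=3):
    """Hypothetical DGP: y_ij = x1_ij - 0.5 * x2_ij + u_j + u_ij.

    ctx2 ties u_j to the cluster-level part of x2; ctx2 = 0 means no contextual effect.
    """
    rng = np.random.default_rng(seed)
    cluster = np.repeat(np.arange(n_clusters), n_per)
    lvl1, lvl2 = rng.normal(size=(2, n_clusters))           # cluster-level parts of x1, x2
    x1 = lvl1[cluster] + rng.normal(size=cluster.size)
    x2 = lvl2[cluster] + rng.normal(size=cluster.size)
    u_j = ctx2 * lvl2 + rng.normal(size=n_clusters)
    y = x1 - 0.5 * x2 + u_j[cluster] + rng.normal(size=cluster.size)
    return y, np.column_stack([x1, x2]), cluster


def chi2_sf_even_df(stat, df):
    """Upper-tail chi-square probability; this closed form is valid only for even df."""
    assert df % 2 == 0, "closed form requires an even number of degrees of freedom"
    half = stat / 2.0
    return math.exp(-half) * sum(half**i / math.factorial(i) for i in range(df // 2))


def contextual_effects_tests(y, X_pred, cluster):
    """Fit y ~ predictors + their cluster means by OLS with a cluster-robust covariance.

    Returns the cluster-mean coefficients, their two-sided z-test p-values, and the
    joint Wald statistic and p-value for H0: all cluster-mean coefficients are zero.
    """
    counts = np.bincount(cluster)
    means = np.column_stack(
        [np.bincount(cluster, weights=col) / counts for col in X_pred.T]
    )[cluster]                                         # each row gets its cluster's means
    X = np.column_stack([np.ones(len(y)), X_pred, means])
    XtX_inv = np.linalg.inv(X.T @ X)
    coef = XtX_inv @ X.T @ y
    resid = y - X @ coef
    scores = np.zeros((counts.size, X.shape[1]))
    np.add.at(scores, cluster, X * resid[:, None])     # per-cluster sums of x_i * e_i
    cov = XtX_inv @ (scores.T @ scores) @ XtX_inv       # cluster-robust sandwich
    k = X_pred.shape[1]
    idx = np.arange(1 + k, 1 + 2 * k)                    # positions of the cluster-mean terms
    b, V = coef[idx], cov[np.ix_(idx, idx)]
    z = b / np.sqrt(np.diag(V))
    p_each = [math.erfc(abs(zi) / math.sqrt(2)) for zi in z]
    wald = float(b @ np.linalg.solve(V, b))              # chi-square with k df under H0
    return b, p_each, wald, chi2_sf_even_df(wald, k)


b_ok, p_each_ok, wald_ok, p_ok = contextual_effects_tests(*simulate_two_predictors(ctx2=0.0))
b_bad, p_each_bad, wald_bad, p_bad = contextual_effects_tests(*simulate_two_predictors(ctx2=1.0))
print(f"no contextual effect: coefs = {np.round(b_ok, 3)}, Wald = {wald_ok:.2f}, p = {p_ok:.3f}")
print(f"contextual effect:    coefs = {np.round(b_bad, 3)}, Wald = {wald_bad:.2f}, p = {p_bad:.2g}")
```

In balanced data like this, the coefficients of the original predictors in this regression equal the within estimates, so each cluster-mean coefficient estimates the difference between the between and within effects, which is the contextual effect. With `ctx2=1.0`, the x2 cluster-mean coefficient should come out near 1/1.1, roughly 0.9, and the joint test should reject.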