Unobserved heterogeneity is a term commonly used in econometric texts, particularly texts that deal with panel data. What is unobserved heterogeneity, and how do we deal with it in our analysis?

To understand what unobserved heterogeneity is, we first have to understand what the term heterogeneity means in the context of multi-level modeling or panel data analysis. Heterogeneity means that, for example, firms in different industries differ from one another systematically, or people in two different teams differ from one another systematically. So people or companies are not homogeneous but heterogeneous, and that heterogeneity depends on the higher-level unit: industries or teams.

We can also understand heterogeneity through non-independence of observations. For example, there is heterogeneity if firms that operate in the same industry are more similar to one another than firms that operate in two different industries. There is team-level heterogeneity if people who are in the same team are more similar to one another than people who work in two different teams. This heterogeneity is unobserved if we don't observe its cause. So unobserved heterogeneity is any systematic difference between groups or clusters whose causes we do not observe.

What is the impact of unobserved heterogeneity on regression analysis? Well, first of all, we are in violation of independence of observations. The reason is that, because the heterogeneity is unobserved, it must be accounted for by the error term: all the unobserved influences go to the error term. Unobserved heterogeneity makes the error terms non-independent, so that errors within one cluster are more similar to one another than errors from two different clusters. We can also be in violation of the no endogeneity assumption, and this would be the case if the unobserved term, the source of the unobserved heterogeneity, is correlated with any of the predictors.
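To make the two violations concrete, here is a minimal simulation sketch. It is not from the lecture: the data are made up, and the names (rows, a_j, rd, pooled_slope, resid_mean) and all numeric values are assumptions chosen for illustration. It generates three industries whose unobserved effect rises together with mean R&D, fits a pooled regression that ignores industry, and then looks at the slope and at the average residual in each industry.

```python
import random
from statistics import mean

random.seed(0)

# Hypothetical data: 3 industries with 30 firms each; every number is made up.
rows = []  # (industry, rd, outcome)
for j in range(3):
    a_j = 2.0 * j  # industry effect; the analyst does not observe it
    for _ in range(30):
        rd = random.gauss(1.0 * j, 1.0)  # mean R&D rises together with a_j
        u = random.gauss(0.0, 1.0)       # idiosyncratic error
        rows.append((j, rd, 1.0 + 0.5 * rd + a_j + u))  # true slope is 0.5

xs = [x for _, x, _ in rows]
ys = [y for _, _, y in rows]
mx, my = mean(xs), mean(ys)

# Pooled OLS that ignores industry membership
pooled_slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
                / sum((x - mx) ** 2 for x in xs))
intercept = my - pooled_slope * mx

# Average pooled residual within each industry
resid_mean = {
    j: mean(y - intercept - pooled_slope * x for g, x, y in rows if g == j)
    for j in range(3)
}

print("pooled slope (true slope is 0.5):", round(pooled_slope, 2))
print("mean residual by industry:", {j: round(r, 2) for j, r in resid_mean.items()})
```

If the residuals were independent, the industry averages would all be close to zero. In this setup they differ systematically by industry, which is the non-independence, and the pooled slope is pulled away from 0.5 because the unobserved effect is correlated with R&D, which is the endogeneity.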
In this example here, we have three industries, each with a few dozen companies. We are in violation of independence of observations because there is clearly clustering: one industry is here, the second one is here, and the third one is here. There is also a violation of the no endogeneity assumption, because if we draw these regression lines, we can see that the intercept of this one is here, the green one is about here, and the blue one is all the way up here. And as the intercept increases, we can see that the mean of R&D increases as well, so R&D is correlated with the unobserved effect. So we are in violation of both the independence of observations and the no endogeneity assumptions.

Let's take a look at how we model unobserved heterogeneity. Let's start with the normal regression analysis. This is the normal regression model, where the fixed part contains the regression coefficients and the data (the variables), and then we have the error term. We make the assumption that the error term is independent and identically distributed. We also make the assumption that this error term is uncorrelated with any of the predictor variables.

We can extend this basic model to an unobserved effect model. One way to do so is to add a term for the unobserved effect; we call it a_j. So we have the unobserved effect here, and now the question is what kind of assumptions we make. We still make the assumption that the u here is uncorrelated with all the predictors. Depending on what kind of modeling approach we take, we can either make the assumption that this unobserved term is correlated with the X variables or make the assumption that it is not, and based on that assumption we can choose different analysis techniques.

The unobserved effect can thus cause two problems. If it's just non-independence of observations, then our standard errors will be biased.
If we have endogeneity, so that a_j correlates with the predictors, then unless that is explicitly modeled we also have bias and inconsistency.

So how do we model the unobserved effect? We have three main modeling strategies. The first strategy is that we ignore the unobserved effect and use cluster-robust standard errors. This takes care of the problem of non-independence of observations, but it of course does nothing for the problem that is caused when a_j, the unobserved effect, is correlated with the observed predictors. We still have inconsistency and bias.

Another thing that we can do is eliminate the unobserved effect from the data. This will typically take care of both the violation of the no endogeneity assumption and the non-independence of observations, and it is an easy strategy to apply.

The third strategy is that we build a model that explicitly models this unobserved effect. For example, in multi-level modeling we do that: we use a latent variable for a_j, because it is not observed, and then we model how that latent variable relates to the other variables.

So these are the three main modeling strategies for dealing with unobserved heterogeneity, and there are of course various specific techniques under these three general strategies.
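The first two strategies can be sketched on simulated data with nothing but the Python standard library. This is an illustration under stated assumptions, not the lecture's own analysis: the data-generating values, the helper ols_slope, and all variable names are hypothetical. The cluster-robust standard error is the basic sandwich form without small-sample corrections, and the elimination step uses the within transformation (subtracting cluster means), which is one common way to remove a_j. The third strategy needs a mixed-model routine and is not shown.

```python
import random
from collections import defaultdict
from math import sqrt
from statistics import mean

random.seed(1)

# Simulated data (all numbers hypothetical): 50 clusters of 20 observations.
# The true slope is 0.5; a_j is the unobserved effect and is correlated with x.
rows = []  # (cluster id, x, y)
for j in range(50):
    a_j = random.gauss(0.0, 2.0)
    for _ in range(20):
        x = 0.5 * a_j + random.gauss(0.0, 1.0)
        y = 1.0 + 0.5 * x + a_j + random.gauss(0.0, 1.0)
        rows.append((j, x, y))

def ols_slope(xs, ys):
    """Simple regression of ys on xs with an intercept.

    Returns the slope, the residuals, and the sum of squared deviations of xs.
    """
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    resid = [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]
    return slope, resid, sxx

ids = [g for g, _, _ in rows]
xs = [x for _, x, _ in rows]
ys = [y for _, _, y in rows]

# Strategy 1: ignore a_j, fit pooled OLS, use cluster-robust standard errors.
pooled_slope, resid, sxx = ols_slope(xs, ys)
n = len(rows)
classical_se = sqrt(sum(e * e for e in resid) / (n - 2) / sxx)

mx = mean(xs)
score = defaultdict(float)  # per-cluster sum of (x - mean x) * residual
for g, x, e in zip(ids, xs, resid):
    score[g] += (x - mx) * e
cluster_se = sqrt(sum(s * s for s in score.values())) / sxx

# Strategy 2: eliminate a_j by subtracting cluster means from x and y.
by_cluster = defaultdict(list)
for g, x, y in rows:
    by_cluster[g].append((x, y))
x_within, y_within = [], []
for obs in by_cluster.values():
    gx = mean(x for x, _ in obs)
    gy = mean(y for _, y in obs)
    x_within += [x - gx for x, _ in obs]
    y_within += [y - gy for _, y in obs]
within_slope, _, _ = ols_slope(x_within, y_within)

print("pooled slope:", round(pooled_slope, 3))
print("classical SE:", round(classical_se, 3), "cluster-robust SE:", round(cluster_se, 3))
print("within slope (true slope is 0.5):", round(within_slope, 3))
```

In this setup the cluster-robust standard error is larger than the classical one, which reflects the non-independence, but the pooled slope itself stays away from 0.5 because the robust standard errors do nothing about the correlation between a_j and x. The within slope lands near 0.5 because demeaning removes a_j from every observation.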