Normal regression analysis can be used to estimate models with multi-level data. While normal regression analysis is not always the ideal technique for doing so, there are a couple of simple strategies that can be applied to estimate models with such data. These techniques provide a starting point for understanding more complex analysis techniques. Let's take a look at how OLS regression can be applied to estimate the within effect for multi-level data. Our example data are the 15 companies observed over 10 years that we looked at in a previous video. We can see that the within effect and the between effect are not the same. The companies that invest only a little in R&D are less profitable than the companies that invest heavily in R&D. Nevertheless, the within effect in a company is negative. When a company increases its R&D investment, such as this company here, its profitability goes down. The between effect is positive and the within effect is negative. We want to understand how to estimate the within effect from this data. We want to take these two effects apart and estimate the within effect. The within effect would be important, for example, for informed policy on a firm level. Should a firm increase or decrease its R&D investments if it cares about its profitability? That is a question that the within effect would answer in this case. We could of course estimate a separate regression model for each company. We have 15 companies, so we split the data into 15 sub-samples and run a regression analysis on each company, which is done here and shown as lines graphically. The problem is that we then have only 10 observations for each company, which is a very small number, and we also get 15 different regression coefficients. Typically, we just want to report one. How do we get the within effect? There are two very easy strategies for doing so. The first strategy is to use dummy variables.
The idea of a dummy variable is that if we have 15 companies, then we create 15 variables, and those variables indicate which company an observation is for. We originally have the variable firm, which takes 15 different values, and then we create 15 new variables, from firm 1 to firm 15. The firm 1 variable receives a value of one for the first firm and a value of zero for the other firms. The second dummy variable receives the value of one for the second firm and zero otherwise. So these dummy variables indicate to which firm an observation belongs. Dummy variables are defined in a way that just one variable at a time receives one for an observation, and all the others are zeros. Here, this indicates that this observation belongs to firm one and not to any other firm, so all these are zeros. How do we apply this in a regression analysis and what's the outcome? When we add all the dummies to a regression model, then typically your regression software will drop one from the model. So here, firm one has been omitted. The reason is that if you add all of those dummies to the model, they will be perfectly collinear with the intercept. So in practice, we typically omit the first dummy, so we only have the firm 2 to firm 15 dummies, and then firm one is the reference category. In this case, the profitability of firm one when R&D is zero is given by the intercept, and the firm two dummy gives the average difference between firm one and firm two when R&D investments are held constant. So these dummies don't indicate any absolute levels; they indicate the difference between the focal firm, firm two for example, and the reference category, firm one. Quite often, we wouldn't interpret these dummies, because there are quite a few of them and typically we are not interested in specific cases. We are interested in how the regression line goes, controlling for the fact that our data come from multiple different companies.
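As a rough illustration of the dummy variable strategy, here is a minimal sketch in Python with NumPy. The lecture's data set is not available here, so the data are simulated: the firm levels, noise sizes and the true within effect of -0.4 are all assumptions made up for this example, and the variable names are hypothetical.

```python
import numpy as np

# Simulated stand-in for the lecture's data (assumed values, not the real data):
# 15 firms observed for 10 years, positive between effect, negative within effect.
rng = np.random.default_rng(0)
n_firms, n_years = 15, 10
firm = np.repeat(np.arange(n_firms), n_years)        # firm index 0..14 for each row
firm_level = rng.uniform(5, 20, n_firms)             # hypothetical typical R&D level per firm
rd = firm_level[firm] + rng.normal(0, 2, firm.size)  # R&D varies around the firm's level
profit = 2.0 * firm_level[firm] - 0.4 * rd + rng.normal(0, 1, firm.size)

# One dummy per firm: each row has a single 1 in the column of its own firm.
dummies = (firm[:, None] == np.arange(n_firms)).astype(float)

# Intercept + R&D + dummies for firms 2..15 (firm 1 is the reference category).
X = np.column_stack([np.ones(firm.size), rd, dummies[:, 1:]])
coef, *_ = np.linalg.lstsq(X, profit, rcond=None)
within_slope = coef[1]
print("common slope (within effect):", round(within_slope, 3))

# Keeping all 15 dummies next to the intercept makes the design matrix rank deficient,
# which is why regression software drops one of them.
X_full = np.column_stack([np.ones(firm.size), rd, dummies])
print("columns:", X_full.shape[1], "rank:", np.linalg.matrix_rank(X_full))
```

The slope on R&D should land close to the simulated within effect, and the rank check shows the perfect collinearity between the intercept and the full set of dummies.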
So this is the first strategy. We estimate the dummy variable model, so we basically allow each company to have a specific intercept that is estimated from the data, and then these companies have regression lines with the same slope. So each company basically receives the same regression line, except that the intercept can be different. So that is one easy strategy: we model the differences between these companies. The second strategy is within firm centering, and in this strategy we don't model the constant or stable differences between companies. Instead, we eliminate the differences between the companies before the actual regression analysis. So what we do is that we take R&D, the explanatory variable, and profitability, the dependent variable, and we calculate the cluster mean of both of these variables. So we have R&DM, which stands for R&D mean, and we calculate the mean R&D for the first company. It's 18 percent. Then we calculate the mean R&D for the second company, and it's 6.4 percent, and so on. Then we center the R&D by subtracting the cluster mean from the original value. This centered R&D, R&D.C, is how much that observation differs from the mean value of the company. So all these R&D.C values sum to zero within a company. We do the same for profitability. So we have the mean profitability and then the mean centered profitability. This eliminates any systematic differences between companies, because after the within firm centering all variables have means of zero within a firm. So the between firm differences disappear from the data. Then we run a regression analysis where we just use the mean centered dependent variable and the mean centered independent variable, and we get the same regression estimate as before, which is the within effect. So when we run a regression analysis where all between effects and all contextual effects have been eliminated from the data, what remains is the within effect, which is what we estimate.
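The centering steps described above can be sketched as follows. This is again a minimal example on simulated data (assumed values, not the lecture's data set), with a hypothetical helper `cluster_mean`. It checks that the centered regression gives the same slope as the dummy variable regression, while a regression that ignores clustering gives something quite different.

```python
import numpy as np

# Simulated stand-in data (assumptions): 15 firms, 10 years, true within effect -0.4.
rng = np.random.default_rng(0)
n_firms, n_years = 15, 10
firm = np.repeat(np.arange(n_firms), n_years)
firm_level = rng.uniform(5, 20, n_firms)
rd = firm_level[firm] + rng.normal(0, 2, firm.size)
profit = 2.0 * firm_level[firm] - 0.4 * rd + rng.normal(0, 1, firm.size)

def cluster_mean(values, groups, n_groups):
    """Mean of `values` within each group, repeated for every row of that group."""
    sums = np.bincount(groups, weights=values, minlength=n_groups)
    counts = np.bincount(groups, minlength=n_groups)
    return (sums / counts)[groups]

# Cluster means and centered variables.
rd_m = cluster_mean(rd, firm, n_firms)
rd_c = rd - rd_m
profit_c = profit - cluster_mean(profit, firm, n_firms)

# Both centered variables have mean zero, so the OLS slope is a simple ratio.
slope_centered = (rd_c @ profit_c) / (rd_c @ rd_c)

# Dummy variable regression for comparison (firm 1 as the reference category).
dummies = (firm[:, None] == np.arange(n_firms)).astype(float)
X_dummy = np.column_stack([np.ones(firm.size), rd, dummies[:, 1:]])
slope_dummy = np.linalg.lstsq(X_dummy, profit, rcond=None)[0][1]

# Regression that ignores clustering.
slope_pooled = np.polyfit(rd, profit, 1)[0]

print("centered:", round(slope_centered, 3))
print("dummies: ", round(slope_dummy, 3))
print("pooled:  ", round(slope_pooled, 3))
```

In this simulation the pooled slope is positive because the between effect dominates, while the two within estimates agree with each other up to floating point error.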
Let's compare the three models. First we have a model that ignores clustering. We just run a normal regression analysis of profitability on R&D. Then we have the dummy variable model, and then we have the within firm centering model. You can see that the coefficients for the dummy variable model and for the centering model are the same. It's minus 0.418, and this is the within effect. So both of these techniques produce the exact same estimate, and that is the estimate of the within effect. If we ignore clustering, we get the population average effect. The population average effect is just what we get when ignoring clustering, and it's very difficult to give any causal interpretation to that effect. The within effect has a causal interpretation: how much we can expect the profitability of one firm to change if that firm increases its R&D investments by one unit. But there are some interesting features when we compare the models, particularly the dummy variable model and the within firm centering model. The first is that the R-squared values are quite different. For the first model it's 31%, for the second model it's 70%, and for the third model it's 20%. So why is there such a large difference? Well, the first R-squared quantifies, in a way, how much the within effect and the between effect together explain the data. It doesn't quantify that precisely, because if the within effect and the between effect are not the same, then estimating two different effects will give you a higher R-squared. But it's roughly how much R&D generally explains profitability. Then we have the 70% R-squared in the dummy variable model. So what is this 70% R-squared? It quantifies how much the unobserved heterogeneity term, the contextual effect and the within effect together explain the data. So if we eliminate all those three sources of variation from the data, there is still 30% of the variation that is unexplained.
Then the within firm centering gives us a 20% R-squared, and this is roughly how much R&D explains the within firm variation. So if we want to understand how much R&D investment explains the variation of an individual company's performance, then this R-squared of 20% would answer that question. So which one should you report? You should really understand why these are different, but if you don't know which one to report, the within firm centering R-squared is typically the most useful, because it has a clear interpretation as the R-squared of a particular effect: how much R&D influences the variation of company performance within a firm. The dummy variable and ignore-clustering R-squared values, in contrast, combine explanation on at least two different levels. Then there's another interesting feature. While the estimates from the dummy variable model and within firm centering are exactly the same, the standard errors are not the same. So what does that mean? The standard error quantifies how much we expect the coefficient to vary if we repeat the same analysis over and over with repeated samples from the same population. The dummy variable model and the within firm centering model have been proven to produce the same result, so the real variation from one sample to another is exactly the same. So how come the standard errors are different? If the variation of the dummy variable coefficient and the within firm centering coefficient is actually the same, then one of these standard errors must be incorrect, because they both quantify the same variation in the hypothetical scenario of repeated analysis. It turns out that the within firm centering standard error is actually biased and inconsistent: it underestimates the variability of the regression coefficient. The reason is that when we within firm centre, we also take out some of the variation of the error term, and the variation of the error term is used to estimate the standard error.
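The R-squared and standard error comparison can be reproduced on simulated data, as sketched below. The sketch assumes that the discrepancy in the standard errors comes from degrees of freedom: the regression on centered data divides the residual sum of squares by N - 2, although the 15 firm means were also estimated, so the appropriate divisor is N - 15 - 1. The data, the helper names `ols` and `center`, and the specific correction factor are assumptions of this example, not something shown in the lecture.

```python
import numpy as np

# Simulated stand-in data (assumptions): 15 firms, 10 years, true within effect -0.4.
rng = np.random.default_rng(0)
n_firms, n_years = 15, 10
firm = np.repeat(np.arange(n_firms), n_years)
firm_level = rng.uniform(5, 20, n_firms)
rd = firm_level[firm] + rng.normal(0, 2, firm.size)
profit = 2.0 * firm_level[firm] - 0.4 * rd + rng.normal(0, 1, firm.size)
N, G = firm.size, n_firms

def ols(X, y):
    """Return coefficients, conventional standard errors and R-squared."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])  # residual variance estimate
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return coef, se, r2

def center(v):
    """Subtract the firm mean from every observation."""
    means = np.bincount(firm, weights=v) / np.bincount(firm)
    return v - means[firm]

ones = np.ones(N)
dummies = (firm[:, None] == np.arange(G)).astype(float)

_, _, r2_pool = ols(np.column_stack([ones, rd]), profit)                          # ignores clustering
b_dum, se_dum, r2_dum = ols(np.column_stack([ones, rd, dummies[:, 1:]]), profit)  # dummy variables
b_cen, se_cen, r2_cen = ols(np.column_stack([ones, center(rd)]), center(profit))  # centering

# Degrees-of-freedom correction for the centered regression's standard error.
se_corrected = se_cen[1] * np.sqrt((N - 2) / (N - G - 1))

print("R-squared pooled / dummies / centered:", r2_pool, r2_dum, r2_cen)
print("slope dummies vs centered:", b_dum[1], b_cen[1])
print("SE dummies:", se_dum[1], "SE centered:", se_cen[1], "SE corrected:", se_corrected)
```

Under these assumptions the uncorrected centered standard error is smaller than the dummy variable one, and the corrected value matches it. The dummy variable R-squared is never below the centered one, because both models leave the same residuals but the centered dependent variable has less total variation.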
So the within firm centering strategy should actually never be applied in practice to the dependent variable, because the standard errors will be inconsistent. If you do so, you have to apply a correction. There are analysis techniques, such as generalized least squares, that do this kind of centering, but those techniques also apply the correction to the standard errors. So if you want to centre the dependent variable, you should always do so by using one of the canned procedures of your statistical software. So these are the two simple strategies. Well, there is a third simple strategy: run a separate regression analysis for each company. But that runs into the problem that you have a large number of models with very small sample sizes each, and how would you aggregate the results for interpretation? So this is typically not something that people would consider. The dummy variable regression is actually a useful technique if you have a small number of cases. The problem with it is that the R-squared is difficult to interpret. The centering technique is something that you should not use, or at least you should never centre the dependent variable. So how should you actually model this data? The dummy variable model is okay, but there are also other techniques. The more advanced techniques for multi-level modelling, which are actually more commonly used for multi-level data than normal regression analysis, can be categorized based on one assumption.
If you can assume that there are no contextual effects of the variables of interest (econometricians say that the random effects assumption holds; I have another video about that assumption), then you can apply some of these techniques. You can apply generalized least squares random effects estimation, maximum likelihood estimation of random intercept models, the generalized estimating equations technique, or normal regression analysis with cluster robust standard errors. If you cannot assume that the contextual effects are zero, if you know or have an idea that they may be non-zero, then you can use generalized least squares fixed effects regression analysis. Alternatively, you can use any of these analysis techniques and use the cluster means of the interesting variables as controls. Recall that cluster means were the means of the variables within clusters that you calculate when you do the cluster mean centering procedure.
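The last option, using cluster means as controls, can be illustrated with plain OLS. This is a minimal sketch on simulated data (assumed values and hypothetical variable names, not the lecture's data set). In practice one of the techniques listed above would be combined with the cluster means to get appropriate standard errors, so the sketch only looks at the coefficients.

```python
import numpy as np

# Simulated stand-in data (assumptions): 15 firms, 10 years, true within effect -0.4.
rng = np.random.default_rng(0)
n_firms, n_years = 15, 10
firm = np.repeat(np.arange(n_firms), n_years)
firm_level = rng.uniform(5, 20, n_firms)
rd = firm_level[firm] + rng.normal(0, 2, firm.size)
profit = 2.0 * firm_level[firm] - 0.4 * rd + rng.normal(0, 1, firm.size)

# Cluster mean of R&D, repeated for every observation of the firm.
rd_m = (np.bincount(firm, weights=rd) / np.bincount(firm))[firm]

# Regression of profitability on R&D with the cluster mean as a control.
X = np.column_stack([np.ones(firm.size), rd, rd_m])
coef, *_ = np.linalg.lstsq(X, profit, rcond=None)
slope_rd, slope_mean = coef[1], coef[2]

# Within effect from the centering strategy, for comparison.
rd_c = rd - rd_m
profit_c = profit - (np.bincount(firm, weights=profit) / np.bincount(firm))[firm]
slope_within = (rd_c @ profit_c) / (rd_c @ rd_c)

print("R&D coefficient with cluster mean control:", round(slope_rd, 3))
print("within effect from centering:             ", round(slope_within, 3))
print("coefficient on the cluster mean:          ", round(slope_mean, 3))
```

The coefficient on R&D equals the within effect once the cluster mean is controlled for, and the coefficient on the cluster mean picks up the contextual effect, that is, how much the between effect differs from the within effect.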