Multi-level data provide interesting research opportunities, but they can also be challenging to analyze. Let's start with an example and a simple question: is profitability related to R&D investment? We have a data set of 150 observations of these two variables. We run a regression analysis, and R&D is clearly positively related to profitability. So is that really the case? Based on this figure alone there seems to be a positive relationship, but if the data are actually multi-level, the answer may not be as clear. What if these 150 data points are actually three industries with 50 firms each? Would that change our conclusion about the direction, or even the existence, of the effect? It could, because the data could actually look like this. It is possible that the red industry here is not very profitable and doesn't spend much on R&D, while the blue industry here is very R&D heavy and also very profitable, but within each industry the effect is negative: the more a firm spends on R&D, the less profitable it is. So the effect can be positive at one level and negative at another. If we want to know the direction and magnitude of the effect, we therefore have to specify which level we are interested in, because the answer depends on whether we study how firms perform within industries or how industries differ from one another. Here we have two levels: firms exist within industries, so the industry is the higher level, a level-two unit, and the firm is a level-one unit nested within the level-two units. Which effect we report, the positive effect from the previous slide or these negative within-industry effects, depends on the purpose of our study. Let's take a look at another example.
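The pattern on these slides can be reproduced with a small simulation. The sketch below uses a hypothetical data-generating process, not the data from the slides: three industries with 50 firms each, where R&D-heavy industries are also more profitable but, within each industry, profitability falls as R&D rises. It compares the pooled regression slope with the within-industry slope obtained by centering both variables on their industry means. The function names, effect sizes, and noise levels are assumptions chosen for illustration.

```python
import numpy as np


def simulate_industries(n_industries=3, firms_per_industry=50,
                        within_slope=-0.5, between_slope=2.0, seed=0):
    """Hypothetical firms nested in industries.

    Industry centres of R&D and profitability rise together (positive between
    effect), while inside each industry profitability falls with R&D
    (negative within effect). All parameter values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    industry = np.repeat(np.arange(n_industries), firms_per_industry)
    rd_centre = 3.0 * np.arange(n_industries)        # industry-level R&D
    profit_centre = between_slope * rd_centre        # industry-level profitability
    n = industry.size
    rd = rd_centre[industry] + rng.normal(0.0, 1.0, n)
    profit = (profit_centre[industry]
              + within_slope * (rd - rd_centre[industry])  # within-industry effect
              + rng.normal(0.0, 0.3, n))
    return industry, rd, profit


def pooled_slope(x, y):
    """OLS slope ignoring the grouping (what a naive regression reports)."""
    return np.polyfit(x, y, 1)[0]


def within_group_slope(x, y, groups):
    """OLS slope after centring x and y on their own group means."""
    groups = np.asarray(groups)
    xd = np.asarray(x, dtype=float).copy()
    yd = np.asarray(y, dtype=float).copy()
    for g in np.unique(groups):
        mask = groups == g
        x_mean, y_mean = xd[mask].mean(), yd[mask].mean()
        xd[mask] -= x_mean
        yd[mask] -= y_mean
    return np.sum(xd * yd) / np.sum(xd * xd)


industry, rd, profit = simulate_industries()
print(f"pooled slope:          {pooled_slope(rd, profit):+.2f}")
print(f"within-industry slope: {within_group_slope(rd, profit, industry):+.2f}")
```

With these assumed settings the pooled slope comes out positive while the within-industry slope lands near the assumed -0.5: the naive regression answers a between-industry question, not a within-industry one.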
Again the question is: is profitability related to R&D investment? Here are our data. We again have a number of observations, and the trend is clearly positive, so R&D and profitability are positively correlated. But what if these are actually repeated observations of the same set of firms, say 15 companies observed over 10 years? The same thing could happen here: within a company the effect is negative, so if a company increases its R&D spending, its profitability goes down. Yet the differences between companies nevertheless make the overall regression line positive. Again, to answer whether the effect is positive or negative, we need to know which level we are interested in. Are we asking what makes firms different from one another, or are we asking what a firm can do to increase its profitability, for example by increasing its R&D spending? Normally, at least in strategic management, we are much more focused on the within-firm level: what a firm can do to improve itself. This example makes clear that simply running a regression on these data and reporting that effect as if it were the within effect would lead to an incorrect conclusion. There are two fallacies related to this example. The first is the ecological fallacy: generalizing from the between-firm effect, which here is clearly positive, to the within-firm effect. If the two effects are not the same, we are committing the ecological fallacy. The opposite is the atomistic fallacy, where we generalize from within-company trends to between-company differences. For example, we could argue that because investing more in R&D makes a firm more profitable, companies that invest more in R&D must on average be more profitable than those that do not.
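To make the two fallacies concrete, the sketch below simulates a hypothetical panel of 15 firms over 10 years (all parameter values are assumptions) and computes three slopes: the pooled slope, the within-firm slope from deviations around each firm's own means (the logic behind a fixed-effects estimator), and the between-firm slope from regressing firm means on firm means. Reporting the pooled or between slope as the within effect is the ecological-fallacy mistake; reading the within slope as a statement about differences between firms is the atomistic-fallacy mistake.

```python
import numpy as np


def simulate_panel(n_firms=15, n_years=10, within_slope=-0.4,
                   between_slope=1.5, seed=1):
    """Hypothetical panel: the same firms observed repeatedly.

    Firms with a higher typical R&D level are more profitable (between effect),
    but a firm that raises R&D above its own typical level loses profitability
    (within effect). Parameter values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    firm = np.repeat(np.arange(n_firms), n_years)
    firm_rd = rng.normal(5.0, 2.0, n_firms)           # each firm's typical R&D
    n = firm.size
    rd = firm_rd[firm] + rng.normal(0.0, 1.0, n)      # year-to-year deviations
    profit = (between_slope * firm_rd[firm]
              + within_slope * (rd - firm_rd[firm])
              + rng.normal(0.0, 0.3, n))
    return firm, rd, profit


def within_between_slopes(x, y, groups):
    """Return (within, between) slopes.

    within:  OLS on deviations from each group's own means
    between: OLS of the group means of y on the group means of x
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    _, idx = np.unique(groups, return_inverse=True)
    counts = np.bincount(idx)
    x_bar = np.bincount(idx, weights=x) / counts
    y_bar = np.bincount(idx, weights=y) / counts
    xd, yd = x - x_bar[idx], y - y_bar[idx]
    within = np.sum(xd * yd) / np.sum(xd * xd)
    between = np.polyfit(x_bar, y_bar, 1)[0]
    return within, between


firm, rd, profit = simulate_panel()
within, between = within_between_slopes(rd, profit, firm)
pooled = np.polyfit(rd, profit, 1)[0]
print(f"pooled {pooled:+.2f}, within {within:+.2f}, between {between:+.2f}")
```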
That kind of inference is not necessarily valid, because the between effect and the within effect may differ. So what are the consequences of clustered data? There are two things you need to understand. The first is that clustering can be a nuisance. If you want to estimate the within effect, that is, how R&D investment within a firm influences that firm's future performance, then the between-company differences are a problem, because you can't simply run an ordinary regression on your data and get the correct effect. Clustering also violates the independence-of-observations assumption of regression analysis, but this is a much easier problem: you can apply cluster-robust standard errors, and that will deal with the problem in the regression context. Getting the effect right is much more challenging than dealing with the non-independence of observations, which mostly affects the standard errors. The second thing is that clustering also presents interesting opportunities. Your interest could lie on multiple levels, so you could study how much context matters: how much belonging to a particular industry affects company performance, and how much the decisions the company makes over time influence performance. In other words, at which level does performance vary, and what is under the firm's control? We can also study effects that vary between units. For example, we could study whether the effect of R&D investment on profitability is stronger for some companies than for others, and what explains the differences in the magnitude of those effects. For instance, does being in a high-tech industry moderate the effect of R&D investment on profitability? That would be a cross-level interaction. So we can study at which level things vary.
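As a rough illustration of the "easier" problem, the sketch below computes OLS coefficients with both conventional and cluster-robust (CR0 sandwich) standard errors using plain NumPy. The function name and the simulated firm-year data are hypothetical, and no finite-sample correction is applied; statistical packages usually scale the CR0 variance by a small-cluster adjustment, so their numbers will differ slightly.

```python
import numpy as np


def ols_cluster_se(X, y, clusters):
    """OLS with conventional and cluster-robust (CR0 sandwich) standard errors.

    X must already contain a column of ones for the intercept.
    No small-sample correction is applied.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    clusters = np.asarray(clusters)
    n, k = X.shape
    bread = np.linalg.inv(X.T @ X)
    beta = bread @ X.T @ y
    resid = y - X @ beta

    # Conventional SEs: assume independent, homoskedastic errors.
    sigma2 = resid @ resid / (n - k)
    se_conventional = np.sqrt(np.diag(sigma2 * bread))

    # Cluster-robust "meat": sum over clusters of the outer product of each
    # cluster's score X_g' e_g, allowing arbitrary correlation within a cluster.
    meat = np.zeros((k, k))
    for g in np.unique(clusters):
        m = clusters == g
        score = X[m].T @ resid[m]
        meat += np.outer(score, score)
    se_cluster = np.sqrt(np.diag(bread @ meat @ bread))
    return beta, se_conventional, se_cluster


# Hypothetical clustered data: both the regressor and the error share a
# firm-level component, so observations within a firm are not independent.
rng = np.random.default_rng(2)
n_firms, n_years = 40, 10
firm = np.repeat(np.arange(n_firms), n_years)
x = rng.normal(0, 1, n_firms)[firm] + rng.normal(0, 0.5, firm.size)
e = rng.normal(0, 1, n_firms)[firm] + rng.normal(0, 0.5, firm.size)
y = 1.0 + 0.5 * x + e
X = np.column_stack([np.ones_like(x), x])

beta, se_conv, se_cl = ols_cluster_se(X, y, firm)
print("slope", round(beta[1], 3), "conventional SE", round(se_conv[1], 3),
      "cluster-robust SE", round(se_cl[1], 3))
```

The coefficient is identical under both sets of standard errors; only the uncertainty estimate changes. That is why cluster-robust standard errors address non-independence but do nothing about the within/between problem.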
Does it matter what decisions a company makes, or is the outcome mainly determined by the context, which is most of the time beyond the company's control? And how large is the effect, and how does it vary between companies?
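One common way to ask at which level performance varies is to decompose its variance. The sketch below, assuming balanced data, computes the intraclass correlation (ICC) with the one-way ANOVA estimator: the share of total variance that lies between firms rather than within firms over time. With three levels (years within firms within industries) a multilevel model would split the variance further; this two-level version only illustrates the idea, and the simulated data are hypothetical.

```python
import numpy as np


def icc_oneway(y, groups):
    """Share of variance lying between groups (ICC(1), one-way ANOVA estimator).

    Assumes balanced groups (every group has the same number of observations).
    """
    y = np.asarray(y, dtype=float)
    _, idx = np.unique(groups, return_inverse=True)
    counts = np.bincount(idx)
    if not np.all(counts == counts[0]):
        raise ValueError("this sketch assumes balanced groups")
    n_groups, m, n = counts.size, counts[0], y.size
    means = np.bincount(idx, weights=y) / counts
    ms_between = m * np.sum((means - y.mean()) ** 2) / (n_groups - 1)
    ms_within = np.sum((y - means[idx]) ** 2) / (n - n_groups)
    return (ms_between - ms_within) / (ms_between + (m - 1) * ms_within)


# Hypothetical profitability: a stable firm-level component plus year-to-year
# variation of equal variance, so the true between-firm share is 0.5.
rng = np.random.default_rng(3)
firm = np.repeat(np.arange(50), 10)
profit = rng.normal(0, 1, 50)[firm] + rng.normal(0, 1, firm.size)
print(f"estimated share of variance between firms: {icc_oneway(profit, firm):.2f}")
```

A high share suggests that performance differences are mostly stable differences between firms; a low share suggests that most variation happens within firms over time, where the firm's own decisions could play a role.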