When we cannot run an experiment, we deal with alternative explanations through statistical control. For example, we could say that this difference between men-led and women-led companies (these are just arbitrary values here) is not because of gender differences. Instead we could claim that it's an industry or company size difference. So this overlap between gender and performance, the correlation, is partially caused by gender, but it's partially also because smaller companies are more likely to hire women as CEOs and smaller companies are more profitable. So we say that this relationship between gender and performance is at least partially explained by size being a factor in CEO decisions and size being a factor influencing performance.
So how do we take that kind of control variable into account? There are a couple of different ways. One intuitive way is an instance of a general strategy called matching, where we try to make the samples more comparable. Let's assume that there are only a few women-led companies with more than 250 people and most women-led companies have 250 people or fewer. We could make the samples more comparable by dropping large companies, so we only focus on medium-sized companies with 250 people or fewer. That would be a fairer comparison, and if size actually is a factor that influences both CEO selection and performance, then these more comparable samples should give us a smaller performance difference. Which they do in this case: when we make the samples more comparable, the difference is 1.4 instead of 4.7.
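The matching idea can be sketched with simulated data. This is only an illustration under assumptions that are not from the lecture: the company sizes, the probabilities of a female CEO, the size effect, and the gender effect of 1 point are all made-up values, and the helper names `simulate_company` and `roa_gap` are hypothetical.

```python
import random
from statistics import mean

random.seed(0)

def simulate_company():
    """Return (female_ceo, employees, roa) for one hypothetical company."""
    employees = random.choice([50, 100, 200, 400, 800, 1600])
    # Assumption: smaller companies are more likely to have a female CEO.
    p_female = 0.5 if employees <= 250 else 0.1
    female_ceo = random.random() < p_female
    # Assumption: smaller companies are more profitable,
    # and a female CEO adds 1 point of ROA on top of that.
    size_effect = 6.0 if employees <= 250 else 0.0
    gender_effect = 1.0 if female_ceo else 0.0
    roa = 5.0 + size_effect + gender_effect + random.gauss(0, 2)
    return female_ceo, employees, roa

def roa_gap(companies):
    """Mean ROA of women-led companies minus mean ROA of men-led companies."""
    women = [roa for female, _, roa in companies if female]
    men = [roa for female, _, roa in companies if not female]
    return mean(women) - mean(men)

companies = [simulate_company() for _ in range(20000)]

# Matching: keep only companies with 250 people or fewer.
matched = [c for c in companies if c[1] <= 250]

raw_gap = roa_gap(companies)
matched_gap = roa_gap(matched)
print(f"raw gap: {raw_gap:.2f}, matched gap: {matched_gap:.2f}")
```

In the full sample the gap mixes the gender effect with the size effect. In the matched sample size no longer differs between the groups, so the gap shrinks toward the assumed gender effect.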
Matching is an intuitive way of understanding statistical controlling, but it's not a practical strategy for a couple of reasons. First of all, when you have multiple different things that you want to control for, constructing a matched sample with this kind of simple strategy is not a viable option anymore, because you cannot have exactly the same companies in both samples. Once the number of factors to be controlled increases, it's not possible to construct two samples that are comparable on all those factors.
So we don't normally apply matching. Instead we apply a statistical model. We say that the return on assets depends on CEO gender and company size, so that we can express return on assets as a linear function of CEO gender and size. We multiply CEO gender (female is one, male is zero) by a coefficient, beta one, and we multiply company size by another coefficient, beta two. Beta zero is the intercept. Then we ask the computer to give us estimates for beta zero, beta one, and beta two so that we can predict return on assets as well as possible. The computer will do that for us, and then we interpret the results to see whether the gender effect actually exists.
Regardless of how we actually implement this statistical controlling, we need to decide which factors we control for. The factors that we control for are called control variables. Control variables are present in nearly every study in business research. Quite often you actually see a section in the paper that is explicitly labeled as control variables, like here in the Heckmann's paper that we use as an example. Control variables are alternative explanations or alternative theories for the data. If we say that women-led companies are more profitable than men-led companies, we have to think really hard about why that is the case.
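As a rough sketch of what "asking the computer for estimates" means, the linear model for return on assets can be fitted by ordinary least squares. Everything below is an assumption for illustration: the simulated data, the true coefficients (10, 1, and -0.8), and the helper names `solve` and `ols` are not from the lecture, and in practice one would use a statistics package rather than hand-written linear algebra.

```python
import random

random.seed(1)

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def ols(rows, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Simulated companies (all numbers are made up for illustration).
rows, y = [], []
for _ in range(5000):
    size = random.uniform(0.1, 10.0)          # company size, hundreds of people
    p_female = 0.6 if size < 2.5 else 0.2     # small firms hire more women CEOs
    female = 1.0 if random.random() < p_female else 0.0
    roa = 10.0 + 1.0 * female - 0.8 * size + random.gauss(0, 2)
    rows.append([1.0, female, size])          # 1.0 is the intercept column
    y.append(roa)

beta0, beta1, beta2 = ols(rows, y)
print(f"beta0={beta0:.2f}, beta1 (gender)={beta1:.2f}, beta2 (size)={beta2:.2f}")
```

Here `beta1` is the estimated gender effect holding size constant, which is the quantity the lecture interprets.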
Then we have all kinds of plausible reasons that we could come up with: the size effect, industry effect, selection effect. And then we include those in the same model. So we have an independent variable that we assume to influence the dependent variable, and we also have control variables in the same model. We put these variables together to compete against one another to see which one of them actually explains the dependent variable, return on assets in this case.
It's important that the control variables are selected based on theory instead of just throwing in a standard set of gender and age if you study people, or industry and revenue if you study companies. You need to choose them carefully to rule out alternative explanations. And it's important that you justify why you think that the control variable is related to both your independent variable and the dependent variable. One common thing that I see in articles, and that I complain about as a reader, is that authors generally only justify the relationship between the control and the dependent variable. But it's almost as important that you justify why you think that the control and the interesting independent variable, CEO gender in this case, are correlated.
Let's take a look at the next example. We have the article by Deephouse, and they have a variable called market share. So is market share an interesting, good control variable based on this correlation matrix? To understand whether it's a good control variable empirically, we have to look at certain correlations. Market share is a relevant control variable if it's correlated with the key independent variable and with the dependent variable. We are looking at the effect of strategy deviation, variable number four, on relative return on assets, variable number one. So we need to take a look at the correlations of market share with variable one and variable four.
Here market share is weakly and negatively correlated with return on assets, and it's very strongly correlated with strategy deviation. We can't infer whether there is or is not a causal relationship based on a correlation alone. But this strong correlation raises a question: if market share has an effect on return on assets, then because it's correlated with the strategy deviation variable, it could create a spurious correlation. So market share is a relevant control if we have theoretical reasons to think that return on assets depends on market share.
Let's take a look at the actual modeling results. This is based on Deephouse's paper. They say that market share has a negative effect on return on assets, so that when your market share goes up, return on assets goes down. Compared to the other effects in the article, this is a fairly large effect. The effect of strategy deviation is -0.02, so it's smaller. You can't compare these directly, but we'll do that for convenience now. And the two variables are highly correlated.
So what is the interpretation of this figure? The interpretation is that larger firms, firms with more market share, are more strategically deviant according to their definition. Larger firms are also less profitable, and these two relationships cause a spurious relationship. If larger firms are more deviant and larger firms have a smaller ROA, then if this effect was not controlled for, we would get a fairly different estimate for strategy deviation. If we don't control for market share, the effect of strategy deviation will be inflated because it confounds the effects of market share and strategy deviation. So let's assume we leave market share out. Then our estimate for strategy deviation would be the actual direct effect of strategy deviation plus the effect of size, because size is correlated with deviation.
So the effect would be -0.058, or about three times as large as before. Omitting the important control variable would have a serious consequence for the modeling results. In this case, it would result in omitted variable bias, which makes the estimate three times as large as it otherwise would be, assuming that the model is otherwise correctly specified. Because controls are so important for your causal claims, you should take it very seriously which variables you include, and really think about what kind of alternative explanations there are for the observed association or the association that you expect to observe.
Statistical controls and experimental approaches can be compared. In experiments you have treatment and control groups that you assign yourself, and you apply the treatment. So you have full control over the study, and the groups are comparable to start with because of randomization. If after treatment there is a difference between the groups, then we can make a claim that the difference is because of the treatment. So that's fairly simple. With statistical controls, we don't have control over the cases. We are just passive observers of what happens, and the only way we can rule out alternative explanations is to think, based on existing theory, what other plausible explanations there are for an association, and then rule them out using control variables in our analysis.
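The omitted variable bias described above, where leaving out market share roughly triples the strategy deviation estimate, can be reproduced in a small simulation. This is a sketch under assumptions, not a reanalysis of the paper: the data are simulated, the direct effect of -0.02 matches the lecture, but the market share effect of -0.05, the strength of the correlation, and the noise levels are made-up values chosen so that the biased estimate lands near -0.06.

```python
import random
from statistics import mean

random.seed(2)

def cov(a, b):
    """Sample covariance of two equally long lists."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

n = 20000
share = [random.gauss(0, 1) for _ in range(n)]
# Assumption: strategy deviation is strongly correlated with market share.
deviation = [0.8 * s + random.gauss(0, 0.6) for s in share]
# Assumption: direct effects of -0.02 (deviation) and -0.05 (market share).
roa = [-0.02 * d - 0.05 * s + random.gauss(0, 0.1)
       for d, s in zip(deviation, share)]

# Model without the control: ROA regressed on deviation only.
short_slope = cov(deviation, roa) / cov(deviation, deviation)

# Model with the control: ROA regressed on deviation and market share
# (closed-form least squares for two predictors).
s_dd, s_ss = cov(deviation, deviation), cov(share, share)
s_ds = cov(deviation, share)
s_dy, s_sy = cov(deviation, roa), cov(share, roa)
det = s_dd * s_ss - s_ds ** 2
long_dev = (s_dy * s_ss - s_sy * s_ds) / det
long_share = (s_sy * s_dd - s_dy * s_ds) / det

# Slope of the omitted variable (market share) on the included one (deviation).
delta = s_ds / s_dd

print(f"with control:    {long_dev:.3f}")
print(f"without control: {short_slope:.3f}")
print(f"direct + share effect * delta: {long_dev + long_share * delta:.3f}")
```

The last line shows the decomposition: the estimate without the control equals the direct effect plus the market share effect multiplied by how strongly market share moves with deviation.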