One of the most important assumptions is that there is no endogeneity. It is also a difficult one to understand, because there is no simple test for endogeneity. So let's look at what endogeneity actually is. One way to understand it is through an example from experimental design.
In experimental design, we have random assignment to treatment and control. We administer some kind of treatment to one group, the other group doesn't receive the treatment, and we measure the outcome variable of interest. The difference between the two groups after the treatment can then be interpreted as a causal effect. What justifies interpreting this difference as causal? It is the assumption that R is exogenous. R here, the random assignment, doesn't depend on the variable that we're interested in studying. For example, if we test a medicine, then who gets the medicine and who gets the placebo shouldn't depend on the initial health of the people. So it's important that the assignment is randomized independently of what we are studying, and that guarantees exogeneity.
If R is endogenous, it means that R depends somehow on the variable that we're studying, for example people's health. Let's say that we have a medicine that has some side effects, we have people who vary in how sick they are, and people can choose whether they go to the treatment or the control group. In that scenario, people who are not that sick will choose the control group to avoid the side effects, and only those people who are really sick will choose the treatment group.
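
The medicine scenario can be sketched with a small simulation. This is only an illustration under made-up assumptions: the health scale, the true effect of 2.0, and the rule that everyone with below-average health chooses treatment are all hypothetical and not from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
true_effect = 2.0  # hypothetical: the medicine improves the health score by 2

# Initial health on a made-up scale: higher means healthier.
health = rng.normal(0.0, 1.0, n)

def estimated_effect(treated):
    """Difference in mean post-treatment health, treatment minus control."""
    outcome = health + true_effect * treated + rng.normal(0.0, 1.0, n)
    return outcome[treated == 1].mean() - outcome[treated == 0].mean()

# Exogenous R: a coin flip that ignores initial health.
random_assignment = rng.integers(0, 2, n)

# Endogenous R: only the sicker half of the people choose the treatment.
self_selected = (health < 0).astype(int)

effect_random = estimated_effect(random_assignment)
effect_selected = estimated_effect(self_selected)
print(f"randomized:    {effect_random:.2f}")
print(f"self-selected: {effect_selected:.2f}")
```

With randomization the group difference should land close to the assumed true effect. With self-selection the treatment group starts out sicker, so the same difference understates the effect by a wide margin.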
If people self-select like this, then the assignment to treatment and control is no longer exogenous. Instead it's endogenous, because it depends on the health of the people, the characteristic that we're studying. Because R is endogenous, there are initial differences in health between the two groups, and we can no longer interpret the difference after the treatment as a causal effect. So that's clearly a problem.
Another way of understanding endogeneity, in the multiple regression context, is to look at the error term. Here we have a regression model in a path diagram presentation. We have y, the dependent variable, and three x's as the independent variables. We have the intercept and the error term. The error term represents all possible causes of y that are not included in the model. So everything that can cause y and is not included in the list of x's goes to the error term. If the error term, or any of these omitted causes, correlates with any of the included causes, then we say that, for example, x1 here becomes an endogenous explanatory variable. So if a variable is correlated with the error term, it's endogenous, and that causes problems. The general condition that one or more of these variables are correlated with the error term is called endogeneity. So that's the problem. With the no endogeneity assumption, we assume that the error term does not depend on, or is not correlated with, any of the explanatory variables. If it is, the regression estimates will be inconsistent and biased.
So how does endogeneity arise? There are three basic mechanisms that are useful to understand. The first, simple mechanism is that there is a common cause of x and y, let's call it e, that is not included in the model. For example, say we are studying the effect of CEO gender on profitability and industry is a common cause: some industries are more likely to hire women, and some industries are more profitable than others. That's a common cause of x and y producing a spurious correlation.
If we don't include industry as a control variable, then it's an omitted cause that goes to the error term, and it's correlated with x. A more general presentation is that x is simply correlated with some unmodeled causes of y, and that could happen for multiple different reasons. I'll provide examples at the end of this video.
Then a special case that is sometimes of interest is simultaneity. Here x and y have a reciprocal causal relationship: x causes y and y causes x. If y causes x, then x must depend on the error term of y, because x is a function of y plus the error term of x, and y in turn contains its own error term. The same holds the other way around. So these two-way paths cause an endogeneity problem. All of these issues we can deal with if we understand the issue, we know where the problem is, and we have a bit more data and more variables, but that's going to be covered in later videos. Now it's important to understand what the problem is, and later on we will talk about how to deal with it.
Let's take a look at Deephouse's paper and the market share variable. I demonstrated this before in the context of control variables. The idea is that larger firms are more strategically deviant, so there is a positive correlation here, and larger firms are less profitable. If we omit market share from the equation, we will get omitted variable bias. What happens is that market share is now an omitted cause: it is a cause of ROA not included in the model, and it will be included in the error term of the regression. Anything that is supposed to be causing ROA and is not included in the model will be represented by the error term, and we know from these empirical results that market share and strategic deviation are correlated. Therefore strategic deviation is now correlated with the error term, and strategic deviation becomes endogenous.
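
The omitted variable argument can be written out as a short derivation. The notation is an assumption made for this sketch, not taken from the paper: SD is strategic deviation, MS is market share, and u collects the remaining causes of ROA, taken to be uncorrelated with both SD and MS.

```latex
\begin{aligned}
\text{Full model:}\quad
  & \mathrm{ROA} = \beta_0 + \beta_1\,\mathrm{SD} + \beta_2\,\mathrm{MS} + u \\
\text{Market share omitted:}\quad
  & \mathrm{ROA} = \beta_0 + \beta_1\,\mathrm{SD}
    + \underbrace{(\beta_2\,\mathrm{MS} + u)}_{\text{error term}} \\
\text{Regressing ROA on SD alone:}\quad
  & \operatorname{plim}\hat{\beta}_1
    = \beta_1 + \beta_2\,
      \frac{\operatorname{Cov}(\mathrm{SD},\mathrm{MS})}{\operatorname{Var}(\mathrm{SD})}
\end{aligned}
```

The second term is the bias. It disappears only if market share has no effect on ROA or is uncorrelated with strategic deviation.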
Of course, whether a variable really is endogenous or not, we cannot say based on the regression results alone. We need some additional variables called instrumental variables, which I'll talk about later, or we have to argue the no endogeneity assumption, or exogeneity, based on existing theory. So this leads to omitted variable bias, and the effect of strategic deviation will be overestimated three-fold.
Let's take a look at another endogeneity problem. We have investment in new factories, whether a company decides to invest in new factories or not, and we are trying to explain companies' return on assets with those investments. The question of whether I have an endogeneity problem begins by asking what investment in new factories depends on. Why do some companies invest in new factories and others don't? What causes the variance? Well, it probably depends on company strategy. If the company's strategy is to grow, they will probably invest in new factories, and if they don't want to grow, they probably won't. It's that simple.
Now what is the no endogeneity assumption here? Unless we have firm strategy as a control variable, we are assuming that return on assets is otherwise completely independent of company strategy. So company strategy could influence ROA only through influencing investments. That is of course implausible. Strategy influences performance in multiple different ways, so we have an omitted common cause, strategy. Companies' investments depend on strategy, and ROA depends on strategy partly through investments but also through other means. If we don't control for strategy in this kind of model, we will have an endogeneity problem.
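
A small simulation can make the investment example concrete. It is a sketch under invented assumptions: the variable scales and every coefficient (0.8, 0.5, 1.0) are hypothetical, and `ols` is a helper defined here for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Hypothetical data-generating process; all coefficients are made up.
strategy = rng.normal(size=n)                     # growth orientation
investment = 0.8 * strategy + rng.normal(size=n)  # investment depends on strategy
# ROA depends on strategy through investment (0.5) and directly (1.0).
roa = 0.5 * investment + 1.0 * strategy + rng.normal(size=n)

def ols(y, *regressors):
    """Return OLS coefficients, intercept first, via least squares."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Strategy omitted: it ends up in the error term, correlated with investment.
b_short = ols(roa, investment)[1]
# Strategy included as a control variable.
b_long = ols(roa, investment, strategy)[1]

print(f"investment effect without strategy control: {b_short:.2f}")
print(f"investment effect with strategy control:    {b_long:.2f}")
```

Under these assumptions the regression without the control attributes part of strategy's direct effect to investment, so the estimate comes out roughly twice the assumed value of 0.5. Adding strategy as a control recovers it.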
This endogeneity problem is explained really well in an editorial by Ketokivi and Guide in the Journal of Operations Management. The problem is that we assume that all other causes are independent of the included causes, which is implausible, and as a result our estimates will be biased and inconsistent. They then explain an example where you could reasonably argue that a causal effect goes in a different direction than what the author said, and if the author doesn't take that into consideration, then that's game over for the paper. They also explain that the endogeneity issue must be argued: if you cannot address it empirically, you have to argue based on theory. So why do you think that your independent variables, for example investment in new factories, are independent of any other causes of ROA? You have to figure out what causes ROA differences. If that is company strategy, you have to argue that manufacturing plant investments, or factory investments, are uncorrelated with strategy. That's an implausible assumption, so you have an endogeneity problem.
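
The reverse-direction concern can also be sketched with a simulation of a two-way causal relationship. Everything here is an assumption made for illustration: the two structural equations, the coefficient values `b` and `g`, and the unit-variance error terms.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000
b, g = 0.5, 0.4  # hypothetical: effect of x on y, and effect of y on x

u_x = rng.normal(size=n)  # error term of x
u_y = rng.normal(size=n)  # error term of y

# Solve the two equations y = b*x + u_y and x = g*y + u_x jointly.
x = (u_x + g * u_y) / (1 - b * g)
y = b * x + u_y

# Simple regression slope of y on x.
slope = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
# Correlation between x and the error term of y.
corr_x_error = np.corrcoef(x, u_y)[0, 1]

print(f"assumed effect of x on y: {b}")
print(f"estimated slope:          {slope:.2f}")
print(f"corr(x, error of y):      {corr_x_error:.2f}")
```

Because y feeds back into x, x is correlated with the error term of y, and the regression slope overstates the assumed effect of x on y.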