Next I will explain the idea of regression analysis. Regression analysis is one of the most commonly used analysis tools in quantitative research, and most applications of quantitative techniques can be thought of as special cases or extensions of it. Regression results are typically presented as a table like this. Here we have four different regression models, with different regression coefficients and different model indices. There are of course things here that you would want to understand, including what to look at in the table. We have seen tables like this in an earlier video, but the first question is what regression analysis actually is.

In a regression analysis we have two kinds of variables. We have one variable that we want to understand, the dependent variable. For example, a company's profitability, ROA, could be a dependent variable. Then we have multiple independent variables. The independent variables are variables that we use to explain the dependent variable. For example, we could have CEO gender, company size and company industry. Then regression analysis answers the question: how much do these variables together explain the variation of the dependent variable, and which of the variables are the most important for explaining it? So regression analysis allows us to control for alternative explanations for an observed correlation. In the case of the paper by Heckman, from which the previous table was taken, they explained patient satisfaction scores with, for example, physician productivity, physician quality and physician accessibility. So you have one thing that you explain with multiple things, to see which of those potential explanatory variables actually matter.

The idea of regression analysis is commonly presented as a Venn diagram.
A Venn diagram is useful for illustrating some properties of regression analysis. It doesn't illustrate all of them, but it's a good starting point nevertheless. So the idea of these circles is that this circle here represents the variation of company performance, or return on assets. So this is the variation of the dependent variable. This is the variation of the independent variable that we're interested in, which in this case is CEO gender, and this is the variation in company size.

Now we are interested in how much of this covariation or correlation between gender and performance is actually due to gender, and how much is due to the effects of size, because size and gender are correlated. So we could say that the correlation between gender and performance is partly due to the presumed causal influence of CEO gender on performance, and partly because smaller companies tend to be more profitable, this correlation here, and also because smaller companies tend to be more likely to hire women CEOs, which is this correlation here. Now we want to use regression analysis to parcel out this part that is shared by gender, size and performance, to get the unique effect of gender on performance. So we could think of regression analysis as doing something like this: it eliminates the effect of company size from the relationship between gender and performance.

Of course we are not limited to just two independent variables. We can have multiple competing explanations for the dependent variable in the model. Typically we would have in the ballpark of 10 or 20 variables. So we can take additional bites away to get a cleaner estimate of this correlation between gender and performance, one that is free of any third causes. Ultimately we would get a clean causal effect of gender on performance if we have included all relevant controls in the model. That of course is easier said than done.

Regression analysis is a statistical model, and a model is an equation.
So whenever you hear the term model, it means that there is some math. A model can also be presented as a path diagram like this. I will first talk about the path diagram. So the path diagram here has one dependent variable y and three independent variables x. The x's are allowed to be freely correlated. Free correlation, shown by this double-headed curved arrow, means that we don't really care how these different explanatory variables, usually denoted with x, are related to each other. But we are interested in estimating how they explain or predict the dependent variable y. The strength of influence of each variable is quantified by a regression coefficient beta. So we have one beta for each x here. Then we have beta zero, or the intercept, which tells us the base level of y when all of these explanatory or independent variables are at zero. And then we have some variation u that the model doesn't explain. So this is the remaining variation that is not explained by the model. So let's say that the model explains 20% of the variation of the dependent variable, which is fairly typical for business research. Then the unexplained variation would account for 80% of the variation of y in the data.

In equation form we can see that y is a weighted sum of the x's, and the weights are the regression coefficients. Each of these regression coefficients quantifies the influence of one of the independent variables on the dependent variable. So for example we can model patient satisfaction as a weighted sum of physician productivity, physician quality, physician accessibility and some variation that the model doesn't explain. What's important to understand is that these effects are independent. So when x increases by one unit, its beta tells us the effect of that one-unit increase independently of the other variables.
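The weighted-sum model and the idea of controlling for a correlated variable can be sketched with simulated data. This is only an illustration, not the analysis from any paper: the variable names, the true coefficients (intercept 1.0, gender 0.5, size -1.0), the sample size and the use of NumPy are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible
n = 5000

# Hypothetical data: smaller companies are more likely to have a woman CEO,
# so the two independent variables are correlated.
size = rng.normal(size=n)
woman_ceo = (rng.normal(size=n) - 0.8 * size > 0).astype(float)

# y is a weighted sum of the x's plus unexplained variation u.
# Assumed true values: beta0 = 1.0, beta_gender = 0.5, beta_size = -1.0.
u = rng.normal(size=n)
roa = 1.0 + 0.5 * woman_ceo - 1.0 * size + u


def ols(y, *xs):
    """Least-squares fit of y on an intercept and the given x's.

    Returns the coefficients (intercept first) and the share of the
    variation of y that the model explains (R squared).
    """
    X = np.column_stack([np.ones(len(y)), *xs])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef
    r_squared = 1.0 - residuals.var() / y.var()
    return coef, r_squared


# Gender alone: its coefficient also picks up the effect of company size.
coef_simple, r2_simple = ols(roa, woman_ceo)

# Gender and size together: the gender coefficient is now the effect of
# gender with size held constant, and should be close to the assumed 0.5.
coef_multi, r2_multi = ols(roa, woman_ceo, size)

print("gender only:      ", coef_simple, r2_simple)
print("gender and size:  ", coef_multi, r2_multi)
```

In this setup the gender coefficient from the model without size is noticeably larger than the assumed 0.5, because it absorbs the part of the variation that gender shares with size. Adding size to the model removes that shared part.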
The effects are also linear, so we always assume that a one-unit increase in x gives the same amount of increase in y, which is quantified by the beta.

Graphically, regression analysis can be understood as a line. I will show you a bivariate regression analysis. This is also called simple regression, because we have only one independent variable. So here the independent variable is, let's say, years of education, and the dependent variable is, let's say, salary. We are interested in knowing what the linear relationship is, so what's the best line that explains this data. So in simple regression with one independent variable, you can basically think of it as plotting all the data as a scatter plot here, and then drawing a line through the data. We will show some scatter plots a bit later. So that gives us the regression line. The slope of this line, how strongly it goes up or down, is quantified by the regression coefficient.

We make some assumptions when we run a regression analysis. One of the key assumptions in justifying regression analysis is that the observations are equally and normally distributed around the regression line. So when we have a regression line here, the most likely case is that the observations are close to the line. There can be some observations that are far from the line, but they should be relatively rare.
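A minimal sketch of the simple regression described above, using simulated data. The numbers are made up for illustration (a base salary of 10000, 2500 per year of education, and normally distributed deviations with the same spread everywhere along the line), and the choice of NumPy's polyfit is an assumption, not something the lecture prescribes.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed so the sketch is reproducible
n = 200

# Hypothetical data: years of education and salary.
education = rng.uniform(8, 20, size=n)

# Deviations from the line: normal, with the same spread at every x.
noise = rng.normal(0, 5000, size=n)
salary = 10000 + 2500 * education + noise

# Fit a straight line. polyfit returns the highest-degree coefficient
# first, so for degree 1 the order is (slope, intercept).
slope, intercept = np.polyfit(education, salary, deg=1)

# Residuals: vertical distances of the observations from the fitted line.
residuals = salary - (intercept + slope * education)

# Share of observations within two residual standard deviations of the line.
share_close = np.mean(np.abs(residuals) < 2 * residuals.std())

print("slope (regression coefficient):", slope)
print("intercept:", intercept)
print("share of observations close to the line:", share_close)
```

The slope is the regression coefficient: the estimated change in salary for one more year of education. If the assumption about the deviations holds, most observations fall close to the line and only a few fall far from it.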