In this video I will explain the suppression effect in regression analysis. The suppression effect is a term used for a feature of regression analysis. You don't actually have to understand what the term suppression means, but you do have to understand why certain results sometimes occur in regression analysis. You will basically need the term only if, for example, a reviewer argues that you should explain suppression. I don't think there is any valid reason to discuss suppression in an empirical paper unless your reviewer asks you to do so.
Let's take a look at Heckmann's paper, because they mention suppression. They explain that physician age has a different sign in their correlation table and in their regression table. In the correlation table the correlation between physician age and patient satisfaction is positive; in the regression results the coefficient is negative, and that's the suppression effect. The technical definition is unimportant here. Then they explain that these variables may somehow be suppressing the variance of the dependent variable that is irrelevant to its prediction. I don't understand what that means, so to me it doesn't really have any literal meaning.
Then they cite a textbook on statistical analysis that presumably explains what they mean. Unfortunately that's a big book and they don't give a page number, so we can't meaningfully check what the book says about suppression. So whenever you explain something and then give the reader a book to read, at least indicate which chapter or which page of that book explains the fact that you're referring to. Otherwise you have hundreds or thousands of readers browsing through the book and wasting their time looking for a fact whose location you already know, because you wouldn't be citing the book unless you had read it.
Then they explain that the correlation is not statistically significant, that they tried different models and the results were unchanged, and they conclude that suppression is not a problem. I agree with the conclusion that suppression is not a problem, but not for the reasons that they give. The suppression effect is not something that is problematic when it occurs; it's a feature of regression analysis.
So let's take a look at their actual statistics. What are the numbers that they refer to? They identified that the correlation between physician age and patient satisfaction is positive and the corresponding regression coefficient is negative. Why could that be the case? We have to remember that a correlation and a regression coefficient quantify different things. A regression coefficient ideally quantifies a causal relationship, under certain assumptions. A correlation coefficient quantifies a linear association that could be causal or could be spurious.
It's very simple to see here why physician age is correlated positively with satisfaction while the regression coefficient is negative. We just need to look at the correlation table. We first look at which variables are highly correlated with age. Well, it's tenure. Tenure is correlated with age at a very high level. Then we look at the coefficient of tenure here: it is strongly positive. So the more experience you have, the more satisfied your patients are. Experience also correlates with age, which is quite natural, because if you are 25 and a newly graduated medical doctor, you can't have much experience. If you are someone with 30 years of work experience as a doctor, you must be more than 50, because normally you are more than 20 when you graduate from medical school. So age and tenure, age and work experience, naturally correlate very highly. So what's going on?
Remember that the linear model implies a correlation matrix. So what is the implied correlation between age and patient satisfaction, based on the correlation between age and tenure and the effects of tenure and age? We go from age to patient satisfaction. We take the direct path once, which is -0.13, and we take the path through the correlation, 0.69 times 0.34. Doing the math, -0.13 + 0.69 × 0.34, we get that the implied correlation based on this part of the model only is about 0.10, which is very close to 0.09, the observed positive correlation.
So why is there a different sign? It's pretty straightforward and we have a natural explanation. A regression coefficient quantifies the effect of one variable when the other variables are held constant. So what the regression coefficient tells us is that when you have two physicians with an equal amount of work experience, people tend to prefer the younger one. That is natural. But people also tend to prefer doctors with more experience, and doctors who are older tend to have more experience, and experience is the variable that matters more than age. So the correlation here, 0.09, reflects the effect of age itself, which is negative based on this model, plus a spurious effect that arises because the doctors who have more experience are also older and receive better scores. This correlation is the sum of a spurious effect and a direct effect, and in this case the spurious effect, which comes from the correlation between age and tenure combined with the strong effect of tenure, is a lot stronger than the direct effect of age. Therefore we get a positive correlation.
So that's how regression analysis works. It takes a correlation and tries to identify how much of that correlation is spurious and how much of it corresponds to a causal relationship. Sometimes the spurious part is a lot larger than the actual causal effect, and that can cause the regression coefficient to have a different sign than the correlation coefficient. It is not a problem.
It is how regression analysis works.
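The arithmetic above can be checked with a few lines of Python. This is only a sketch under assumptions: it treats the quoted numbers (-0.13, 0.34 and 0.69) as standardized coefficients and a correlation, it looks only at the age and tenure part of the model and ignores every other predictor in the paper, and the function names `implied_correlation` and `standardized_betas` are made up for illustration. The tenure and satisfaction correlation it computes is the value implied by these numbers, not a figure taken from the paper.

```python
# Numbers quoted in the lecture. Treating them as standardized
# coefficients is an assumption made for this illustration.
r_age_tenure = 0.69   # correlation between physician age and tenure
beta_age = -0.13      # regression coefficient of age (direct path)
beta_tenure = 0.34    # regression coefficient of tenure


def implied_correlation(direct, r_predictors, other_beta):
    """Path tracing with two correlated predictors:
    direct path plus the path that runs through the other predictor."""
    return direct + r_predictors * other_beta


def standardized_betas(r_x1_y, r_x2_y, r_x1_x2):
    """Solve the two-predictor normal equations for standardized betas."""
    det = 1.0 - r_x1_x2 ** 2
    b1 = (r_x1_y - r_x1_x2 * r_x2_y) / det
    b2 = (r_x2_y - r_x1_x2 * r_x1_y) / det
    return b1, b2


# Correlations implied by this two-predictor part of the model.
r_age_sat = implied_correlation(beta_age, r_age_tenure, beta_tenure)
r_tenure_sat = implied_correlation(beta_tenure, r_age_tenure, beta_age)

# Going the other way: regression recovers the negative direct effect
# of age from a positive age-satisfaction correlation.
b_age, b_tenure = standardized_betas(r_age_sat, r_tenure_sat, r_age_tenure)

print(f"implied r(age, satisfaction) = {r_age_sat:.4f}")
print(f"recovered beta_age = {b_age:.2f}, beta_tenure = {b_tenure:.2f}")
```

The implied correlation comes out positive even though the coefficient of age is negative, because the indirect part, 0.69 × 0.34, is larger in magnitude than the direct part, -0.13. Solving the equations in the other direction returns the original coefficients, which is the sense in which regression separates the direct part of a correlation from the spurious part.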