Interpretation of non-linear models is more complicated than the interpretation of linear regression models, and there is a lot of confusion around this in the literature. Let's start with an example where the effect is constant: each additional year of education increases your monthly salary by 300 euros. The average salary for women is 2,000 euros and the average salary for men is 3,000 euros, so the effect is the same for both and fairly easy to interpret. Now let's take a look at another example. What if an additional year of education increases salary by 10% over the current level? That is also fairly easy to understand: both receive the same relative increase. So where does the problem come from? It comes from the fact that these tell us two different things. A relative increase of 10% and an absolute increase of 300 euros are not the same thing. Look at it the other way around: 300 euros is 15% for a woman and 10% for a man, while 10% is 200 euros for a woman and 300 euros for a man. So is the effect of education moderated by gender? Is the effect the same? It depends on what kind of question we are asking. We could interpret the first scenario as a moderation effect and the second one not, or the other way around, depending on whether we are interested in the absolute increase or the relative increase. Then there is a third scenario where an additional year of education increases women's salary by 12% and men's salary by 10%. In relative terms women's salary increases more per year of education, but in absolute terms it increases less. So is the moderating effect of gender positive or negative in this scenario? It depends on what we want to say with the data, and trying to force a yes-or-no answer about these effects is typically not an ideal approach.
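The arithmetic behind these scenarios is easy to check directly; the salaries and percentages below come from the example, and the variable names are just for illustration:

```python
# Scenario 1: a constant absolute effect of 300 euros per year of education.
woman_salary, man_salary = 2000, 3000
absolute_effect = 300

# The same 300 euros is a different relative increase for each group.
print(absolute_effect / woman_salary)  # 15% for a woman
print(absolute_effect / man_salary)    # 10% for a man

# Scenario 2: a constant relative effect of 10% per year of education.
relative_effect = 0.10

# The same 10% is a different absolute increase for each group.
print(woman_salary * relative_effect)  # 200 euros for a woman
print(man_salary * relative_effect)   # 300 euros for a man
```

Whether we call this a moderation effect thus depends entirely on whether we quantify the effect on the absolute or the relative scale.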
The better way is simply to describe what the effect is like, instead of trying to say that it is either negative or positive, because it could be either depending on how we interpret the data. There is a lot of confusion around interpreting these as moderation effects when a transformation is involved. For example, in a recent Organizational Research Methods paper, Becker and co-authors say that transformation can severely decrease estimates of the true moderation effect and substantially increase Type II error. That is simply not true. It is true that after a transformation the interaction coefficient can be smaller, but when you look at the results and interpret them, the moderation effect is there. And the way you should always interpret moderation results is by plotting. The plots will look the same; only the coefficient will differ. You are not looking at the coefficients, you quantify the effects by plotting, and for that the transformation may not make a large difference. So why is there such confusion, and where does it originate? Let's compare a normal regression model with a regression model where we have log-transformed the dependent variable. The predicted values in the normal model are beta 0 plus beta 1 times x1 plus beta 2 times x2, a weighted sum of the x's. The predicted values in the log metric have the same form, but we are normally not interested in the log metric; we are interested in the original metric. We do not want to know how the log of salary increases as years of education increase. We want to know how much the actual amount of money increases as a function of education, not the log. With a little bit of math, the predicted value of y is the exponential of the linear predictor.
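Written out, the step from the log metric back to the original metric looks like this, using the same betas as in the regression above:

```latex
\log(\hat{y}) = \beta_0 + \beta_1 x_1 + \beta_2 x_2
\quad\Rightarrow\quad
\hat{y} = e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2}
        = e^{\beta_0} \cdot e^{\beta_1 x_1} \cdot e^{\beta_2 x_2}
```

The sum inside the exponential becomes a product of exponentials, which is the key to everything that follows.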
So if the log is predicted linearly, then the actual value is predicted by the exponential of the linear predictor. Then we take some basic calculation rules from Wikipedia, which is a great source for these, and write this slightly differently. We can express the fitted value of y as the exponential of beta 0 times the exponential of beta 1 x1 times the exponential of beta 2 x2. The difference is that one form is a sum and the other is a product. Whenever you have a linear model, all the effects are additive: you add them one after another and the sum is the total prediction. In exponential, or log-transformed, models you multiply the effects together, and the product of all effects is the final prediction. You can see that the effects of x1 and x2 are multiplied together, which is what an interaction effect would look like in this kind of model. So it is not that our moderation models are somehow incorrectly estimated when we apply a transformation; the model is simply multiplicative to start with instead of additive. And this is something many researchers struggle to understand: what is the difference between a multiplicative effect and an additive effect? Let's take a look at it graphically. We have the effect of x1, holding everything else constant. In a linear regression model, A represents the effect of all the other variables that we control for. Now if the other variables increase, say we have x2 with a positive effect and x2 increases, then A increases and the lines shift up. The lines keep the same direction: the effect of each unit of x1 on y is the same, only the base level increases. The level shifts, but the slope does not. In nonlinear, multiplicative relationships, the other variables A multiply the effect of the exponential, as shown on the previous slide. So the idea is not that the curves are shifted up.
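The additive-versus-multiplicative distinction can be sketched numerically; the coefficients below are made up purely for illustration:

```python
import math

b0, b1, b2 = 0.5, 0.2, 0.3  # hypothetical coefficients

def additive(x1, x2):
    # Linear model: effects add, so changing x2 shifts the level only.
    return b0 + b1 * x1 + b2 * x2

def multiplicative(x1, x2):
    # Log-transformed model back on the original metric: effects multiply.
    return math.exp(b0) * math.exp(b1 * x1) * math.exp(b2 * x2)

# In the additive model, one more unit of x1 has the same effect
# regardless of x2 (the level of the other variables):
print(additive(1, 0) - additive(0, 0))
print(additive(1, 5) - additive(0, 5))  # same difference

# In the multiplicative model, the absolute effect of one more unit
# of x1 grows with x2, even though there is no interaction term:
print(multiplicative(1, 0) - multiplicative(0, 0))
print(multiplicative(1, 5) - multiplicative(0, 5))  # much larger
```

This is exactly the sense in which a log-transformed model is "interaction-like" by construction: the other variables scale the effect of x1 instead of merely shifting the baseline.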
Instead, they are scaled by a multiplier. One curve is both lower and less steep than the other, which starts higher and also rises more steeply. So nonlinear effects are multiplicative: if you increase the other variables that are held constant in the regression analysis, the curve is multiplied. In a linear additive model, increasing the other variables shifts the curve up but does not scale it. That is the difference, and this difference needs to be understood when interpreting the coefficients. The reason this set of results is often confused with a moderation effect is that if you fit a linear model to this kind of data, the slope of the fitted line changes as a function of A, even though A is not a multiplier of x in the nonlinear model. So that is the basic case of nonlinear multiplicative versus linear additive models. What about S-curve models? The S-curve model is neither additive nor multiplicative. If we add something to A, we cannot say that the S-curves shift up, and we cannot say that they become steeper, because we cannot change the shape of the curve, and it is bounded between 0 and 1, so it cannot really move up or down either. How should the effect of the other variables be interpreted in this case? The other variables produce a sideways shift of the S-curve: they tell us how far along the S-curve you start. The idea is that if the other variables are very small, we are far on the left-hand side of the S-curve and the effect of our x variable is very small. If the effect of the other variables is large, we are further along the S-curve and the effect of x on y is larger. Of course, we could be looking at one part of the S-curve or another; here we are on the left-hand side.
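The sideways-shift idea can be checked numerically with a logistic curve; the coefficient and the values of A are invented for this sketch:

```python
import math

def logistic(z):
    return 1 / (1 + math.exp(-z))

b1 = 1.0  # hypothetical coefficient of x1

def prob(x1, a):
    # a is the combined effect of the other variables: it moves the
    # starting point along the S-curve rather than shifting it up.
    return logistic(b1 * x1 + a)

# Far in the flat left tail (a very negative), one unit of x1 barely matters:
print(prob(1, -6) - prob(0, -6))      # tiny change in probability
# Near the middle of the curve, the same unit of x1 has a large effect:
print(prob(1, -0.5) - prob(0, -0.5))  # much larger change
```

The coefficient b1 is identical in both cases; only the starting position on the curve, set by the other variables, differs.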
This is the left tail: very flat, and then it starts to increase a little. Again, if we fit linear models, the slope of the fitted line is different for each value of the other variables that are held constant in the regression analysis. And if we included a product term of the other variables and x, it could again be non-significant in the regression model, yet the slopes of the fitted lines would still differ, because the effect is nonlinear. There is one more important feature of the S-curve: it is approximately linear in its middle range. Here we have the logistic curve, but the probit curve looks the same, and between roughly 20 and 80 percent, or between about minus 1.5 and plus 1.5 on the linear predictor, it is approximately linear. So for this 20 to 80 percent probability range, a linear regression gives you almost the same predictions as the S-curve models do. That is useful, because if your predictions are in this range, you can interpret the regression results as if they were normal regression results, or even better, you can just use normal regression analysis. If your data only cover this part of the curve, just use regression analysis; that simplifies your analysis quite a lot. We only need the S-curve when we are interested in the parts that flatten out close to 1 or close to 0. Here is a published example of an unnecessary use of logistic regression analysis. The authors interpreted the logistic regression coefficients correctly by plotting them. The lines are not exactly linear; they curve slightly because they go above the 80 percent mark, but they are pretty much parallel, and regression analysis could have been used directly. Using logistic regression analysis when you get this kind of plot does not really make sense.
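The near-linearity claim is easy to verify: the tangent line to the logistic curve at p = 0.5 has slope 0.25 (the derivative of the logistic function at its midpoint), and it tracks the curve closely over the 20 to 80 percent band:

```python
import math

def logistic(z):
    return 1 / (1 + math.exp(-z))

def tangent_at_mid(z):
    # Linear approximation through (0, 0.5) with slope 0.25.
    return 0.5 + 0.25 * z

# Inside roughly the 20-80% band (|z| below about 1.5) the two are close:
for z in (-1.5, -0.75, 0.0, 0.75, 1.5):
    print(z, round(logistic(z), 3), round(tangent_at_mid(z), 3))

# Outside that band they diverge badly:
print(logistic(4), tangent_at_mid(4))  # the tangent line exceeds 1
```

This is why a linear probability model and a logistic model give nearly identical predictions when the predicted probabilities stay in the middle range.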
Of course, if your reviewers tell you that you have to use logistic regression analysis because your dependent variable is 0 or 1, then you can use it, because it does not make a difference: the normal regression results and the logistic regression results are going to be the same, at least when you plot them, even though the actual magnitudes of the coefficients will differ slightly. There is another nice published example of this. In a footnote, the authors explain that normally one would use a logistic or probit model for 0s and 1s, but in their case the predicted probabilities are between 30 and 80 percent, which is the linear section of the S-curve, so it is easier to just apply a normal regression analysis. You have to realize that when the interpretation of a normal regression analysis and a nonlinear model would be the same, it does not matter which one you apply, and the linear model is normally easier to apply and also easier to interpret. Yet another example: this is a published example of the moderation-effect confusion handled right. We look at two pairs of curves showing the predicted probability of patenting against firm age, once with the number of patents in the firm's cluster and once with the number of firms in the cluster as the moderating variable. We can see that for younger firms the effect is positive and fairly strong, and for older firms the effect is fairly flat and not as strong, in both scenarios. So we would say that the size of the cluster, in terms of firms or patents, has a positive effect on the propensity to patent, but the effect is stronger for younger companies. That is the proper interpretation. What is interesting in this paper is that they estimated a moderation model, and the moderation coefficients are non-significant and very close to zero.
One coefficient is for the interaction with the number of patents in the cluster and the other is for the interaction with the number of firms in the cluster, and both moderation effects are non-significant. So when you are dealing with non-linear models, you do not need a statistically significant interaction effect to claim moderation. We can clearly see from the figure that the effect is different for younger and older companies, but you cannot see it from the coefficient, because the model is non-linear. The authors explained it really well: they note that these coefficients are non-significant and say that it does not matter, because they show graphically that, since the two groups are looking at different parts of the S-curve, the effect is actually a lot larger for the young companies than for the older companies. So when you plot the predicted values, the S-curves differ even though the interaction effect is non-significant, and that supports the moderation hypothesis. There is no confusion: using a transformation did not mask the interaction. If you interpret the model properly, the moderation effect is there. It is not hidden, and there is no lack of statistical power; you get the effect if you interpret it properly. So let's take a look at why we have these curves, one a lot steeper than the other, even though the interaction effect is non-significant. The key is the effect of firm age. We can do a back-of-the-envelope calculation and get that the marginal effect of age is roughly minus 0.11, so the difference between a 1-year-old and a 13-year-old firm, 12 years, is about minus 1.39 on the linear predictor. Looking at the S-curve for that plot, that difference of about 1.4 is the distance between the two starting points on the curve.
The reason the slope is much steeper for the younger companies is that they sit further along the predicted S-curve than the older companies: the older companies are in the relatively flat region, and the younger companies are where the curve starts to rise. That is why we get a steeper slope for younger companies than for older companies, even though the interaction coefficient is non-significant and very close to zero. We can do another back-of-the-envelope calculation for the other plot; allowing for a small error in the calculation, the corresponding difference turns out to be about minus 0.97, roughly minus 1. So the difference is about minus 1.4 in one plot and minus 1 in the other, and that explains why one pair of curves is closer together than the other: how far apart the older and younger companies sit along the S-curve is smaller in the second plot. So what is the current practice in interpreting logistic regression results, or nonlinear models more generally? A pretty good review article from about ten years ago found that most articles published in management journals ignore interpretation: they just look at the direction of the effect and whether it is statistically significant. Some articles interpreted the results incorrectly, and very few interpreted them correctly; the best way to interpret is to plot. Even among the articles that do plot the effects, some plot them incorrectly. It is quite common to see plots from a logistic regression analysis where you calculate two predicted values for a low value of some variable and two for a high value, and then draw straight lines between them. That is incorrect, because you are not fitting a line, you are fitting a logistic curve. So the guideline for interpretation is: always plot, and when you plot, you should really plot the S-curve.
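The logic of the back-of-the-envelope calculation can be sketched generically: the marginal effect of x in a logistic model is b times p times (1 minus p), so the same coefficient implies very different slopes depending on where on the curve a firm sits. The numbers below are invented, not the paper's:

```python
import math

def logistic(z):
    return 1 / (1 + math.exp(-z))

b_x = 0.8  # hypothetical coefficient of the cluster-size variable

def marginal_effect(z):
    # Derivative of logistic(b_x * x + other terms) with respect to x,
    # evaluated at linear predictor z: b * p * (1 - p).
    p = logistic(z)
    return b_x * p * (1 - p)

# An "old firm" in the flat left tail versus a "young firm" sitting a
# 1.4-unit shift further along the curve (as with the age difference above):
print(marginal_effect(-3.0))  # old firm: small slope
print(marginal_effect(-1.6))  # young firm: several times larger
```

The coefficient b_x is identical for both firms; only their positions on the S-curve, driven by the age variable, differ, which is exactly why the plotted curves diverge without any significant interaction term.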
Do not plot a line, because your model is nonlinear; plot an S-curve. Use reasonable and interesting values. It is quite common to use the mean plus one standard deviation and the mean minus one standard deviation. That could be the best set of interesting values in some scenarios, but if you are comparing old companies to young companies, for example, then perhaps a one-year-old company is a more interesting value than the mean minus one standard deviation, and perhaps the oldest company, or the 90th percentile, is more interesting than the mean plus one standard deviation. Then you calculate the S-curves, and you should draw multiple S-curves for multiple different kinds of companies, because the companies could be at different stages of the S-curve: the curve could be flat for one company and very steep for another, regardless of any interaction effect in your actual model.
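The plotting guideline above can be sketched as follows: pick concrete, interesting profiles, then trace the full predicted-probability curve over a grid of x values instead of connecting two points with a straight line. All coefficients and profile values here are hypothetical:

```python
import math

def logistic(z):
    return 1 / (1 + math.exp(-z))

# Hypothetical fitted coefficients: intercept, cluster size, firm age.
b0, b_size, b_age = -2.0, 0.6, -0.11

def predicted_prob(size, age):
    return logistic(b0 + b_size * size + b_age * age)

# Concrete, interesting profiles instead of mean +/- 1 SD:
profiles = {"1-year-old firm": 1, "90th-percentile age (25)": 25}

# Trace the whole S-curve over a grid of cluster sizes for each profile,
# rather than drawing a straight line between two predicted points.
grid = [s / 2 for s in range(0, 21)]  # cluster sizes 0.0 .. 10.0
curves = {label: [predicted_prob(s, age) for s in grid]
          for label, age in profiles.items()}
for label, probs in curves.items():
    print(label, [round(p, 2) for p in probs[::5]])
```

Feeding these per-profile curves to any plotting library reproduces the kind of figure discussed above: one S-curve per profile, each potentially at a different stage of the curve.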