Mediation analysis is useful because it allows us to study mechanisms. In this video I'll talk about the modern approach to mediation analysis, called causal mediation analysis. If you come from a regression analysis or structural equation modeling background, the literature on causal mediation analysis might be a bit difficult to understand because the terminology is a bit different. Instead of building on any particular statistical model, this literature builds on the counterfactual model of causality. This little glossary from a guidelines article for medical researchers is useful to have at hand when you start reading about causal mediation for the first time.

Let's start with the basic mediation model. The first approach to mediation that we're usually taught in a research methods course is the product of coefficients, or Baron and Kenny, approach. The idea is that we have two models: a model for Y as a function of X and M, and a model for M as a function of X. We can have some controls too, but this is just the simplest possible case. Then we learn that the mediation effect is the product of two coefficients: this path, beta M1, multiplied by this path, beta Y2, gives you the mediation effect. Another thing that we learn is that there are two kinds of mediation: full mediation and partial mediation. If the direct effect from X to Y is zero after controlling for M, we have full mediation. If it's nonzero, we have partial mediation. Of course, this linear model can be extended, for example by adding a correlation between the two error terms, which would make it an instrumental variable model for mediation.

This approach and the linear models have a couple of important limitations. The first problem is that this approach is applicable to only a few special cases of nonlinear models. For example, if the model for the mediator is a logistic regression model, the product of coefficients wouldn't really apply.
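As a rough illustration of the linear two-equation setup described above, the following sketch simulates data and computes the product of coefficients with ordinary least squares. The coefficient values, the sample size, and the helper `ols` are all hypothetical choices made for this example; they do not come from the video.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical true coefficients (assumed for illustration only).
beta_m1 = 0.5  # effect of X on M
beta_y1 = 0.3  # direct effect of X on Y
beta_y2 = 0.8  # effect of M on Y

x = rng.normal(size=n)
m = beta_m1 * x + rng.normal(size=n)
y = beta_y1 * x + beta_y2 * m + rng.normal(size=n)


def ols(columns, response):
    """Least-squares coefficients; the first entry is the intercept."""
    design = np.column_stack([np.ones(len(response)), *columns])
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef


est_m1 = ols([x], m)[1]              # model for M as a function of X
_, est_y1, est_y2 = ols([x, m], y)   # model for Y as a function of X and M
total = ols([x], y)[1]               # total effect of X on Y

indirect = est_m1 * est_y2           # product of coefficients
print(f"indirect = {indirect:.3f}, direct = {est_y1:.3f}, total = {total:.3f}")
```

In a linear model fitted this way, the estimated total effect equals the direct effect plus the product of coefficients, which is why the product is a natural summary in this special case.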
With a logistic mediator model the product would not make any sense, and in some other cases the product might make sense but the result that it gives would still be incorrect. The second problem is that this model assumes no interaction between beta M1 and beta Y2. The idea here is that there is causal heterogeneity: the effect of X on M is not the same for all people, and the effect of M on Y is not the same for all people. If the variations in these two effects correlate, then this model no longer holds. The third problem is that this is not really a definition of mediation. Mediation is a causal concept, and defining it in terms of a particular statistical model is problematic because we might want to think about mediation in broader terms as well. Let's talk about each of these three problems in turn.

For the nonlinear mediation model, one example could be something like this. We have an exponential function for Y, which might be, for example, a Poisson regression analysis. Let's see how it works. We take the equation for the mediator and plug it in in place of M. What we get is an exponential function for Y with the equation for M inside the exponential. We play around with the equation a bit and we get this kind of result. We simply multiply out and remove the parentheses, and we can see that this is still a pretty nice exponential model. We can actually interpret the product of coefficients as an exponential model coefficient: we multiply them together, and the product is interpreted as a relative change in the outcome. Why does it work? It works because the effects are interchangeable. They are interchangeable in that they both contribute to the product the same way, and also in that if we multiply one by two and divide the other by two, the outcome will be the same.
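The substitution described above can be written out as follows. This is a sketch under assumed notation: the intercepts $\beta_{M0}$ and $\beta_{Y0}$ and the omission of error terms are choices made here, since the slide equations are not part of the transcript.

```latex
\begin{align*}
M &= \beta_{M0} + \beta_{M1} X \\
Y &= \exp\bigl(\beta_{Y0} + \beta_{Y1} X + \beta_{Y2} M\bigr) \\
  &= \exp\bigl(\beta_{Y0} + \beta_{Y1} X + \beta_{Y2}(\beta_{M0} + \beta_{M1} X)\bigr) \\
  &= \exp\bigl((\beta_{Y0} + \beta_{Y2}\beta_{M0}) + (\beta_{Y1} + \beta_{M1}\beta_{Y2})\, X\bigr)
\end{align*}
```

The product $\beta_{M1}\beta_{Y2}$ appears as an ordinary coefficient of $X$ inside the exponential, so a one-unit change in $X$ transmitted through $M$ multiplies $Y$ by $\exp(\beta_{M1}\beta_{Y2})$.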
Because these coefficients work the same way in the equation, we can multiply them together and that gives us a sensible interpretation.

Let's take another example. This is a nonlinear model again, but now we have an exponential model for the mediator, like a Poisson regression analysis for the mediator, and let's try to do the same. We have the equation for Y, we take the equation for M, we plug it in place of M in the equation for Y, we try to multiply out to remove the parentheses, and we get this kind of monstrosity. In this case, the product of coefficients does not work. The reason is that these coefficients don't work the same way; they're not interchangeable anymore. Beta M1 and beta Y2 work differently because beta M1 is exponentiated and beta Y2 is not. The model is also some kind of weird combination of an additive and a multiplicative model: we add beta 0 and beta Y1 together, and then we multiply these coefficients together. So it will be very difficult to interpret that model. This is the nonlinear problem.

Then we have the second problem, the no interaction assumption. Let's take a look at what that means. Imai and his co-authors use this table to demonstrate the no interaction assumption. They show that there is a positive causal effect of the treatment on the mediator and a positive causal effect of the mediator on the outcome, but it's still possible that the mediation effect is negative. How can that be? Let's take a look at this table piece by piece. First we have two potential outcomes. This builds on the counterfactual model of causality. Potential outcomes means that we could assign an individual to treatment or to control but not both, so there are two potential outcomes, of which only one is realized. The causal effect is the difference between these two outcomes. Here the outcome is the mediator for these individuals: under treatment, the mediator is positive for some and zero for others.
And then we have the outcome, here the mediator, for the control cases: for some it's positive, for some it's zero. The causal effect is the difference between these two potential outcomes, and we can see that we have causal heterogeneity here. For the first part of the population the effect is positive. For this part it is negative. And for these two other parts there is no causal effect. We normally can't estimate these individual causal effects. The best thing that we can do is to estimate the average treatment effect, which is shown here. It is simply the average of the individual effects weighted by the population fractions: 0.3 times 1 plus 0.1 times minus 1 equals 0.2. That's the average treatment effect.

Then we have these two other columns, which show the scenario where the mediator is manipulated, so the mediator is set to 1 versus the mediator is set to 0. This is the outcome for the mediator treatment, the value of Y when the mediator is set to 1, and then we have the mediator control, the value of Y when the mediator is set to 0, for the different fractions of the population. Again we can see that we have causal heterogeneity in these mediator effects, but the average treatment effect is 0.2, calculated the same way: minus 1 times 0.3 plus 1 times 0.3 is 0, and minus 1 times 0.1 plus 1 times 0.3 is 0.2. So the average causal effect is 0.2 if we manipulate the treatment and 0.2 if we manipulate the mediator. So how is it possible that the mediation effect is minus 0.2? Let's take a look at this table. We have positive average treatment effects for the mediator and for the treatment, and then we have a negative average mediation effect. How can that be? The reason is that when we calculate the mediation effect, we are not looking at these averages but at the individuals. For this part of the population and this part of the population, the treatment has no effect on the mediator.
So there is no mediation effect there and we can just ignore those. Then we look at the remaining effects: here it is 1 multiplied by minus 1, which is minus 1, and here it is minus 1 multiplied by minus 1, which is 1. We take a weighted average: minus 1 times 0.3 plus 1 times 0.1 gives us minus 0.2, so there is a negative average mediation effect.

Let's take another look at this example to really understand what is going on. This is a simulated population of 10 subjects which corresponds to the table shown on the previous slide, and this is the mediation model. We have T to M, M to Y, and then the product. For the first three subjects the mediation effect, the product of these two coefficients, is minus 1. Then we have three 0s, one 1, and three 0s, and the average is minus 0.2. How is that possible?

Let's do a bit of math. We have beta M1, beta Y2, and then the product of coefficients. The product of coefficients approach works when the expected product of the coefficients is the product of the expectations. Normally when we calculate the product of coefficients, we calculate beta M1, which is an estimate of an average treatment effect, not an individual treatment effect, then we have beta Y2, which is again an estimate of an average treatment effect, and we multiply them together. So we do the multiplication on the right here: we multiply two average treatment effects together. But we should really be looking at the average, or the expectation, of the product of the individual treatment effects instead. So when does this equation hold? If we Google the expected value of the product of two random variables, we learn that this holds when the two effects, beta M1 and beta Y2, are independent. A bit more Googling tells us that the expectation of a product is the product of the expectations plus the covariance of X and Y, where X and Y are these two betas.
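The identity just stated can be checked numerically on a ten-subject population. The per-subject effects below are a reconstruction from the population fractions (0.3, 0.3, 0.1, 0.3) and the effect values mentioned in the talk, so treat the arrays and their variable names as assumptions rather than the slide's actual data.

```python
import numpy as np

# Individual causal effects, reconstructed (assumed) from the described table:
# 3 subjects: T->M = 1,  M->Y = -1
# 3 subjects: T->M = 0,  M->Y = 1
# 1 subject:  T->M = -1, M->Y = -1
# 3 subjects: T->M = 0,  M->Y = 1
t_to_m = np.array([1, 1, 1, 0, 0, 0, -1, 0, 0, 0], dtype=float)
m_to_y = np.array([-1, -1, -1, 1, 1, 1, -1, 1, 1, 1], dtype=float)

# Average of the individual mediation effects (what we want).
mean_of_product = (t_to_m * m_to_y).mean()

# Product of the two average effects (what the product of coefficients targets).
product_of_means = t_to_m.mean() * m_to_y.mean()

# Population covariance divides by n; the default sample covariance divides by n - 1.
pop_cov = np.cov(t_to_m, m_to_y, ddof=0)[0, 1]
sample_cov = np.cov(t_to_m, m_to_y)[0, 1]

print("mean of product:  ", round(mean_of_product, 4))
print("product of means: ", round(product_of_means, 4))
print("population cov:   ", round(pop_cov, 4))
print("identity holds:   ", np.isclose(mean_of_product, product_of_means + pop_cov))
```

The two averages agree only when the covariance term is zero, which is the no interaction assumption expressed in terms of the individual effects.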
We can calculate the covariance between beta M1 and beta Y2, the individual causal effects, in R. That's the covariance, and we plug it in here: it is minus 0.24, and minus 0.24 plus 0.2 times 0.2 gives us minus 0.2. So we can apply math to come to the same conclusion. Here we need to multiply by 9 and divide by 10 because we are working with the population covariance instead of the sample covariance that we normally calculate. So the no interaction assumption means that these two causal effects should be uncorrelated. This is the second problem.

What's the third problem? The third problem is that mediation is really a causal concept and not a statistical concept tied to a particular model. What will happen if we define mediation as a product of two coefficients? Let's define the mediation effect as the product of two coefficients, where beta M1 and beta Y2 come from a linear model. If we define mediation this way, then the two nonlinear examples that we just went through wouldn't be mediation, but they are something that we would consider mediation. Most researchers would say that yes, this is a mediation model because M is affected by X and then M affects Y, so there's a chain of causality, and the same here. So if we define mediation in terms of a linear model, it's incompatible with how researchers actually think about mediation. That is mediation as a product of coefficients.

The next question is how we should actually define mediation. If the product of coefficients does not really work as a definition, what should the definition of mediation be? Let's take a look at the causal mediation literature. This article by Nguyen in Psychological Methods talks about the definition of the mediation effect and how we can define it using counterfactuals in a few different ways. They start with an important distinction between three things. We need to first define what a mediation effect is. This is the first problem.
The second problem is identification: what kind of assumptions, what kind of research designs, and what kind of data are required to make a causal claim about mediation. This is the causal identification problem. And then we have the estimation problem. These are three distinct things: we have the definition, then we have research designs and assumptions, and then estimation, which is just the mechanics of how you actually calculate the mediation effect using your statistical software. I'll talk about the definition of mediation from now on. I'll talk about estimation and identification in other videos.

So how do we define a causal mediation effect? Imai has written a really nice paper on this. It's a bit technical, but it's a classic paper from 2010 in Psychological Methods on how we define a mediator and mediation. They consider four cases, the four combinations of treatment and mediator. First, we have individuals who received the treatment and whose mediator was observed after the treatment. Then we have individuals who did not receive the treatment and whose mediator was observed under the condition of no treatment. We mark these as Y(1, M(1)) and Y(0, M(0)), where the latter indicates no treatment and a mediator that was not affected by the treatment. Then we also have two counterfactual conditions that we can think of. We have this counterfactual where the individual received the treatment but their mediator is as if they had not received the treatment. And then we have this fourth case, individuals who did not receive the treatment but whose mediator is as if they had received the treatment. We have, of course, no way of ever observing these two outcomes. That is a statistical problem, an estimation problem, but it's not really a problem for the definition of mediation. So let's regroup these four cases a bit to see what Imai's definition of mediation is.
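The four cases described above can be laid out as a small grid. The notation $Y_i(t, M_i(t'))$, meaning the outcome of individual $i$ under treatment status $t$ with the mediator at the value it would take under treatment status $t'$, is assumed here as the usual potential outcomes convention; the slide itself is not part of the transcript.

```latex
\[
\begin{array}{l|cc}
 & \text{mediator as under treatment} & \text{mediator as under control} \\ \hline
\text{treated } (t = 1)   & Y_i\bigl(1, M_i(1)\bigr) \ \text{(observable)}     & Y_i\bigl(1, M_i(0)\bigr) \ \text{(counterfactual)} \\
\text{untreated } (t = 0) & Y_i\bigl(0, M_i(1)\bigr) \ \text{(counterfactual)} & Y_i\bigl(0, M_i(0)\bigr) \ \text{(observable)}
\end{array}
\]
```

For any one individual only one of the two diagonal cells is ever realized, and the off-diagonal cells are never observed for anyone.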
We regroup them a bit, and we can see that the individual causal mediation effect is the difference between two outcomes. In Imai's paper it's defined this way. For the treated cases, the mediation effect is the difference between the observed case and the counterfactual case where the mediator would not have been affected by the treatment. For the control cases that did not receive the treatment, the mediation effect on the individual level would be calculated by setting the mediator to the treatment condition but holding the individual still in the untreated group. This equation shows it. The mediation effect is the difference between whatever was observed under the individual's treatment condition and what would have been observed under the same treatment condition had the mediator taken the value from the opposite treatment condition. So it is the difference between the observed case and the counterfactual case, where either the mediator would have been affected by the treatment, for the untreated cases, or the mediator would not have been affected by the treatment, for the treated cases.

There are a few nuances to this. This article by Sally gives a more accessible explanation. They write that the natural direct effect is how much the outcome would change if the treatment was exogenously set to 0 or 1 for each individual but the mediator was kept at the level it would have taken. This is the natural direct effect. The natural indirect effect, or the mediation effect, is the same thing except that we manipulate the mediator counterfactual instead of the treatment. One important word here is natural. These natural effects are contrasted with controlled effects. The controlled effects in this causal mediation literature refer to effects that you would get from studies where you manipulate the treatment and you also manipulate the mediator.
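The definitions discussed above can be written compactly. The symbols $\delta_i$, $\zeta_i$, and $\mathrm{CDE}_i$ are labels assumed for this sketch, following common usage in the counterfactual mediation literature; they are not taken from the slides.

```latex
\begin{align*}
\delta_i(t) &= Y_i\bigl(t, M_i(1)\bigr) - Y_i\bigl(t, M_i(0)\bigr)
  && \text{natural indirect (mediation) effect} \\
\zeta_i(t) &= Y_i\bigl(1, M_i(t)\bigr) - Y_i\bigl(0, M_i(t)\bigr)
  && \text{natural direct effect} \\
\mathrm{CDE}_i(m) &= Y_i(1, m) - Y_i(0, m)
  && \text{controlled direct effect}
\end{align*}
```

Here $t \in \{0, 1\}$ is the treatment status held fixed, and $m$ is a mediator value chosen by the researcher. In the natural effects the mediator takes whatever value it would have for that individual, while in the controlled effect it is set to the same value $m$ for everyone.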
Natural effects, however, are much more common, and they refer to scenarios where the treatment is allowed to naturally affect the mediator without the mediator being manipulated by the researcher. So most of the time when we talk about the mediation effect, we refer to the natural mediation effect, or natural indirect effect, and the natural direct effect.

To make things a bit more complicated, Nguyen explains that there are actually a couple of different ways that these mediation effects can be thought about. They think about three worlds. One world is where the individual is assigned to control and the mediator is observed under the control condition. The other world that we can observe is where the individual is assigned to the treatment and the mediator is observed under the treatment. And then we have this in-between world where the individual is assigned to the treatment but their mediator is not affected by the treatment. They show that this contrast, the difference between these two outcomes, is the natural direct effect: we manipulate the treatment but we fix the mediator to its untreated value. And this is the natural indirect effect, where we fix the treatment at one and then allow the mediator to get its natural value under treatment instead of being fixed to the untreated value. But this is not the only way. We could also have this contrast here, where we start by not manipulating the treatment but we manipulate the mediator. And this is a non-trivial issue. How do we choose the comparison point?

So let's take a look at a numerical example. We have five subjects here. We have the mediator under treatment and the mediator under control. Then we have the natural outcomes: the untreated outcome, with the mediator under control, and the treated outcome, with the mediator under treatment.
And then we have these counterfactuals that we can never observe. These are calculated based on this simple model with an exponential effect on Y and a linear model of T on the mediator. The question now is which counterfactual we use when we start to decompose the total effect. The total effect is simply the difference between two potential outcomes: the outcome of not being treated, with the mediator under control, and the outcome of being treated, with the mediator affected by the treatment. So do we manipulate the treatment first and the mediator then, or the mediator first and the treatment then? Which counterfactual do we apply? This is a non-trivial question because the difference is going to be quite big.

This is called the direct-indirect decomposition in Nguyen's article. We manipulate the treatment first; the dot shows the manipulated treatment, and the mediator is fixed to the untreated scenario. The average is going to be 7.3, so that's the average natural direct effect. Then the average natural indirect effect through the mediator, when we fix the treatment to be 1 and we manipulate the mediator, that's the dot here, would be 73.7. It's roughly the difference between 85 and 11.1. If we use the other counterfactual, then we get different results. We manipulate the mediator first, and the natural indirect effect would be 27.2 and the natural direct effect would be 53.8. Quite often we want to compare the magnitudes of these two effects, so how large a share of the effect is mediated. Here, when we make a comparison, 73.7 is about 10 times as large as 7.3. So based on this first decomposition, we would say that the mediation effect is 10 times as strong as the direct effect.
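The two decompositions described above can be sketched for a small hypothetical population. The baseline outcomes and the coefficients below are made up for illustration, so the absolute numbers will not match the ones on the slide; only the structure (an exponential model for Y and a linear effect of T on the mediator) follows the description in the talk.

```python
import numpy as np

# Hypothetical coefficients (assumed): effects on the log of Y and on M.
beta_t = 1.0   # direct effect of treatment on log Y
beta_m = 2.0   # effect of mediator on log Y
alpha_m = 1.0  # linear effect of treatment on the mediator

# Five subjects with different (made-up) untreated outcomes.
intercept = np.log(np.array([2.0, 3.0, 4.0, 5.0, 6.0]))
m_base = np.zeros(5)


def mediator(t):
    """Potential mediator value M(t) under treatment status t."""
    return m_base + alpha_m * t


def outcome(t, m):
    """Potential outcome Y(t, m) from the exponential outcome model."""
    return np.exp(intercept + beta_t * t + beta_m * m)


y00 = outcome(0, mediator(0))  # untreated, mediator under control
y11 = outcome(1, mediator(1))  # treated, mediator under treatment
y10 = outcome(1, mediator(0))  # counterfactual: treated, mediator as if untreated
y01 = outcome(0, mediator(1))  # counterfactual: untreated, mediator as if treated

total = (y11 - y00).mean()

# Decomposition 1: manipulate the treatment first, then the mediator.
nde_first = (y10 - y00).mean()
nie_first = (y11 - y10).mean()

# Decomposition 2: manipulate the mediator first, then the treatment.
nie_second = (y01 - y00).mean()
nde_second = (y11 - y01).mean()

print(f"total effect:               {total:.1f}")
print(f"treatment first: NDE={nde_first:.1f}, NIE={nie_first:.1f}")
print(f"mediator first:  NIE={nie_second:.1f}, NDE={nde_second:.1f}")
```

Both orderings add up to the same total effect, but with these made-up numbers the indirect effect is roughly ten times the direct effect in one ordering and roughly half of it in the other.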
In the second decomposition, however, the natural indirect effect, the effect through the mediator, is 27.2 and the natural direct effect is 53.8, so the mediation effect is just half of the direct effect. So is the mediation effect 10 times stronger than the direct effect, or is it half of the direct effect? That's a 20-fold difference. So how do we reconcile it? The Nguyen paper talks about how to make this choice, but my take is that there's actually a better way of approaching this. Thinking about these effects as absolute differences is not ideal. The reason is that we have an exponential model here, and interpreting exponentials in terms of absolute differences is not as useful as interpreting them as relative differences. If we calculate relative effects, then the natural direct effect, NDE, is 2.7, so the outcome increases by a factor of 2.7, and the natural indirect effect is 7.4, so the outcome increases by a factor of 7.4, and now the order doesn't matter. If we calculate the ratios for these two effects, they are going to be the same in both decompositions. So Nguyen's article gives guidance on which counterfactual to apply. My take is that if you really have a nonlinear model for the outcome, which would produce different natural indirect and direct effects depending on the decomposition, it's better to interpret the model as nonlinear instead of looking at absolute differences.

So in this video I talked about the definition of causal mediation: what is the causal effect through the mediator? I went through the counterfactual definition of the causal mediation effect. Now, this is a lot more complicated than Baron and Kenny. Can we say that Baron and Kenny is outdated because of the availability of this more rigorous definition of causal mediation? The answer is yes and no. The yes answer is that the Baron and Kenny product of coefficients probably should not be used as a definition of what mediation is.
But as we'll see in my video about estimation of these effects, the Baron and Kenny product of coefficients is still a useful estimation technique. How we define an effect, how we identify an effect, and how we estimate an effect are three different things. And as an estimation technique, the product of coefficients is still completely valid.