Hello, my name is Jason Sinnwell. I'm going to talk about a package on CRAN called regmed, which is short for regularized mediation, and I'll walk through the analysis with a couple of examples. Here's a rough outline, nothing fancy: some background on regularized mediation, a little bit about structural equation modeling and the regmed method, then the basic regmed method and its extension to multivariate data, and some concluding thoughts. A little background on myself: I work at the Mayo Clinic as a biostatistician, and I've been there about 20 years, since I graduated from Iowa State University in 2002. My main research areas are in cancer genomics; specifically, I've worked in GWAS and DNA and RNA sequencing over the years, and I'm a co-author on the packages listed. Most or all of them are collaborations with one or more other people, so they're definitely not my own work alone. A little background on regularized mediation: most of this talk comes from two papers published in the last couple of years, both with Dan Schaid, who I've worked with for a while. The first one introduced a different way to do this analysis. It's not a totally new analysis, but with mediation involving lots of biomarkers, especially in my area, cancer genomics, it makes sense to try to impose some structure with penalized models for the mediators, which in my motivating example are biomarkers sitting between an exposure and an outcome. The 2021 paper is an extension to multivariate data. If these slides become available, these are all active links, and I've also included a link to the regmed package on CRAN, where there's a vignette that covers a lot of these examples, though I try to give a few more details in this seminar. Okay, so mediation: motivating examples.
So we all know there are exposure variables, which I'll call X for most of this talk, and those can have an effect on an outcome. Most of the examples here work with continuous outcomes, but if you need to, you can take residuals from a logistic regression or from a survival model and make it work that way. The variability in the outcome can be explained by one or more biomarkers; I'll call those mediators for most purposes here. But those mediators, the biomarkers, are not independent of the exposure variable, so you have to account for that indirect association. These kinds of models impose a linked association between the exposure and the biomarker at one level, and between the biomarker and the outcome at a second level, but we also want to allow for association between the exposure itself and the outcome. To put this in graphical terms, I've got two different examples, side by side, not overlapping. The first is how we do regular linear regression with multiple variables: you can't really deconvolute any effect between the exposure (the blue one) and the mediators, but they all might have an effect on the outcome. On the right, we can take advantage of graph plotting to see that in structural equation modeling, and in mediation, we allow links between the exposure and the mediators, and the mediators in turn are linked to the outcome. Those are the kinds of models we're trying to set up. To put this in formula terms: the exposure is associated with the mediators, so we model an alpha for each mediator i, alpha_i. For mediation, that alpha has to be non-zero.
And for that same mediator, the beta between the mediator and y would also have to be non-zero for it to be a true mediator. In the second equation, for association with the outcome, you also allow the exposure variable to be linked to the outcome through the delta parameter. So, as I was saying, for a mediator to truly be a mediator, you need both the alpha and the beta in those two equations to be non-zero. You can do a formal test on alpha times beta; this is classical in the literature: multiply alpha times beta and see if that product is different from zero. And you can do that over a group of mediators by summing those products. But what the regmed package is trying to deal with is what happens when you have a lot of mediators: can you filter them? There's work, originally by Sobel, and more recently by Fan and Lv in 2008, on pre-filtering by the covariances at those two different levels, between x and the mediators and between the mediators and y. What we've offered in regmed is a penalized approach. So how do we set this up? We use structural equation modeling, and we allow direct relationships from exposure to mediators and from exposure to the outcome, and we allow correlation among the mediators. Some other metrics considered in this framework are the CFI, where higher is better, and the root mean square error of approximation (RMSEA), where lower is better. What we use in regmed is the BIC: we take the log-likelihood and add penalties for all these different parameters, and with the BIC, lower is better, so the problem is better behaved with the penalties involved. So let's talk about what we do in the regmed package.
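The two equations and the classical product-of-coefficients test just described can be sketched in base R with simulated data. This is my own minimal illustration (not regmed code): a single mediator m = alpha*x + error, outcome y = beta*m + delta*x + error, fit with `lm`, and a Sobel-style test of alpha times beta.

```r
# Minimal base-R sketch of the two mediation equations (my own illustration):
#   m = alpha * x + error            (exposure -> mediator)
#   y = beta * m + delta * x + error (mediator and direct effect -> outcome)
set.seed(42)
n <- 500
alpha <- 0.6; beta <- 0.8; delta <- 0.3   # true coefficients
x <- rnorm(n)
m <- alpha * x + rnorm(n)
y <- beta * m + delta * x + rnorm(n)

fit.m <- lm(m ~ x)       # estimates alpha
fit.y <- lm(y ~ m + x)   # estimates beta and delta

a.hat <- coef(fit.m)["x"]
b.hat <- coef(fit.y)["m"]
d.hat <- coef(fit.y)["x"]

# Classical product-of-coefficients (Sobel-style) mediated effect:
indirect <- a.hat * b.hat
se.a <- summary(fit.m)$coefficients["x", "Std. Error"]
se.b <- summary(fit.y)$coefficients["m", "Std. Error"]
se.ab <- sqrt(a.hat^2 * se.b^2 + b.hat^2 * se.a^2)  # Sobel SE of the product
z <- indirect / se.ab    # compare to N(0,1) to test mediation
```

With both alpha and beta truly non-zero here, the z statistic comes out large, which is exactly the "both paths non-zero" condition for a true mediator.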
First, I just want to note that if you're looking for latent variables, where you have groupings of mediators, that's a job for the lavaan package, which we'll touch on a little here; use full structural equation modeling for that. regmed is not aimed at finding latent variables for the mediators or intermediate terms. regmed uses structural equation modeling to select mediators for the best model under the best penalties. A similar package is called regsem, but regmed has been compared to it and improves on speed, penalty options, and convergence. And like I said, we take advantage of the log-likelihood and use the Bayesian information criterion, the BIC. For penalties we use a combination of the L1 and L2 penalties, the L1 being the lasso and the L2 being a ridge-type penalty. The sparse group lasso penalty used in regmed has a lambda that penalizes all these different terms. Outside of this is the log-likelihood; what I'm simplifying and singling out here is the penalty on the coefficients. The lambda penalty applies to everything, and w is an extra weight on delta, the direct effect between the exposure X and Y itself. Then we have two terms with a weighting between them: f is the fraction of the weight applied to the lasso versus the ridge/L2 part. The lasso part is a penalty on the absolute values, |alpha| plus |beta|, and the L2 part is a penalty on the squared terms of those, but adding them rather than multiplying, which is what we talked about on the previous slide, where we multiplied. So f is the fraction of the penalty given to the lasso part, and w is how much to penalize delta on top of the lambda that still applies there.
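The penalty described on the slide can be written out as a small base-R function. This is my own sketch of the form as I understand it from the talk (per mediator j: f times the lasso term |alpha_j| + |beta_j|, plus (1 - f) times a group L2 term built from the squared coefficients, all scaled by lambda, with delta getting an extra weight w); the argument names `frac.lasso` and `wt.delta` echo the talk but check the regmed paper for the exact formula.

```r
# Sketch of the sparse group lasso penalty as described on the slide
# (my own base-R illustration, not code from the regmed package):
#   lambda * sum_j [ f*(|alpha_j| + |beta_j|) + (1-f)*sqrt(alpha_j^2 + beta_j^2) ]
#     + lambda * w * |delta|
sgl.penalty <- function(alpha, beta, delta, lambda,
                        frac.lasso = 0.8, wt.delta = 1) {
  lasso.part <- abs(alpha) + abs(beta)   # L1 piece, per mediator
  group.part <- sqrt(alpha^2 + beta^2)   # L2-type group piece: squares added, not multiplied
  lambda * sum(frac.lasso * lasso.part + (1 - frac.lasso) * group.part) +
    lambda * wt.delta * abs(delta)       # extra-weighted penalty on the direct effect
}

# Two mediators, the second with both paths already zeroed out:
sgl.penalty(alpha = c(0.5, 0), beta = c(0.4, 0), delta = 0.2,
            lambda = 0.3, frac.lasso = 0.8, wt.delta = 1)
```

Note how a mediator with both coefficients at zero contributes nothing, which is what lets the lasso part drop whole mediators from the model.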
So we have a simple example using data that we simulated and put in the regmed package, called medsim. X has 10 exposures, there are 100 subjects, and 200 mediators. We choose just the first X for simplicity, and we're going to pre-filter the mediators. The steps are: pre-filter; find the best lambda penalty, the one that gives the best BIC; fit and select the final model with that penalty; and then fit a final model without the penalties. Those last two sound like the same thing, but they're not, and I'll show you why. So in the simple example, we first take just the first X from the matrix of X's, and there are actually two Y's but we just use the first Y, and we give it all the mediators and tell it to pick just seven for this example. They still run off my screen, sorry about that, but it does pick seven mediators, and they're just generically named, so we won't be distracted thinking "oh, I like that mediator" the way we might with a real example. Then, with those seven pre-filtered mediators, I put x1, y1, and the mediators into regmed.grid, which tries to find the best lambda penalty. I give it a range of lambdas from 1 down to 0, stepping by -0.33, and the fraction for the lasso and the weight on delta are specified as shown below. It gives some log messages, and then here is the BIC. You don't have to choose the lowest; you can try to interpolate between the lambda penalties. The lowest BIC here seems to be the 803, and they all converged in different numbers of iterations.
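The pre-filter and grid-search steps just described can be sketched in R. The function names (`regmed.prefilter`, `regmed.grid`) follow the CRAN regmed vignette, but the exact arguments and the object names inside `medsim` are from memory, so treat this as a sketch rather than copy-paste code; it is guarded so it sources safely even without regmed installed.

```r
# Grid of lambda penalties, as on the slide: 1 down toward 0, stepping by -0.33
lambda.grid <- seq(from = 1, to = 0, by = -0.33)

# Guarded sketch of the regmed workflow (function names as in the package
# vignette; check signatures against the installed version before use):
if (requireNamespace("regmed", quietly = TRUE)) {
  library(regmed)
  data(medsim)   # simulated data: x (100 x 10), med (100 x 200), y (100 x 2),
                 # if I recall the object names correctly

  x1 <- x[, 1]   # just the first exposure
  y1 <- y[, 1]   # just the first outcome

  # Step 1: pre-filter down to seven candidate mediators
  dat.filter <- regmed.prefilter(x1, med, y1, k = 7)

  # Step 2: search the lambda grid for the best (lowest) BIC
  fit.grid <- regmed.grid(dat.filter$x, dat.filter$mediator, dat.filter$y,
                          lambda.vec = lambda.grid, frac.lasso = 0.8)
  plot(fit.grid)   # BIC versus lambda
}
```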
So here I did an actual fit with the best lambda; I chose 0.3. I've condensed the output to a two-column matrix of the alpha coefficients. These are still penalized, so they're smaller than they would be estimated otherwise. We see that a lot of the alphas, the coefficients between X and the mediators, are non-zero, and only two of the betas, between the mediators and Y, are non-zero. But notice that not all of these are that big, so I can filter them further. For that we have a function called regmed.edges, where I take that fit and, well, it looks like I used a different fit object when I was going through this, lambda 4 versus lambda 3, sorry about that; I was tinkering with those as I developed the presentation, but it does the same thing. The epsilon is the lowest absolute value I'm going to allow for an alpha or a beta. If I use an epsilon of 0.01, I only keep one true mediator: not mediator 99, but mediator 1, and the plot shows just that mediator 1. But you might think of a scenario where you want to show anything whose alpha clears that epsilon of 0.01, keeping more of the association between exposure and mediators, so I might use a different option that shows any kind of association or correlation it's picking up. That shows all the extra relationships it estimated between X and the mediators, and we still have the one path from the exposure, through the mediator, to the outcome.
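The fit-then-prune step can be sketched the same way. The epsilon logic itself is simple enough to show in plain base R first; the regmed calls below follow the vignette names (`regmed.fit`, `regmed.edges` with `type = "mediators"` or `"any"`), are guarded in case the package is not installed, and repeat the setup so the block stands on its own.

```r
# What the eps threshold does, in plain base R (my own illustration):
# keep only coefficients whose absolute value exceeds eps.
keep.edges <- function(coefs, eps) coefs[abs(coefs) > eps]
keep.edges(c(med1 = 0.5, med99 = 0.005, direct = -0.02), eps = 0.01)

# Guarded regmed sketch (names per the package vignette; verify before use):
if (requireNamespace("regmed", quietly = TRUE)) {
  library(regmed)
  data(medsim)
  x1 <- x[, 1]; y1 <- y[, 1]
  dat.filter <- regmed.prefilter(x1, med, y1, k = 7)

  # Fit at the chosen lambda of 0.3
  fit.best <- regmed.fit(x1, dat.filter$mediator, y1,
                         lambda = 0.3, frac.lasso = 0.8)

  # Keep only vertices that qualify as mediators (both alpha and beta above eps)
  edges.med <- regmed.edges(fit.best, type = "mediators", eps = 0.01)
  plot(edges.med)

  # Or keep any estimated association above eps, not just full mediation paths
  edges.any <- regmed.edges(fit.best, type = "any", eps = 0.01)
  plot(edges.any)
}
```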
A reminder that the regmed estimates are penalized, so what we offer is a way to fit lavaan: we pull out what's needed from the regmed fit and then refit it with lavaan, and this is a quick look at the unpenalized estimates of the coefficients between all those terms. The last row is, I think, roughly the baseline effect; I don't understand lavaan that well, so I'm not going to over-interpret. All right, the multivariate extension; I don't have much time, so I'll go a little quickly. Now there are multiple exposures. Note that our example had multiple exposures and another outcome; I'm still going to keep one outcome but give it more exposures. This is a different grid of lambda penalty results, and I'm hiding the code a little, but it doesn't choose too many mediators, so I'm going to use a lambda of 0.1 here even though it wasn't the absolute best; somewhere between 0.25 and 0.3 might be best, but I want to show more of the mediators, just for this example. The function is called mvregmed, so I fit with a lambda of 0.1, allowing less of a penalty, and then choose the edges with an epsilon of 0.05. The output is a little different here: it says what kind of vertices, or coefficients, were kept. I have multiple exposures kept, X1, X2, and actually X5, and I find different mixes of alphas and deltas kept by that criterion. Here's the resulting graph; it does get a little complex, and I'm trying not to let the mediators be too distracting, but you see different mediators and different X's, and a lot have an association with the outcome in this example. So here's a real-world example that was published with the mvregmed extension: lots of SNPs are
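The multivariate fit and edge selection can be sketched along the same lines. The `mvregmed.grid`, `mvregmed.fit`, and `mvregmed.edges` names follow the regmed vignette, but the grid of lambdas here is my own arbitrary example, and the package's helpers for the unpenalized lavaan refit mentioned above are not sketched because I am less sure of their exact names; guarded as before.

```r
# An example grid of penalties for the multivariate search (my own choice,
# not the grid from the slides):
lam.vec <- seq(from = 0.4, to = 0.01, by = -0.05)

# Guarded sketch of the multivariate (mvregmed) workflow:
if (requireNamespace("regmed", quietly = TRUE)) {
  library(regmed)
  data(medsim)
  y1 <- y[, 1]   # keep one outcome, but now use all the exposures in x

  # Grid search, then deliberately refit at a smaller lambda (0.1) than the
  # BIC-best one, to let more mediators through for display
  mv.grid <- mvregmed.grid(x, med, y1, lambda.vec = lam.vec)
  mv.fit  <- mvregmed.fit(x, med, y1, lambda = 0.1)

  # Keep edges with |coefficient| > 0.05 and draw the graph of X's,
  # mediators, and the outcome
  mv.edges <- mvregmed.edges(mv.fit, eps = 0.05)
  plot(mv.edges)
}
```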
associated with different heart disease outcomes, along with other biomarkers as well: LDL, HDL, and a couple of other things. We group the SNPs by region and put them in; those aren't necessarily continuous, but it does work as long as there's a good sample size, in a multivariate mediation framework: how do those SNPs and biomarkers play a role in either endpoint? I'll just show what it looks like, and you can see a lot of complex relationships to untangle. Without doing this, the research question is basically "we don't know how to untangle this"; now you can apply the method, and with different penalties it tells us these kinds of associations are going on with the outcome, in red, and the different biomarkers. You can switch up mediators and exposures as well, depending on how the framework goes, but you're left to interpret it yourself. Some quick conclusions: the experiments don't have to fit perfectly, but the mediators can help make sense of the causal relationships that might be going on. As I noted, the different alphas and betas don't always both have to be present, and since we allow more flexibility, you can use other kinds of quantitative variables or residuals from something else. The graphs were useful, and there's a final model fit with lavaan. There are sample size limitations I don't have much time to talk about, maybe in the question and answer session. Please let me know your questions in the Q&A. Thank you.