If you are trying to meta-analyze a diverse body of literature and are not sure how to deal with potential differences between studies that might influence the effect sizes, I have a solution for you. My name is Dr. Caspar van Lissa, and this is my work on selecting relevant moderators using Bayesian regularized meta-regression. I conducted this work together with Sara van Erp and Eli Clapper, and I'm presenting it to you for the Evidence Synthesis and Meta-Analysis in R conference. The validation paper for this method is available at the DOI listed on the screen and in the comments below.

First, let me briefly introduce the problem that we tried to solve. When meta-analyzing diverse bodies of literature, we often find that the same or similar research questions have been investigated in different samples, with different instruments, different study designs, and different methods, and all of those could potentially introduce systematic bias into the observed effect sizes. One way to deal with this heterogeneity between studies is to exclude all studies that are not close replications, but sometimes we don't want to do that. Sometimes we want to provide a more comprehensive overview of the literature in a subfield. But then we run into a specific problem: the number of potential moderators is often high relative to the number of studies in our dataset. In other words, we have low power to account for between-study differences. The solution I want to introduce is Bayesian regularized meta-analysis, a tool for selecting relevant moderators out of a larger set of candidates.

In general, there are three ways of dealing with heterogeneity in meta-analysis. The first is to assume that all studies tap into one true population effect size. This assumption is reasonable if the studies are, for example, close replications. You can then assume that there is one population effect size and use all available studies to estimate it. This is called the fixed-effect model. The second assumption is that studies are similar, but not close replications: any differences between studies are small and random, not systematic. If this assumption is defensible, we can use a random-effects meta-analysis, which assumes that the population effect sizes follow a normal distribution, and we estimate its mean and its variance, which is called tau squared. The third case is more interesting, and that is when there are systematic differences between studies that we suspect might cause heterogeneity. You can code those differences between studies as moderator variables and account for their influence using meta-regression. However, when the number of studies is small relative to the number of moderators, we run into what's known as the curse of dimensionality: there is not enough data to estimate those parameters reliably. This introduces a risk of overfitting the model.

To understand overfitting, consider these seven randomly selected data points for a moment. I start by modeling them with just the mean of y, which is a flat line, and we see that this flat line explains 0% of the variance. If I add the first-order polynomial, or the linear effect of x on y, we see that we explain a non-zero amount of variance. This is linear regression. But we can keep adding polynomials. For example, here is the second-order polynomial, and we now already explain 38% of the variance. And a very interesting thing happens when I go up to the seventh-order polynomial.
Now you see that this model explains 100% of the variance, and that is because my regression model has become so flexible that it exactly touches every data point. The model perfectly describes the data. The problem, however, is that the model makes bizarre jumps between the observed data points. In other words, we can no longer extrapolate, and the model will not generalize to new data sets. This is a case of overfitting: a model that describes the data at hand perfectly, but no longer generalizes well.

This same phenomenon applies when we try to fit a meta-regression model to a data set that has relatively many moderators compared to the number of cases, and that is the problem we are trying to fix. We basically have to answer the question: which of several candidate moderators systematically influence the observed effect sizes?

There are different ways in which we could make a selection out of potential candidate moderators. One way is to use theory, but we often lack theory that is suitable to whittle down the number of candidate moderators. Now you may think, "My field is strongly based on theory, so I can make a good pre-selection." However, keep in mind that the ecological fallacy dictates that we cannot use theory at the individual level to perform moderator selection at the study level of analysis.

Aside from theory, there are also statistical methods to select relevant moderators. One common practice is to use p-values from bivariate meta-regression, but one problem with this is that p-values are not designed for model selection, so we're using the wrong tool for the job. Another problem is that this approach rarely controls for multiple testing. Yet another limitation is that it offers no formal solution to the problem of overfitting. And finally, the effect of each moderator depends on all of the other predictors in the model, but by using only bivariate tests, we ignore that completely. Another potential solution would be to use information criteria. With information criteria, we construct several candidate models that include different sets of moderator variables, and then use an information criterion to select the best model from the set. Information criteria have the advantage that they are designed to perform model selection, but the problem is that we have to manually specify all candidate models, and there is no guarantee that the best possible model is even in the set. The third solution, and the one we use, is variable selection. This is a technique from the machine learning literature, and the specific variable selection technique that we use is called regularization.

To explain how this works, let's first have a look at the standard meta-regression model. It holds that the observed effect sizes are a function of an intercept, plus the effects of several moderators through their regression coefficients, plus between-study heterogeneity and sampling error, whose variance is assumed to be known. This model is typically estimated using restricted maximum likelihood, in which we choose the model parameters that minimize the distance between the model-implied predictions and the observed data. This method has low bias, which means that, on average, the estimated parameters are close to their true values. However, it also has high variance, which means that if we were to conduct two meta-analyses on different subsections of the literature, the parameters of these meta-regression models would vary substantially.
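In notation, the meta-regression model just described can be sketched as follows (the talk shows no formulas, so the symbols are my own): writing $\theta_i$ for the observed effect size of study $i$, $x_{ij}$ for its coded moderators, and $v_i$ for its known sampling variance,

$$
\theta_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij} + u_i + \varepsilon_i,
\qquad u_i \sim \mathcal{N}(0,\ \tau^2),
\qquad \varepsilon_i \sim \mathcal{N}(0,\ v_i).
$$

Dropping the moderator terms gives the random-effects model described earlier, and additionally fixing $\tau^2 = 0$ gives the fixed-effect model. Restricted maximum likelihood estimates the $\beta_j$ and $\tau^2$ in this model, and it is those estimates that the bias-variance discussion below refers to.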
In general, there is always a trade-off between bias and variance: a method with low bias will have higher variance, and vice versa. The trick in machine learning is to find the optimal balance between bias and variance, so that the model is not overfitted and generalizes well. Regularization performs this trade-off, increasing bias in order to decrease variance.

The most well-known regularization technique is lasso regression. Instead of minimizing the residual sum of squares, lasso regression minimizes the residual sum of squares plus the sum of the absolute values of all regression coefficients. This incentivizes the regression coefficients to be as small as possible, and the result is that the effects of some irrelevant moderators are shrunk towards zero. So variable selection is performed here by shrinking some regression coefficients towards zero.

It has long been known that there is a Bayesian equivalent to lasso regression, and that is regression with a Laplace (or double exponential) prior, which assigns a high prior probability mass to values close to zero. Bayesian regression differs from frequentist regression in the sense that it combines prior expectations about likely parameter values with the observed data to estimate the posterior probability of different parameter values. The reason I'm talking about Bayesian estimation is that it has several advantages over the frequentist approach. One key advantage for meta-analysis is that it provides valid inference for all parameters in the model and for derived quantities like I squared. Another advantage is the intuitive interpretation of the parameters. Yet another advantage of Bayesian regression is that it can be extended to use other regularizing priors aside from the lasso one. For example, the horseshoe prior still has a sharp peak of probability mass around zero, but it has heavier tails, which means that once a parameter is allowed into the model, it is not as biased as it would be when using a lasso prior. And a final advantage is that Bayesian regularized regression can be estimated in existing software, specifically Stan.

With this in mind, I would like to introduce Bayesian regularized meta-analysis, or BRMA, which we implemented as a function in the pema package, which stands for penalized meta-analysis. BRMA models are estimated in Stan, but because we use pre-compiled code, you don't have to set up your computer with a toolchain to compile the models, and the models run very quickly. Moreover, because the fitted model still inherits from the stanfit class, you can use all of your familiar plotting and summarizing functions from other packages. This method was validated in a recent paper in Research Synthesis Methods, and the DOI for that paper is on the screen.

To briefly summarize this simulation study: we manipulated several design parameters, and the combination of all of these created 1944 unique conditions; for each condition, we simulated 100 data sets. We compared several performance metrics, the first of which was the predictive R-squared, that is, how well the model predicts the effect sizes of a new data set. Bayesian regularized meta-analysis with a horseshoe prior had the highest predictive performance in 50% of conditions, followed by industry-standard random-effects meta-analysis in 37% of conditions, followed by Bayesian regularized meta-analysis with a lasso prior in 13% of conditions.
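For orientation, here is a minimal sketch of how the two BRMA variants compared above can be requested in the pema package. The toy data and the argument names reflect my own reading of the package documentation (see ?pema::brma); they are not shown in the talk.

```r
library(pema)

# Toy meta-analytic data: 40 studies with effect size yi, known sampling
# variance vi, and two candidate moderators, of which only x1 is relevant
set.seed(1)
df <- data.frame(x1 = rnorm(40), x2 = rnorm(40))
df$vi <- runif(40, min = 0.02, max = 0.10)
df$yi <- 0.3 + 0.2 * df$x1 + rnorm(40, sd = sqrt(df$vi))

# Horseshoe prior (the default) versus the lasso (Laplace) prior
fit_hs    <- brma(yi ~ x1 + x2, data = df, vi = "vi", method = "hs")
fit_lasso <- brma(yi ~ x1 + x2, data = df, vi = "vi", method = "lasso")

fit_hs  # printing the fit shows posterior summaries for the coefficients and tau^2
```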
In conditions where there was a true population effect, the predictive R-squared was higher for the BRMA models than for the RMA models. And when the population effect was zero, so there was no true effect, we found a negative predictive R-squared for the industry-standard RMA method, but not for Bayesian regularized meta-analysis, which indicates that our method protects better against overfitting. We also found that some of the design parameters had an effect on model performance; the largest differences between algorithms were due to the effect size, the number of irrelevant moderators, and the sample size. Importantly, compared to industry-standard RMA, BRMA with a horseshoe prior was relatively less affected by the number of irrelevant moderators and the sample size, once again suggesting that this is a suitable method for performing variable selection, even in small samples.

We also examined the sensitivity and specificity of these different algorithms. Sensitivity is the ability to correctly identify a moderator as relevant, and specificity is the ability to correctly reject a moderator that is not relevant. What we found was that sensitivity was highest for RMA, but specificity was higher for our regularizing algorithms, and the overall accuracy of all methods was approximately equal. That means that our method trades off better rejection of irrelevant moderators against worse selection of relevant moderators, while maintaining a similar overall error rate.

We also examined the bias in the estimated parameters. For the regression coefficients, BRMA with a horseshoe prior had the greatest negative bias, followed by lasso BRMA, followed by industry-standard RMA. Note that all of these algorithms, including RMA, provide negatively biased estimates, but by design, our method introduces more bias. We also found that BRMA had lower variance, especially BRMA with a horseshoe prior. When we look at the bias for the residual heterogeneity, however, we find that our method has the lowest bias, and we also find that both of our methods have low variance for the residual heterogeneity compared to RMA.

So the conclusions are that BRMA has several advantages over industry-standard RMA. It has better predictive performance, so its models generalize better; it overfits the data less; and it is useful as a small-sample solution. In fact, there were several cases where the RMA model did not even converge, so you would need a method like BRMA just to get your model to converge. BRMA is also better at rejecting irrelevant moderators from a large set of potential candidates, and the overall variable selection accuracy of BRMA remains equivalent to that of RMA. Of course, the resulting regression coefficients are biased towards zero, but this is by design. Interestingly, the estimate of residual heterogeneity did not show more evidence of bias than that of RMA.

Another really useful advantage of BRMA is that it performs variable selection within the linear regression framework, which most reviewers and editors will be familiar with. This doesn't mean that you can only include linear effects of your moderators. As in any type of regression analysis, you can include interactions and non-linear effects. The advantage of BRMA is that you can create a complex model that includes all of those interaction and non-linear terms, and then use regularization to pare it down to only those predictors needed to describe your data well.
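To illustrate that last point, here is a hedged sketch with hypothetical moderator names; the only thing it is meant to show is that ordinary R formula syntax, including interactions and polynomial terms, carries over to brma().

```r
library(pema)

# Hypothetical data: two moderators, publication year, and known sampling variances
set.seed(2)
dat <- data.frame(
  x1   = rnorm(50),
  x2   = factor(sample(c("rodent", "primate"), 50, replace = TRUE)),
  year = sample(1990:2020, 50, replace = TRUE),
  vi   = runif(50, min = 0.02, max = 0.10)
)
dat$yi <- 0.2 + 0.3 * dat$x1 + rnorm(50, sd = sqrt(dat$vi))

# A deliberately rich model: main effects, an interaction, and a quadratic trend
# over publication year; regularization then shrinks the terms that are not needed
fit <- brma(yi ~ x1 * x2 + poly(year, 2), data = dat, vi = "vi")
fit
```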
Now let's look at an applied tutorial so you can use the method in your own research. When you're using RStudio, it is good practice to create a new project for your analyses, so that's the first thing I'm going to do. I'm going to use an existing directory, because I just created it; specifically, I'll use this directory, titled BRMA. We'll be using data graciously donated by Valeria Bonapersona. It is included with the pema package, but here I saved it as an SPSS dataset to show you that we can even recover from that.

The next thing we do is create a new R script and load the necessary libraries. To run the analyses, I need the pema package. If you don't have it, you can install it with install.packages("pema"). We're also going to load the foreign package to read the SPSS data. Now we load the data into an object in our R environment. We'll call that object DF, for data frame, and we have to specify that we want to load the data into a data frame and that we want to use the value labels. Now we have the object in our environment. We see that the effect size is there, its variance is there, and there are a few moderators, including year of publication. There are two different data types, numeric and factor, so we have categorical as well as continuous predictors.

Now I will load one more package, the tidySEM package, because it has a good descriptives function. If we run that function, we see that there is a certain percentage of missing values: we have 721 observations on age and weeks, but 734 on all of the other variables. The nice thing is that pema can handle multiply imputed data, so we're going to load the mice package to impute the data, and we create a new object called DFimp for the imputed data.

Now we can estimate our meta-regression model. We want to assign the results to an object, so we'll call that object res, and we'll use the function brma(). We want to predict yi from all moderators; that's what the period in the formula represents. The data we use is DFimp, and brma() will run the analysis for each of the imputed data sets and combine the results.

The analysis is now finished. I had to cut the video because it takes a long time to estimate the model; my beard grew another inch. We see some warnings, but these are nothing to be concerned about here. Let's have a look. These are warnings about divergent transitions, effective sample size, and so on, which may indicate problems with convergence. Generally, the solution is to increase the number of samples taken from the posterior. What matters is that we have a relatively large effective sample size; this is the number of uncorrelated samples that have been taken from the posterior, and we see that these are all very high. Another indicator of convergence is that the R-hat is close to 1. This is the ratio of between-chain variance to within-chain variance, and if the chains have mixed well, that ratio should be approximately 1. So I think that this model has converged.

With that in mind, we can proceed to interpret the model results. We see here that we have an intercept, which is the estimated effect size when all moderators are held constant at zero; we have the effects of the several moderators; and we have the residual heterogeneity. For each of these parameters, we get the posterior mean, and we get the 2.5%, 25%, 50%, 75%, and 97.5% percentiles of the posterior.
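Before turning to inference, here are the steps described so far, collected as one sketch. The object names follow the video; the SPSS file name is hypothetical, and the claim that brma() accepts a mice-imputed object directly and pools the results is taken from the talk rather than verified here.

```r
library(pema)     # install.packages("pema") if you don't have it yet
library(foreign)  # to read the SPSS file
library(tidySEM)  # for the descriptives() function
library(mice)     # for multiple imputation

# Read the SPSS data into a data frame, keeping value labels as factor levels
DF <- read.spss("bonapersona.sav", to.data.frame = TRUE, use.value.labels = TRUE)

descriptives(DF)  # inspect data types and the percentage of missing values

DFimp <- mice(DF) # impute the missing values (default mice settings)

# Bayesian regularized meta-regression, predicting yi from all moderators
# (that is what the period means); per the talk, the model is fitted to each
# imputed data set and the results are combined
res <- brma(yi ~ ., data = DFimp)

res  # posterior means and 2.5/25/50/75/97.5 percentiles for each parameter
```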
You can use these quantiles to perform non-parametric inference. For example, a 95% credible interval is bounded by the 2.5% and 97.5% percentiles. Personally, for inference, I would use the posterior median and a 95% credible interval, except perhaps for tau squared, because that is a variance parameter and can only take values greater than zero. For that parameter, we might want to use a highest posterior density interval, and I can show you a cool trick to obtain one.

If we examine the class of the res object, we'll see that it's a brma object, but we can easily convert it to a stanfit object, which allows us to use all of the other convenience functions that exist for such objects. So let's make a copy called res_stan, to which we assign as.stan(res). The class of this object is stanfit, and this allows us to use all of the convenience functions associated with Stan objects. If we load the rstan library, we can, for example, get a trace plot, and this shows us that the sampler mixed very well for both the intercept and tau squared. You could get this plot for every parameter in the model, so that indicates convergence, at least for those parameters. You can also get a plot of the coefficients, in which they are visually represented with uncertainty bounds.

This also allows us to get a highest posterior density interval for model parameters, which is really useful for the residual heterogeneity tau squared. A highest posterior density interval is the smallest possible interval that contains a certain percentage of the posterior probability, and it is good to use when the posterior is very non-normal, which is the case for tau squared, but not necessarily for the other model parameters. For this we need another package, called bayestestR. We load that package, and then we can get our highest posterior density interval for tau squared. If we look at the normal results again, the credible interval for tau squared runs from 0.38 to 0.54, and the highest posterior density interval is very similar: 0.38 to 0.53. So in this particular case it doesn't matter much, but in your case, with a smaller sample, it might matter quite a lot.

This has been a basic tutorial of Bayesian regularized meta-regression with the brma() function in the pema package. For a more extensive tutorial, you can read the paper linked on the screen, or you can look at the package vignette, which is included with the package when you download it from CRAN and which is also available on the pema website. I really hope that you will find this method useful in your research, and if you have any questions, don't hesitate to reach out. My name is Caspar van Lissa. Thank you for watching, and take care.
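For reference, here are the post-estimation steps from the tutorial, collected as one final sketch. The as.stan() conversion is as described in the video, and the parameter labels "Intercept" and "tau2" are my assumption about how the brma fit names its parameters; check the printed parameter names if they differ.

```r
library(rstan)       # convenience functions for stanfit objects
library(bayestestR)  # for the highest posterior density interval

res_stan <- as.stan(res)  # convert the brma fit to a plain stanfit object
class(res_stan)           # "stanfit"

# Visual convergence check: trace plots for the intercept and tau^2
traceplot(res_stan, pars = c("Intercept", "tau2"))

# Coefficient plot with uncertainty bounds
plot(res_stan)

# Highest posterior density interval for the residual heterogeneity tau^2
tau2_draws <- unlist(rstan::extract(res_stan, pars = "tau2"))
hdi(tau2_draws, ci = 0.95)
```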