Hello and welcome. This is a tutorial video on an R package called metaBMA, which can be used to perform Bayesian model averaging for meta-analysis. My name is Daniel Heck; I am professor of psychological methods at the Philipps University of Marburg. You can contact me at this email address here or via my website, and on that website I also provide the script for this tutorial. So if you want to follow along, you can copy the link into your browser and replicate these analyses.

Before we get into the details, a few words about the package and its main functionality. In meta-analysis, there is very often a debate about whether to use a fixed-effects model or a random-effects model. The fixed-effects model assumes that the true effect size is identical in all studies; the random-effects model assumes that there is heterogeneity, so that the true study effect sizes vary. Model averaging helps you cope with the uncertainty about which of the two models is better. Essentially, we fit both types of models, see how much evidence there is for each, and then use this relative evidence to obtain a weighted compromise between those two worlds.

Okay, now let's get started. If you want to learn more about the statistical and technical background, there is a separate video with slides and explanations regarding the statistics and the Bayesian inference; here we focus on the package. You can get the package from CRAN by uncommenting the line install.packages("metaBMA") in the script. Of course, you can also do it in RStudio via the Packages tab by clicking Install and typing in metaBMA. However you do it, you then have the package. You can also install the newest version from GitHub, but be aware that for this you need Rtools: you need to uncomment the corresponding lines and run them, and it only works if you can compile C++ code. GitHub is also a good place to give feedback, so if you use this package and think there is a bug, or you have a feature request or a question, you can open an issue there.

Now let's load the package with the library() command. If you do this, you will see a red message: loading required package, this is metaBMA version so-and-so, and that the default priors were changed in a previous version. This is important: if you use this package for your analyses, it is recommended that you always specify your priors in your script. The reason is that you are not required to do so; you can use the default priors, which makes the package easier to use, but that is risky, because if the defaults change, it is no longer clear which priors you actually used. So the recommendation is: even if you use the default priors, copy them from the documentation into your script, so you can be sure you always use the same priors. We also load another meta-analysis package, metafor, which is a great package for frequentist meta-analysis; we will use it only for a forest plot.

For the Bayesian model fitting, we need Markov chain Monte Carlo (MCMC) sampling, a stochastic type of model fitting, which means the results can vary slightly in some decimal places. To make sure that does not happen, you can use the set.seed() command; the number is arbitrary, and everything that follows will then give you the same results. Now let's load a data set. As an example, we use the towels data set, which is shipped with the metaBMA package, and I have printed it here.
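In code, the setup described so far looks roughly like this (a minimal sketch; the seed value 2020 is arbitrary, and danheck/metaBMA is assumed to be the package's GitHub repository):

    # install from CRAN (uncomment on first use)
    # install.packages("metaBMA")

    # or install the development version from GitHub
    # (requires Rtools / a working C++ compiler)
    # install.packages("remotes")
    # remotes::install_github("danheck/metaBMA")

    library(metaBMA)  # Bayesian model averaging for meta-analysis
    library(metafor)  # frequentist meta-analysis; used here only for a forest plot

    set.seed(2020)    # arbitrary seed so the MCMC results are reproducible
    data(towels)      # example data set shipped with metaBMA
    towels            # study labels, log odds ratios, standard errors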
You essentially see that there are seven studies. Each study reports an effect size, in this case a log odds ratio, with a corresponding standard error.

Now, a few words about effect sizes. You should pick an effect size that is asymptotically normally distributed. Standard choices are Cohen's d, Hedges' g, Fisher's z, or the log odds ratio. You should not use something like the Pearson correlation r, because it is truncated to lie between minus one and one, and you should not use the odds ratio, because it is truncated to be positive. There is a lot of literature about this that you can look up. You can also use the metafor package to compute these effect sizes and the corresponding standard errors. These standard errors essentially capture the sample size: if a study is huge, the standard error will be small.

Okay, just a remark regarding the content. This towels data set concerns the question of whether an intervention is effective. There was an intervention group and a control group, and the question was how to get hotel guests to reuse their towels; guests in the intervention group were told that almost everybody else does it. Looking at the data, the intervention is effective if the log odds ratio is positive, and descriptively this appeared to be the case for the first five studies. In the other two it was negative, although the confidence intervals are quite large.

So this is the data we have: an effect size for each study and how precisely it was estimated, the standard error. The question now is: how can we synthesize these results? How can we get an overall estimate of the effectiveness of the intervention across studies? For this, in the Bayesian framework, we need a prior distribution: we have to specify our expectations about which effect sizes are plausible. Here we do this as follows. There is a function called prior() with which you can specify prior distributions. One argument is family; here "norm" stands for the normal distribution, and you could also pick "t" for a t-distribution. The param argument takes the parameters of that distribution; the normal distribution has a mean and a standard deviation, here mean 0 and standard deviation 0.3. So let's specify this prior, and then you have an object of type prior, which can be plotted to illustrate it and to make sure the prior makes sense. In this case, it means that our expectations regarding the log odds ratio of the treatment's effectiveness are as follows: effect sizes between minus 0.5 and plus 0.5 are most plausible a priori, and smaller effects are more plausible than larger effects. This is what the distribution captures. However, any effect is possible; it could even be minus one or plus two.

This prior distribution does not make a lot of sense here, because in this scenario we have a clear expectation that the treatment will be effective, that it will have a positive effect. So what we want is, essentially, that only positive effect sizes are plausible. This can be done by changing the prior: you can add the argument lower = 0, which is a truncation. If I run this and plot it, you see what happens: essentially, you specify a one-sided hypothesis. The effect size should be positive; it is not allowed to be smaller than zero, and this truncated distribution is then used as the prior distribution of effect sizes. However, it is still rather uninformative: it still assumes that smaller effects are more plausible, and it is rather wide.
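As a sketch, the two priors just described can be specified and plotted like this:

    # default-style prior: normal with mean 0 and SD 0.3 on the log odds ratio
    p_norm <- prior(family = "norm", param = c(mean = 0, sd = 0.3))
    plot(p_norm)

    # one-sided version: truncated at zero, so only positive effects are plausible
    p_pos <- prior(family = "norm", param = c(mean = 0, sd = 0.3), lower = 0)
    plot(p_pos)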
You can even go further and use informed prior distributions. If you have an expert who tells you, "if there is an effect, it will be in this range," you can rely on that. This was done for another meta-analysis we ran in social psychology, and there we obtained the following prior: a t-distribution with three parameters, location 0.35, scale 0.1, and three degrees of freedom, truncated at zero. If we plot this, you see what an informed prior distribution means: it peaks at 0.35, the most plausible log odds ratio; the values around it are also quite plausible, and all of them must be positive.

So you have a lot of possibilities for defining priors, and the usual recommendation is to do a sensitivity analysis and check how much your results change when you change the prior. You could use the default priors; you could ask two experts, which gives you two expert priors; and you could add a very weakly informative prior that is even broader. Then you check how important the priors are for the conclusions. This is called a sensitivity analysis. In the following, we will use only one type of prior to simplify things.

Now we have our data and we have our expectations, and we have to put them together: this is model fitting. Let's look again at how the data set is structured. We have three columns: the study labels, the effect sizes, and the standard errors. This is what you plug into the fitting function, here called meta_fixed() for a fixed-effects model. The first four arguments are simply the effect size (logOR), the standard error (SE), the study labels, and the data frame, which is called towels. This is the standard input for almost any analysis function in R. What is new is the d argument: this is the prior for the overall effect size. It is traditionally named d after Cohen's d, but here it is the overall effect size on the log odds ratio scale. We plug in the prior we looked at above, the truncated normal distribution, and of course it can be changed. If you fit this model, it is very fast.

Let's look at the results. We now have a forest plot from the metaBMA package, and you again see the observed effect sizes and confidence intervals; but at the bottom you also see a total effect size. This is the effect size of the fixed-effects model: assuming that all effects in the population are identical, and given the specified prior, this is your posterior estimate of the effectiveness of the treatment. You can look at it even better using the plot_posterior() function. With this, you see the prior in gray in the background; this is the truncated normal distribution we specified. And then you see that the data helped you learn something about the parameter: the posterior, in blue, is much more peaked, and the most plausible log odds ratio is about 0.2. In shaded blue you see the 95% credible interval of the most plausible values. Of course, you can also look at the results more closely. If you print the object, it will show you what type of meta-analysis you fitted, here a fixed-effects model; it will show you the prior in words; it will show you Bayes factors; and it will show you the posterior estimates.
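A sketch of the informed prior and the fixed-effects analysis just described (the t-distribution values are the ones mentioned above):

    # informed expert prior: t-distribution with location 0.35, scale 0.1,
    # 3 degrees of freedom, truncated at zero
    p_inf <- prior(family = "t",
                   param = c(location = 0.35, scale = 0.1, df = 3),
                   lower = 0)
    plot(p_inf)

    # fixed-effects model with the truncated-normal prior on the overall effect d
    mf <- meta_fixed(logOR, SE, study, data = towels,
                     d = prior("norm", c(mean = 0, sd = 0.3), lower = 0))
    plot_forest(mf)     # observed effects plus the total (posterior) effect
    plot_posterior(mf)  # prior in gray, posterior in blue
    mf                  # model type, prior, Bayes factors, posterior estimates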
Let's start at the bottom with the posterior estimates, because these are essentially what you see in the posterior plot. You have a posterior mean, the point estimate, which is roughly 0.2 as in the figure. You have a posterior standard deviation, which is your uncertainty about the estimate. You have quantiles, which you can report as a credible interval, with a lower boundary of 0.06 and an upper boundary; alternatively, you can use the highest-posterior-density (HPD) interval. So you either report the HPD interval or the equal-tailed interval.

That is the estimate of the effect, but there is also a Bayes factor, and the Bayes factor tells you how much evidence there is for the effect. The output is essentially a square matrix in which each model is compared with each other model. The interesting value here is the 23.9, because this is the evidence that there is an effect. More precisely, it is the ratio of how likely the data are under the alternative hypothesis H1, that is, assuming there is an effect, relative to how likely they are assuming there is no effect. Here we see that the data are 23.9 times more likely when assuming there is an effect, which is evidence for the effect, and it is clearly above the common thresholds such as 3 and 10. If this Bayes factor were close to 1, you wouldn't know; you would not have clear evidence either way. So this is the simple Bayesian analysis.

You can now do the same with the random-effects model. What you need to do is replace the function with meta_random(); most of the input is identical, but there is an additional argument, tau, for the prior distribution on the standard deviation of the true effect sizes, that is, the heterogeneity: what do you expect, how much do the study effects differ in the population? Here it is a t-distribution with some location, scale, and degrees of freedom (a sketch follows below). If you fit this, it takes a few seconds longer because it does MCMC sampling; more precisely, it uses Stan with precompiled Stan models, so it is still fast, and all of this is done automatically.

If you look at the forest plot, it is now slightly different, because for each study we have two estimates: a circle, which is the descriptive estimate, and a triangle, which is a shrunken estimate. You see that the shrunken estimates are closer to each other; this is a typical feature of random-effects models, as is the overall effect. You can look at the posterior and, as before, the prior is updated to the posterior: it is more peaked, so you have learned something about where the average effect size is. But if you look at the heterogeneity, you see that prior and posterior are very similar. Apparently, the seven studies we have are not very informative about the standard deviation of the true effect sizes; it is simply difficult to estimate, so the posterior remains relatively broad. If you look at the printed results, you again get posterior estimates in terms of point estimates and credible intervals, along with Bayes factors. Now you see that the evidence for the effect is only 3.8: there is less evidence for an effect when you allow that the effects might vary.

The question, of course, is which of the two models is better, and this is now answered by model averaging. There is a new function, meta_bma(); the input is all the same, and if you run it, it will fit both models, as in the sketch below.
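A sketch of the random-effects fit and the model averaging. Note that the exact values of the t-prior on tau are not stated above, so the numbers used for tau below are illustrative placeholders, not necessarily the ones from the video:

    # random-effects model: adds a prior on tau, the SD of the true effect sizes
    # (tau prior values are illustrative placeholders)
    mr <- meta_random(logOR, SE, study, data = towels,
                      d   = prior("norm", c(mean = 0, sd = 0.3), lower = 0),
                      tau = prior("t", c(location = 0, scale = 0.3, df = 1),
                                  lower = 0))
    plot_forest(mr)                      # circles: observed; triangles: shrunken
    plot_posterior(mr, parameter = "d")  # posterior of the average effect size
    plot_posterior(mr, parameter = "tau")  # heterogeneity: prior ~ posterior
    mr

    # model averaging: fits both models and combines them by their evidence
    mb <- meta_bma(logOR, SE, study, data = towels,
                   d   = prior("norm", c(mean = 0, sd = 0.3), lower = 0),
                   tau = prior("t", c(location = 0, scale = 0.3, df = 1),
                               lower = 0))
    plot_forest(mb)     # fixed, random, and averaged estimates at the bottom
    plot_posterior(mb)  # fixed, random, and averaged posterior densities
    mb                  # posterior model probabilities, inclusion Bayes factor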
Let's look at the forest plot. You now see three estimates at the bottom: the fixed-effects estimate and the random-effects estimate, which are the two we saw before, and an averaged estimate, which is a kind of compromise between the other two. The compromise is based on the evidence for each of the models.

It becomes even clearer if you look at the posterior, because now you have one prior distribution, identical for all models, but two posteriors: the fixed-effects posterior in violet and the random-effects posterior in red. You see that the fixed one is more peaked and the random one is broader, and the question is which of the two is better. In between you find the averaged posterior in blue. This is literally the average of the two densities, and the ratio by which they are averaged depends on the evidence for fixed effects versus random effects. This is a nice feature, because you take into account the uncertainty about which of the two models is better; and if one of the models is clearly better, the averaged posterior will reflect that and essentially mimic it.

In the output, you correspondingly have three lines for the estimates, averaged, fixed, and random, which correspond to the plotted distributions. If you go up, you now also see posterior model probabilities. You see that four models are considered: fixed and random effects, each under the null and the alternative hypothesis, and each has a prior probability of 25%. After seeing the data, you get posterior probabilities, and the most likely model is the fixed-effects alternative hypothesis. But there is considerable uncertainty: this model is rather likely, yet the random-effects alternative is also relatively likely, and this uncertainty is now taken into account by model averaging.

On top you find the Bayes factors, and you can see that these are again the numbers we saw before: with the fixed-effects model, the evidence for an effect is 23.9; with the random-effects model, the evidence is 3.8. The question is which Bayes factor you get as a compromise, and this is given here by the inclusion Bayes factor. You obtain it by comparing the two alternative-hypothesis models (fixed and random) against the two null-hypothesis models (fixed and random). If you do that, you get a compromise, the inclusion Bayes factor of 9.8, which lies in between. It takes into account the uncertainty about whether the model is fixed or random, but it still gives you the evidence: apparently, there is an effect of the treatment. So this is the main function and what it is used for.

One last thing, very quickly: there is also a sequential plot. I will not run it because it takes a lot of time; I will just show it. This is a nice feature of the Bayesian framework: you can look at the evidence updating sequentially as studies come in. You see that each study changes the Bayes factor, the evidence, and then you can predict what will happen if you run an additional study. Here, 20 studies are simulated with a standard error of 0.2, and you see what to expect. That was it. Thanks for your attention. I hope you find this useful.