Hello and welcome back to ESMARConf 2023. This is the fifth presentation session of ESMARConf, focusing on quantitative synthesis, part 2. As always, you can ask questions via Twitter by following @eshackathon, or by using Slack if you've registered for the conference. Presenters will be online during this session and afterwards to answer your questions, so do keep those questions coming in. Our first talk today is by Daniel Heck, who's going to be introducing metaBMA: Bayesian model averaging for meta-analysis in R. Over to you, Daniel.

Hello, my name is Daniel Heck from the University of Marburg, and today I will present the R package metaBMA, which can be used to perform Bayesian model averaging for meta-analysis. The example here concerns the effectiveness of descriptive social norms, and more specifically the question of how we can get hotel guests to reuse their towels. In the control group, hotel guests were simply told "please reuse your towels because it's good for the environment"; in the treatment group, they were told that a majority of guests actually reuse their towels. The effect size across seven studies was quantified by log odds ratios on the x-axis, and positive values mean that the treatment was effective, so that the descriptive social norm increased the reuse of towels. The question we have now is how we can aggregate the results from these seven studies.

In meta-analysis there are two standard models that are commonly used. The fixed-effect model assumes that each study has the same constant true effect size mu, and that this effect size is identical in all studies. The random-effects model, in contrast, assumes that each study has a different effect theta_i, which varies according to a normal distribution across studies: the study effects are essentially sampled from the normal distribution shown on the right-hand side, and the parameters of this distribution are mu, the average effect size, and tau, the standard deviation of effect sizes. This tau is also known as the heterogeneity, that is, the amount of variation in effect sizes across studies.

Now, the questions of interest are twofold. First, we are interested in whether there is an effect, so whether the overall effect mu differs from zero. The second question concerns heterogeneity: specifically, is the heterogeneity tau larger than zero? If you combine these two aspects into models, you get four different models, denoted here by an H for hypothesis: H0 is the null hypothesis and H1 the alternative, while F stands for fixed and R for random. You then get the four models in which you assume, first, that the effect is zero or not, and second, that the heterogeneity is zero or not.

The question is obviously: which model should we pick? If we select one of the four models, we ignore the uncertainty about which of the four models is the correct one, so it might be better to actually take into account the uncertainty about which is the best model. Exactly this is done by Bayesian model averaging. First you have to define prior model probabilities, which answer the question: how plausible is each of the four models before you see the data? Here you see an illustration in which uniform priors are used, so at the top there are four boxes for the four hypotheses, and each has a probability of 25%. What follows from this, shown at the bottom on the left- and right-hand sides, is the plausibility of our two questions. The first is: is there an effect?
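In notation (my own shorthand, not the speaker's slides): under the fixed-effect model each study estimate is $y_i \sim N(\mu, \sigma_i^2)$, while under the random-effects model $y_i \sim N(\theta_i, \sigma_i^2)$ with $\theta_i \sim N(\mu, \tau^2)$. Crossing "effect versus no effect" with "fixed versus random" gives the four models

$$
\mathcal{H}_0^{F}: \mu = 0,\ \tau = 0; \qquad
\mathcal{H}_1^{F}: \mu \neq 0,\ \tau = 0; \qquad
\mathcal{H}_0^{R}: \mu = 0,\ \tau > 0; \qquad
\mathcal{H}_1^{R}: \mu \neq 0,\ \tau > 0,
$$

each with prior probability 1/4 under the uniform prior, so that a priori $P(\text{effect}) = P(\mathcal{H}_1^{F}) + P(\mathcal{H}_1^{R}) = 0.5$ and $P(\text{heterogeneity}) = P(\mathcal{H}_0^{R}) + P(\mathcal{H}_1^{R}) = 0.5$.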
If you combine the null models and the alternative models, you see it's 50-50; we don't know yet, and this is why we do the meta-analysis. On the right-hand side you see the heterogeneity, and there you also combine models, but now the fixed ones on the left and the random ones on the right, and here it's also 50-50.

Now, if you collect data, the probabilities are updated, so we have posterior model probabilities. These are determined essentially by the data we collect, but the estimation is also informed by the prior model probabilities and, as we will see later, by the prior distributions on the parameters mu and tau. What happens here is that the models under the alternative become more plausible; they have larger posterior probabilities, as indicated by probabilities of 40 percent and 35 percent. If you now ask whether there is an effect, their combined probability is 75 percent, compared to only 25 percent for the null models. This indicates some evidence that there might be an effect, while accounting for the uncertainty about whether effects vary across studies. The same can be done on the right-hand side for the heterogeneity.

This is what the inclusion Bayes factor quantifies: how much evidence is there for a non-zero effect, so whether mu is different from zero? It is computed by comparing the two alternative models against the two null models, and thereby we account for the uncertainty about whether we should assume fixed or random effects. The same can be done for the heterogeneity, you just bundle together different models: there you compare the two random-effects models against the two fixed-effects models.

However, this only works in a Bayesian framework if you assume prior distributions, and specifically this means you need to make assumptions about what size of effect you expect and how much heterogeneity you expect. Here these prior distributions are illustrated for the fixed-effects models: on the right-hand side you see two plots with just a point mass at zero, so essentially the fixed-effects models assume there is no heterogeneity, it's zero. On the left-hand side it differs between the null and the alternative model whether you assume an effect: the null model assumes no effect, and the alternative assumes that the effect is likely between minus two and two, with higher probabilities assigned to smaller effect sizes. Now I switch to the random-effects models, and on the right-hand side you see a prior distribution on the heterogeneity tau, the standard deviation of effect sizes. This is the default prior used, based on a literature review.

Now, what are the basic functions of the package? You can specify priors, you can fit models, you can average across the models, and you can compute the inclusion Bayes factors I mentioned before; these are essentially the components I already described. All four steps are combined in a single function called meta_bma, and here you see how it is called. Essentially you have four lines where you specify the input, which is the standard input for a meta-analysis, as you would also provide in other packages. Then you have to specify a prior for the overall effect size; here you see it's a normal distribution with mean zero and standard deviation 0.3, truncated at zero. At the bottom you see the prior for the heterogeneity.
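As a concrete illustration of the call Daniel describes, here is a hedged sketch in R. It assumes the towels data bundled with metaBMA uses the column names logOR, SE, and study, and it writes out an inverse-gamma heterogeneity prior that, as far as I know, matches the package default; check ?meta_bma and ?prior for the exact defaults.

```r
library(metaBMA)
data(towels)  # towel-reuse studies described in the talk (column names assumed below)

fit <- meta_bma(y = logOR, SE = SE, labels = study, data = towels,
                # prior on the overall effect: half-normal with SD 0.3 (truncated at zero)
                d   = prior("norm", c(mean = 0, sd = 0.3), lower = 0),
                # prior on the heterogeneity tau (assumed to be the package default)
                tau = prior("invgamma", c(shape = 1, scale = 0.15)))

fit                  # Bayes factors, posterior model probabilities, model-averaged estimates
plot_posterior(fit)  # fixed-effects, random-effects, and model-averaged posteriors
plot_forest(fit)     # forest plot with observed and shrunken study-level estimates
```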
The output is rather large; there's a separate tutorial video that explains all of it, but essentially you get Bayes factors, model-averaged results, posterior model probabilities, and estimates for the parameters. The plotting functions are very helpful because they show what happens here: you have a posterior for the fixed-effects model, which is violet, a posterior for the random-effects model, which is dotted and red, and the model-averaged one in between, the blue one, which is a combination of the two; how they are combined depends on the posterior probabilities of the models. Here you also see a forest plot, where the descriptive effects are shown as circles and the shrunken estimates for the studies as triangles, and you see that there is quite some shrinkage going on: the model estimates that the effects are rather similar, even though descriptively the effects vary much more. At the bottom you also see the three estimates, fixed effects, random effects, and the averaged one, which is a combination of both.

So what are the benefits of this approach? I think it's very nice because it's principled, the assumptions are very transparent, and we take into account the uncertainty regarding auxiliary assumptions. You can also do it sequentially, so you can continuously update your beliefs as more studies are published. This is shown here: on the x-axis you accumulate studies and you can update your meta-analysis, and this is what usually happens, more and more studies come in. However, it could be criticized that we consider the fixed-effects model at all. I would argue it is a plausible model: for instance, if you perform direct replications, there is some evidence that this model is supported, and also, if you only have few studies, you might not have sufficient evidence for heterogeneity. It might then be better to assume that the effect is identical, to get higher predictive accuracy, which means, in terms of the bias-variance trade-off, that you have less variance in predictions, and this is beneficial compared to fitting a complex random-effects model. And of course there are priors, and the question is how to pick them: you could ask experts via prior elicitation, you could do literature reviews, and of course you should perform sensitivity analyses to make sure that the results are robust across different priors; it is also possible to pre-register the priors.

So thanks for your attention, and thanks to my collaborators at the University of Amsterdam. If you're interested in these topics, you can look up four main papers on my web page: the first concerns the metaBMA package itself, the second is a primer on the methodology, there's a tutorial on JASP, because you can also use these methods in a point-and-click interface, and we also applied these methods to a set of pre-registered replications. So thanks a lot, and see you around.

Thanks so much, Daniel. Up next we have Jens Fünderich, who's going to be introducing MetaPipeX: data analysis and harmonization for multi-lab replications of experimental designs.

Thanks for joining us today, everybody. I'm going to present the MetaPipeX framework today, which is for data analysis and harmonization for multi-lab replications of experimental designs. This will be more of a brief introduction: how we arrived at the framework, what it is, and how we'd like to see it being used. For a bit more detail on how to actually use it, please feel free to attend our tutorial on Friday at this year's ESMARConf as well. So where did it start?
We were going to reanalyze data from large-scale direct replication projects like the Many Labs projects or the multi-lab Registered Replication Reports. We intended to use participant- or item-level data, and we needed that data to be cleaned in order to compare our reanalyses with the already published results. That turned out to be much more difficult than we anticipated. But before I talk about the difficulties, I just want to get some vocabulary out of the way to make sure that we're talking about the same things.

This graphic represents the basic structure of multi-labs, and by multi-labs I mean, as I just said, projects like the multi-lab Registered Replication Reports or Many Labs. All of these have a very similar structure on some level. The first interpretable data would usually be item-level data, where each participant is a row and each item is represented as a column. For individual participant data, the rows are still participants, but now the dependent variable has been aggregated to a single numerical value. These in turn are aggregated to obtain replication-level statistics, like between-group effect sizes or group variances, and those replication statistics are then used to run a meta-analysis that aggregates the data for each replication project. Finally, a multi-lab might span multiple of these replication projects.

But despite the very similar structure, the solutions the projects found to document and analyze their data were rather unique. Just a few examples: the software in use was usually R, but sometimes SPSS or other software; the code structure itself was specific to each multi-lab; when, or by whom, the data were aggregated varied, so sometimes these steps were taken by the replication sites, in which case we don't find the actual raw data, and sometimes they were done at the replication project level; and the data files themselves are very different, sometimes CSV files, SPSS files, Excel files, R data files, et cetera. If you include the files with information on the data transformations, so the sources of the analysis or data-transformation code, you will also find solutions like text files or Google documents, and probably some more.

So the variation across the multi-labs is quite large. There is little consistency in naming conventions, sometimes the verbal descriptions and other provided details on the data transformations are sparse or even inconsistent, some of the code solutions are really complex (though interesting), and sometimes we do not find cleaned datasets at the different levels of aggregation. This results in a lot of detective work for anyone who wants to use that data, and it makes computational reproducibility much harder to achieve and to actually check.

I want to elaborate a bit more on computational reproducibility. By that we mean that you can run the original code on the original data and obtain the same results as reported by the projects. There have been a few good reproducibility analyses of registered reports and of publications with open-data badges over the last couple of years, but the results are somewhat devastating, to be honest, which maybe just means it may be a lot harder to get there than we think. Some of these found that only about 30 percent of the articles they looked at were actually reproducible. Computational reproducibility also depends on the skill level of the analyst, which in turn raises the question of how much skill can be required before something no longer counts as computationally reproducible.
Most of these analyses agreed that using non-proprietary formats helps a lot, as does providing version control, which could mean using solutions like renv but also containers such as Docker, and also just simple things like using relative paths, so you have to change fewer file paths when you're trying to rerun an analysis. Some of these best practices were already implemented in some of the multi-labs. For example, Many Labs 2 built a package as the solution for their data transformations (I think the package may have since been pulled; I'm not sure if it's available right now), there were container solutions, a lot of non-proprietary software was used, and also bug trackers, which are great just to make the community interacting with those projects visible and to let us know where people have found errors or deviations. But this is very inconsistent across the projects.

Our solution to these complications is MetaPipeX. To break down the name quickly: it is a pipeline for meta-analyses of experimental data, and experimental designs make up more than 50% of the currently published direct-replication multi-labs. The framework consists of three components. On the left we see the standardized analysis pipeline, which provides guidance to make the analytical structure more explicit and reduces documentation effort. We also created an R package that matches this pipeline, so it analyzes the data and creates standardized documentation for the data at the different levels of aggregation. The third component of the framework is a Shiny app, which we can use to explore MetaPipeX data, that is, a combination of replication results and meta-analytical results; it is basically just a GUI to select and visualize multi-lab analyses. You can also run the analysis within the Shiny app, and it combines different data formats, taking SPSS data, R data, and CSV files, so you don't have to be able to work with R to implement the pipeline; you can just use this GUI.

The most convenient way to apply the pipeline is the full_pipeline function, which just takes individual participant data as an input, as depicted on the left, and then runs the analysis and creates full documentation of the standardized pipeline (a hedged sketch of such a call is shown below). On the right you see the full structure that is exported by the function; in R this is also provided to you as a nested list. In each folder you always have the data file and the corresponding codebooks, so you can make sense of the columns in the data set.

And finally, what do we hope MetaPipeX may do? We hope it will reduce effort in analysis and documentation, and we hope it makes your multi-lab data more explorable, no matter whether it's your own data, blinded data, or simulated data.
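A minimal sketch of such a call, not the package's documented API: the function name full_pipeline comes from the talk, but the argument name and input format used here are assumptions, so check the MetaPipeX documentation for the real signature.

```r
library(MetaPipeX)

## ipd_list: a list of individual-participant data frames, one per replication project
## (assumed input format; the talk only says the function takes individual participant data)
output <- full_pipeline(data = ipd_list)

## Per the talk, the result mirrors the exported folder structure as a nested list,
## with a data file and a matching codebook at each level of aggregation.
str(output, max.level = 2)
```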
We also hope this fosters interaction between the research community and these types of projects. In the best-case scenario it helps to make the multi-lab format more accessible to primary researchers and students alike, and it helps us develop and ask more meta-scientific questions and shed more light on questions around heterogeneity. That just adds to the meta-scientific toolkit. That's it from us, or from me, today; thank you for your attention, and please feel free to join our ESMARConf tutorial on Friday.

Thanks very much, Jens. Next up today we have James Pustejovsky and Megha Joshi, talking about clustered bootstrapping for handling dependent effect sizes in meta-analysis: an exploratory application for publication bias analysis. Over to you, James and Megha.

Hi, thanks for joining us for ESMARConf 2023. My name is James Pustejovsky, and with my colleague Megha Joshi I'd like to share one piece of one of our ongoing projects: developing new tools for investigating selective reporting in meta-analyses of dependent effect sizes. The method that we'll be demonstrating today is a cluster-level bootstrap for a Vevea-Hedges-type selection model.

By selective reporting we mean the phenomenon where statistically significant, affirmative results are more likely to be reported, and therefore more likely to be available for meta-analysis, compared to results that aren't statistically significant or aren't consistent with theoretical expectations. This happens as a result of biases in the publication process on the part of journals, editors, and reviewers, as well as because of strategic decisions on the part of authors. Selective reporting is a big deal; it's a major concern for research synthesis because it distorts the evidence base available for meta-analysis, kind of like a funhouse mirror distorts your appearance. It leads to upward biases in estimates of average effect sizes and complex biases in estimates of heterogeneity, all of which makes it all the more difficult to draw conclusions from a synthesis.

Now, a meta-analyst might say: we've already got tons of tools available for investigating selective reporting, why do we need more?
We've got graphical diagnostics like funnel plots, tests and adjustment methods like PET-PEESE, selection models, and p-value diagnostics like p-curve and p-uniform. The problem is that very few of these methods have been extended to handle dependent effect sizes, which are a really common feature of meta-analytic data. Dependent effect sizes crop up all over the place: for instance, when primary studies report results on multiple measures of an outcome construct, or measure effects at multiple time points, or involve multiple treatment groups compared to a common control group. They also come up in meta-analyses of correlational effect sizes, where you may draw more than one correlation coefficient from the same sample. If you've done meta-analysis work in education, psychology, or other social science fields, you probably recognize that dependent effect sizes are really, really common.

Although we have good methods available for handling this sort of dependency when conducting summary meta-analyses or meta-regressions, there are very few methods available for investigating selective reporting that can accommodate dependent effect sizes. What's more, if you use existing tools that don't account for dependency, you can get misleading results, like too-narrow confidence intervals and hypothesis tests with inflated Type I error rates. So we want to explore a rough-and-ready, pragmatic strategy for investigating selective reporting while also dealing with dependent effect sizes. Our thought is to fit a regular selection model, as implemented in the metafor package, and then use a cluster-level bootstrap to account for the dependency; this uses standard methods that are implemented in the boot package.

To demonstrate this method, we'll use data from a recent meta-analysis by Lehmann and colleagues that looked at the effect of the color red on attractiveness judgments. The data include 81 effect sizes from 41 studies, so we've got effect-size dependency issues to deal with. Here's a funnel plot of the data: a basic random-effects meta-analysis indicates an average effect of about 0.20 standard deviations and substantial heterogeneity of about 0.32. The funnel plot definitely has some asymmetry to it, so you might well be concerned about selective reporting bias with these data.

Hi, I'm Megha, and I'm going to go over how to cluster-bootstrap selection models. To implement a cluster bootstrap using the boot package, we need a function to fit the selection model which takes in a data set with one row per cluster. The function also has to have an index argument, which is a vector of row indexes used to create the bootstrap sample, and then it can include any further arguments. Our data set has one row per effect size and potentially multiple rows per cluster. There are at least two ways to turn this into a data set with one row per cluster: we can use the group_by and summarize functions to create a data set with just cluster-level IDs and then merge it with the full data by study to get the effect-size-level data, or, alternatively, we can use the nest_by function to nest the data by cluster and then use the unnest function to recover the effect-size-level data.

Here's a function to run a selection model. Inside the function we first fit an rma.uni meta-regression model, and we then use the selmodel function from metafor to fit a selection model. We don't need standard errors to be calculated in this step, so we skip those calculations, which speeds things up.
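To consolidate the workflow described here and over the next few steps, here is a hedged sketch using metafor, boot, and tidyverse helpers. It is not the speakers' exact code: the column names (yi, vi, study), the way the parameter estimates are extracted, the number of replications, and the skiphes option used to skip the standard-error computation are all assumptions made for illustration.

```r
library(dplyr)
library(tidyr)
library(metafor)
library(purrr)
library(boot)

## Fit a three-parameter selection model to an effect-size-level data set
run_sel_model <- function(dat) {
  mod <- rma.uni(yi = yi, vi = vi, data = dat)          # random-effects meta-analysis
  sel <- selmodel(mod, type = "stepfun", steps = .025,  # step-function selection model
                  skiphes = TRUE)                       # skip Hessian/SEs for speed (assumed option)
  c(beta  = as.numeric(sel$beta)[1],                    # average effect size
    tau2  = sel$tau2,                                   # between-study heterogeneity
    delta = sel$delta[2])                               # selection parameter for non-significant results
}

## Wrap with possibly() so convergence failures return NAs instead of stopping the bootstrap
run_sel_model_safe <- possibly(run_sel_model, otherwise = rep(NA_real_, 3))

## Statistic function with the (data, index) signature that boot() expects,
## applied to a data set with one row per cluster (study)
fit_sel_model <- function(cluster_dat, index) {
  boot_dat <- cluster_dat[index, ] %>%                  # resample whole clusters
    unnest(cols = data)                                 # recover the effect-size-level rows
  run_sel_model_safe(boot_dat)
}

## dat: effect-size-level data (one row per effect size) with a study identifier;
## nest it so there is one row per study, with effect-size rows in a list-column
nested_dat <- dat %>% nest_by(study) %>% ungroup()

## Point estimates on the original sample
fit_sel_model(nested_dat, index = seq_len(nrow(nested_dat)))

## Cluster bootstrap (parallel-processing options omitted here)
boot_res <- boot(data = nested_dat, statistic = fit_sel_model, R = 2000)
boot.ci(boot_res, type = "perc", index = 1)             # percentile CI for the average effect size
```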
We then compile the parameter estimates into a single vector. Further, we use the possibly function from purrr to handle errors: by passing run_sel_model through possibly, it will now return NA in case there are any convergence issues or any errors when running the selection model.

Here is the completed fitting function, called fit_sel_model. First we take a subset of the data based on the index argument; this generates a subset of the data based on the resampled clusters. We use the unnest function to get the effect-size-level data for those resampled clusters, and we then run the run_sel_model function. We include the run_sel_model code inside the bigger function here to ease parallel processing; the "..." refers to the contents of the run_sel_model function from the previous slide.

With our example Lehmann data set, we create a nested data set with one row per study and fit the selection model using fit_sel_model; these are the point estimates from the three-parameter selection model. Now we can bootstrap using the boot function, which takes in the nested data set, the function to fit the selection model, arguments such as the selection steps, the number of bootstrap replications, and any further options for parallel processing. Parallel processing is really useful here, because we get 2,000 bootstraps in under one minute.

To get bootstrap confidence intervals we can use the boot.ci function, specifying the type of confidence interval and the index of the parameter that we want; here an index of one is for the overall average effect size, and we get the confidence interval for the overall average effect. Based on the selection model, the overall average effect has gone down a bit, from 0.20 to 0.13, and the CI indicates that there's quite a bit of uncertainty around it. Here we get the confidence intervals for the between-study heterogeneity and for the selection rate.

So this has been a very quick demonstration of cluster bootstrapping for selection models. This cluster-bootstrapping technique is interesting because we could in principle apply the cluster bootstrap to other models or methods for investigating selective reporting. We are currently studying the performance of bootstrapping a three-parameter selection model, and initial results suggest that confidence intervals like the one we showed you have reasonable coverage. Future directions include exploring other resampling methods, such as the fractional-weight bootstrap, and turning the workflow we presented here into a more user-friendly function. Thank you so much for your attention, and please feel free to reach out with any questions. Thank you.

Thanks very much, James and Megha. Next we have Matt Lloyd-Jones, who's going to be talking about a proportionate response to comparing proportional outcomes in meta-analysis. Over to you, Matt.

Hi, my name is Matt Lloyd-Jones, and today I'll be talking to you about some ways of dealing with proportion data in meta-analysis. So what is proportion data?
Well, proportion data is simply data presented as parts of a whole, so it could be proportions, percentages, or prevalences, and it is limited to 0 to 1, or 0 to 100 percent. Some of the research questions you might have relating to proportion data are demonstrated here; mine was: how does the proportion of resistant bacteria in the faecal microbiome differ between antibiotic-treated and untreated cattle?

How does proportion data typically manifest in the scientific literature? Well, typically we don't have the raw data, and we're limited by what studies present. Typically studies only present proportion data in their figures. Some studies won't even present the proportion data directly; they'll present figures of the numerator and denominator data from which we have to calculate the proportions. And then, alongside that, a few studies, after a long hard battle, will provide raw data. So we're left with these different levels of detail in terms of the data that we have to combine into a meta-analysis.

With all that in mind, I'm going to take you through three problems that I came across when dealing with proportion data in my meta-analysis, and the proposed solutions that I implemented, in the hope that they can help you. I'll be highlighting the R packages and functions I used in blue on my slides, so keep a pen handy for writing those down if you're trying to do something similar.

Perhaps the most obvious problem with proportion data is that it is restricted to a 0-to-1 (or 0-to-100-percent) range, which means it doesn't meet the assumptions of normality. Given that, if we're going to use parametric methods for meta-analysis, we need to do something to the data. One common way of dealing with this is to transform the data to get it into a format that more closely approximates a normal distribution, and there are many options for transformation, including the log, logit, arcsine, and double-arcsine transformations. There's a lot of chatter about the relative merits of these different types, and it's often confusing to the meta-analyst which one is best to use in their case. So which one did I use? I used the arcsine transformation, and I implemented it using the transf.arcsin function of the metafor package. The arcsine transformation has benefits over the log or logit transformations in terms of dealing better with sampling variances, and over the double-arcsine transformation in terms of back-transformation and interpretability.
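As a small illustration (with made-up proportions and sample sizes, not data from the talk), computing and pooling arcsine-transformed proportions with metafor might look like this:

```r
library(metafor)

p <- c(0.12, 0.35, 0.08)    # hypothetical proportions extracted from three studies
n <- c(40, 55, 32)          # corresponding sample sizes

yi <- transf.arcsin(p)      # arcsine-square-root transformation, as used in the talk
vi <- 1 / (4 * n)           # approximate sampling variance on the transformed scale

res <- rma(yi, vi)          # random-effects model on the transformed scale
transf.iarcsin(coef(res))   # back-transform the pooled estimate to a proportion
```

When raw counts are available, metafor's escalc(measure = "PAS", xi = , ni = ) produces the same effect sizes and sampling variances in one step.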
To be fair, the arcsine transformation has its own criticisms. One of them concerns interpretability: if you use it in a conventional modelling context, you can get predictions outside the 0-to-1 range. But I argue that when you're using it in a meta-analysis context, what you actually do is generate effect sizes first, which you then put into your models, and this implicitly deals with the problem. And if you go back to some of Cohen's early work, he even provides guidance on how to transform the effect sizes back into approximate differences in proportions between your control and treatment groups. So I argue that this is less of an issue in meta-analysis. The other argument is that GLMMs accounting for the binomial distribution are better at dealing with proportion data. But in a meta-analytic context this is perhaps less relevant because, as I demonstrated earlier, you often don't have the underlying numerator and denominator data; you just have the proportions presented, so you can't even use these methods. There are also other issues with these methods in terms of model fitting with sparse data, which we often have problems with in meta-analysis anyway. So I think the arcsine transformation is a good, pragmatic choice for dealing with proportion data.

Problem two: even before you've got to the question of how to deal with proportion data in your models, you may not have the proportions directly presented in the paper you're trying to incorporate into your meta-analysis. While some studies will only report proportions, you'll come across other studies that don't report proportions at all, and instead report the numerator and denominator variables that would make up that proportion. In these cases you almost always only have some measure of the mean and variance of those numerator and denominator variables rather than individual data points, which makes calculating the mean and variance of the proportion quite challenging. So how do you calculate the mean and variance of the proportion based on this data alone?
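As a preview of the answer Matt describes next, here is a minimal base-R sketch of the simulation approach; the means, standard deviations, and the choice of a normal distribution are illustrative assumptions, not values from any real study.

```r
set.seed(1)
n_sim <- 1e5

num   <- rnorm(n_sim, mean = 20,  sd = 6)    # simulated numerator (e.g., resistant bacteria count)
denom <- rnorm(n_sim, mean = 120, sd = 15)   # simulated denominator (e.g., total bacteria count)

prop <- num / denom                          # simulated vector of proportions

mean(prop)                                   # estimated mean of the proportion
var(prop)                                    # estimated variance of the proportion
```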
Well, the only way of dealing with this problem that I can see is to simulate the underlying data based on the means and variances of the numerator and denominator that you do have, and then to divide them by each other in order to estimate the mean and variance of the proportion. This is fairly easy to implement using base functions in R: you plug in the means and standard deviations of your numerator and denominator variables, simulate vectors of data, divide one by the other to calculate a simulated vector of proportions, and from that you can calculate a simulated mean and variance of the proportion. Of course, this is not ideal, but then most things in meta-analysis are not, and you're rather bound to getting everything into this proportion format by the fact that many studies only report things as proportions, as I mentioned earlier. With this kind of thing I'd recommend a sensitivity analysis based on removing the simulated data, just to check that it's not having a disproportionate influence on your overall results.

The final problem I want to highlight is the lack of interpretability of proportional differences in meta-analysis. By their very nature, proportions obscure what is happening in the underpinning numerator and denominator data: the proportion can change either because the absolute value of the numerator changes or because the value of the corresponding denominator changes. For example, the proportion of resistant bacteria in the population may increase because the resistant bacteria grow and increase in number, or alternatively the proportion may increase because the absolute value of the denominator decreases, and these obviously have very different implications for interpreting what's going on in our data. I actually think this is a bigger interpretability problem than some of those I mentioned earlier. The way I propose to deal with it is to use absolute count data, where available, alongside the proportion data on which your main analysis is built. So you have your effect sizes based on proportion data, which cover all of your studies, and you've plotted and modelled those; but alongside them, I think you should also plot and model the absolute counts where they are available. This can give you a sense of whether what you're seeing is an actual increase in your numerator or is associated with a change in the denominator. For example, if we plot effect sizes based on the absolute number of resistant bacteria alongside those based on the proportions, we can see whether the resistant bacteria are growing and increasing in number, or whether it's just that the susceptible bacteria are being wiped out, and this can really aid the interpretation of your meta-analysis, which can be very abstract, as we know.

In summary, we often have to deal with outcomes reported as proportions whether we like it or not, and I've proposed some potential pragmatic solutions for dealing with this data in the form in which we often encounter it when doing meta-analysis. But of course the real problem here is that people don't publish their data in a way that allows us to fully make use of it, and the real solution is for journals, universities, and funders to enforce and/or reward data sharing, so that we can do richer analyses.

Finally, I'd like to thank the whole evidence synthesis team that was involved, not just in the meta-analysis but in the wider systematic review project.
In terms of the meta-analysis, I'd like to thank Alfredo Sánchez-Tójar, who was my main meta-analytic collaborator. I'd also like to thank Shinichi Nakagawa, who provided some useful email correspondence on the transformations part of this, and my colleague Dan Padfield, who came across similar problems outside of meta-analysis and influenced my thinking on this a bit. Finally, here's a brief bibliography slide in case you want to follow up on any of the resources I presented. Thanks for listening.

Many thanks indeed, that was a really interesting talk. So that closes off this session for today; that's it for Quantitative Synthesis Part 2. We hope that you've enjoyed this session. As always, keep the questions coming in via Twitter by following us at @eshackathon, and we'll try to get our presenters to answer those over the next few hours or days, so do please keep those questions coming in. Enjoy the rest of the conference, and see you again soon.