I'm a PhD candidate at the University of Amsterdam, and I would like to tell you about robust Bayesian meta-analysis, a way of combining multiple methods of publication bias adjustment, and the RoBMA R package that implements the methodology. First, I will tell you about an example that compares registered replication reports and meta-analyses, and I will use it to highlight the problems and challenges of adjusting for publication bias. Then I will tell you a bit more about different ways of adjusting for publication bias and how we can combine them with Bayesian model averaging, and lastly, I will tell you more about the package implementation and functionality.

So, the example: Kvarven and colleagues (2020) published a paper that looked at 15 different meta-analyses and a registered replication report that tried to replicate the main study from each of the meta-analyses. Under some assumptions, you would expect that the registered replication report should provide the best possible estimate of the true effect size, and that the meta-analytic estimates based on the original studies should converge to it. The original meta-analyses were of different sizes, ranging from 15 to around 300 studies. If you look at the original effect size estimates based on the published studies, you can see a wide range of effects. And since this is a talk about publication bias, unsurprisingly, the estimates from the registered replication reports were much smaller than the original effect size estimates based on the meta-analyses. This large discrepancy is attributed by many to publication bias. One way you can use this example is to see how well different publication bias adjustment methods adjust for publication bias and provide estimates closer to the registered replication report estimates.

Publication bias adjustment is a topic that has been around for many decades, and different methods have been developed that try to adjust for it. I like to differentiate them into two groups. The first group contains methods that adjust for the relationship between standard errors and effect sizes, for example trim-and-fill, PET-PEESE, or the endogenous kink model. The second group contains selection models that try to adjust for publication bias as a selection process operating on p-values, for example 3PSM and 4PSM, AK1, AK2, p-curve, or p-uniform. I will just go through PET-PEESE and 3PSM in more detail.

PET-PEESE is a conditional meta-regression estimator that adjusts for the relationship between effect sizes and standard errors (PET) or standard errors squared (PEESE). The idea is that if there is no relationship between effect sizes and standard errors, which happens when there is no publication selection bias, then if you fit the meta-regression, the intercept should correspond to the true effect size. However, if there is publication selection bias, you see a larger number of small studies with large standard errors and overestimated effect sizes, and then fitting either the PET or the PEESE model should provide a much better effect size estimate.

The selection models, on the other hand, extend the traditional fixed-effect or random-effects meta-analytic model, with mean parameter mu and heterogeneity parameter tau, by publication bias weights omega. Here, for example, we use a step weight function that specifies different publication probabilities for different p-value intervals.
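To make both families a bit more concrete, here is a sketch in my own notation, not quoted from any particular paper: the PET and PEESE meta-regressions, and a simple step weight function with a single cutoff at the two-sided significance level.

```latex
% PET and PEESE meta-regressions for study i with estimate y_i and
% standard error se_i; the intercept \beta_0 is the bias-adjusted
% effect size estimate.
\text{PET:}\quad   y_i = \beta_0 + \beta_1\, se_i   + \varepsilon_i, \qquad
\text{PEESE:}\quad y_i = \beta_0 + \beta_1\, se_i^2 + \varepsilon_i .
% The conditional estimator reports the PET intercept unless it is
% significantly larger than zero, in which case it reports the PEESE intercept.

% A one-cutoff step weight function for a selection model:
\omega(p) =
  \begin{cases}
    1        & \text{if } p < .05,\\
    \omega_1 & \text{if } p \ge .05,
  \end{cases}
  \qquad 0 \le \omega_1 \le 1 .
```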
Here, we can fix the relative publication probability of significant studies to 1, and we can estimate the relative publication probabilities of the marginally significant and non-significant studies. As a result, you obtain a different likelihood function: f is the unweighted likelihood function, while f_w is the weighted likelihood function that takes the different publication probabilities in the different p-value intervals into account.

Why is this approach interesting? If you look at around one million published test statistics from Medline, you can see a very similar pattern, with two very large discontinuities, perhaps not coincidentally at alpha = .05.

So, if you use those two very popular methods on the Kvarven example, we can see the original effect size estimates in red versus the replication estimates in blue, and then the black circles, which are PET-PEESE, and the black triangles, which are 3PSM. In some cases, all of them provide the same estimate. In some cases, 3PSM is better than PET-PEESE, and in other cases, PET-PEESE is better than 3PSM. The problem is that, a priori, it is hard to tell which of the estimates is better. So the question is, how should we base our inference, especially if the methods disagree on the conclusions?

We argue that you shouldn't base the inference on a single model. Instead, you should use robust Bayesian meta-analysis and Bayesian model averaging to base the inference on multiple models simultaneously. Instead of selecting a single model, you specify all of the models, you fit them, and you base your inference proportionally on how well the different models predicted the data. Then you use Bayes factors to quantify the evidence in favor of the presence or absence of either the effect, heterogeneity, or publication bias. You can use prior distributions to regularize the estimates and incorporate prior knowledge, and you can use Bayesian evidence updating that is independent of the sampling plan.

As an overview, Bayesian model averaging works something like this. You have different hypotheses about the data, represented by different demons, and each demon specifies a hypothesis. For example, this demon says the treatment works, so the alternative hypothesis is true, or there is no effect and the null hypothesis is true; you have different assumptions about heterogeneity, the fixed-effect and random-effects models, and different assumptions about the presence or absence of publication bias. You specify all of those different hypotheses with the different models in your ensemble. You feed the models with the data, and the models that predicted the data best will grow, and their voice will be heard much more. So, if a model predicted the data well, you will base the inference on it much more strongly.

So, how do we build the ensemble? For example, when you obtain the model-averaged estimate, you can look at the different components according to whether the models assume the absence or presence of the effect. You split the prior model probability equally across those two sets of models, then again equally across models assuming the presence or absence of heterogeneity, and then publication bias. At the end, you end up with eight different model types specifying all possible combinations of the presence or absence of the effect, heterogeneity, and publication bias, and each of the model types ends up with the same prior model probability.
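Two small pieces of notation may help here, again as a sketch in my own notation: the weighted likelihood used by the selection models, and the equal split of prior model probabilities across the eight model types.

```latex
% Weighted likelihood of estimate y_i with standard error se_i under a
% random-effects selection model; f is the unweighted normal density
% and p(y) is the p-value implied by y:
f_w(y_i \mid \mu, \tau, \omega) =
  \frac{\omega\big(p(y_i)\big)\, f\big(y_i \mid \mu,\ \tau^2 + se_i^2\big)}
       {\int \omega\big(p(y)\big)\, f\big(y \mid \mu,\ \tau^2 + se_i^2\big)\, dy} .

% Equal prior probabilities over the presence/absence of the effect,
% heterogeneity, and publication bias give eight model types:
P(M_k) = \tfrac{1}{2}\cdot\tfrac{1}{2}\cdot\tfrac{1}{2} = \tfrac{1}{8}
\quad\text{for each model type } M_k .
```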
But, as I said previously, there are different ways to adjust for publication bias. So, in our illustration, this one demon can be represented in many ways. What do we do? We simply specify more models that represent this one demon. Here, you can specify, for example, the PET model as one way of adjusting for publication bias, the PEESE model, or different weight functions: for example, one-sided selection at the significance level on one-sided p-values, or selection on marginally significant and significant two-sided p-values, and so on.

Across all of the different publication bias adjustment methods that we specify in robust Bayesian meta-analysis, we use the PET and PEESE models to adjust for the relationship between effect sizes and standard errors, and we specify six different weight functions that encode different assumptions about the possible ways publication bias might operate on p-values. All of those specifications together cover approximately the PET-PEESE, 3PSM, 4PSM, AK1, and AK2 models.

If you look back at our example, we can see that in some cases all the methods still provide the same estimate. In other cases, RoBMA provides an estimate that lies somewhere between PET-PEESE and 3PSM. And in yet other cases, we again obtain an estimate that lies somewhere in between but is not better than one of those methods. That just signifies that we are still doing statistics, not magic, and we cannot provide the correct answer all the time. Nonetheless, in simulation studies that are linked at the end of the presentation, you can see that in the majority of cases, Bayesian model averaging provides the best possible results.

To make this methodology available to practitioners, we implemented it in the RoBMA R package. The package uses MCMC estimation with JAGS via the runjags R package, and then computes marginal likelihoods with the bridgesampling R package. Most of what the RoBMA package does is the model specification, some plotting and summary functions that I will show you in a second, and additional auxiliary functionality.

With the RoBMA package, you can specify the default ensemble by just supplying the effect sizes and standard errors. For example, here, on the infamous Bem (2011) data set, you fit the model with a single simple call, and you can use the summary function to obtain the default summary of the model. In the first summary table, you see information about the whole model ensemble: you specified 36 models, 18 of which assume the presence of the effect, 18 the presence of heterogeneity, and 32 the presence of publication bias. The prior model probabilities are equal across the components, and you see the posterior probabilities. You can also quantify the evidence with inclusion Bayes factors, and you see that there is strong evidence for the absence of the effect, moderate evidence for the absence of heterogeneity, and strong evidence for the presence of publication bias. Then, of course, you see the model-averaged estimates for the mean and heterogeneity parameters, the relative publication probabilities, and the PET and PEESE estimates. Moreover, the package provides additional summaries.
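As a rough sketch of what that single call looks like, with hypothetical effect sizes standing in for the Bem (2011) data (check the package documentation for the exact argument names in your version):

```r
# install.packages("RoBMA")   # the package is released on CRAN
library(RoBMA)

# hypothetical Cohen's d estimates and standard errors
d  <- c(0.25, 0.11, 0.19, 0.09, 0.14)
se <- c(0.10, 0.09, 0.11, 0.08, 0.10)

# fit the default 36-model ensemble with a single call
fit <- RoBMA(d = d, se = se, seed = 1)

# default summary: ensemble overview, inclusion Bayes factors,
# and model-averaged estimates
summary(fit)
```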
For example, you can look at the summary of the individual models, which shows you the prior distributions for the effect, heterogeneity, and publication bias, the prior probabilities of each of the individual models, and the marginal likelihoods, posterior probabilities, and inclusion Bayes factors for each of the models. You can also look at the MCMC diagnostics of the individual models, which show a summary of the MCMC error, the minimum effective sample size, and the maximum R-hat for each of the models, to verify that the models are fitted properly. And you can also get estimates from the individual models. Here I'm just showing a printout of the last two specified models, and you can see the model specification, the parameter estimates, and so on. So, even if you don't want to do Bayesian model averaging and want to look at the individual models instead, you can inspect their estimates directly.

The package also provides plotting functions. For example, you can plot the model-averaged mean estimate, where the spike corresponds to the probability of the models assuming the absence of the effect, so the effect size is zero, and the slab corresponds to the density of the models assuming the presence of the effect. The functions are also implemented in ggplot2, so if you are a fan of ggplot2, you can use those. You can also look at the prior and posterior distributions, for example here for the tau estimate, assuming the presence of the effect, and many other combinations.

The package allows multiple different specifications, and you can modify basically everything about the ensemble. For example, you can change the prior distribution for the effect and specify a truncated normal distribution that encodes the hypothesis of small effect sizes: mean zero, standard deviation 0.30, truncated to the interval from zero to infinity. Or you can specify different ways of adjusting for publication bias; here, for instance, you specify only the prior distributions for the PET and PEESE models that adjust for publication bias. If you want to see more specifications and customization, I recommend checking the vignettes of the package on CRAN.

We also implemented the R package in JASP, with a graphical user interface. The JASP implementation allows you to set basically all of those customizations as well: you can specify the models with different prior distributions, and then create different summaries for inference and different figures. Here you can see again the model-averaged mean effect size estimate, and also the model-averaged weight function estimate across all selection models.

So, to sum up robust Bayesian meta-analysis: it can incorporate uncertainty about the selected model with Bayesian model averaging, so you don't have to base inference on any single publication bias adjustment model, but on all of them, weighted by how well they predict the data. It can provide evidence for either the null or the alternative hypothesis. Its performance improves with larger sample sizes. It has the capacity to incorporate expert knowledge and has the potential for sequential updating of evidence. On the other side, there are some disadvantages: for example, it is slow, since it requires MCMC sampling, and it can also fail under strong p-hacking.

So, thank you for your attention. I hope you enjoyed the talk. If you want to learn more about the package, you can look at CRAN, where the package is released, or at my GitHub page, where you can also submit feature requests or bug reports, and have a look at JASP.
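Before the references, here is a rough sketch of what those additional summaries and customizations look like in code, under the argument names as I remember them from the package documentation (please check ?RoBMA and ?prior for the exact interface in your version):

```r
# additional summaries of the fitted ensemble from before
summary(fit, type = "models")       # per-model priors, posterior probabilities, BFs
summary(fit, type = "diagnostics")  # MCMC error, min ESS, max R-hat per model
summary(fit, type = "individual")   # parameter estimates of the individual models

# model-averaged posterior for the mean effect; the spike at zero collects
# the models assuming no effect, the slab the models assuming an effect
plot(fit, parameter = "mu", prior = TRUE)
plot(fit, parameter = "mu", plot_type = "ggplot")  # ggplot2 version

# a customized ensemble: a truncated normal prior encoding small positive
# effects, and PET/PEESE-only publication bias adjustment (distribution and
# parameter names are my best recollection of the interface)
fit_custom <- RoBMA(
  d = d, se = se, seed = 1,
  priors_effect = prior("normal",
                        parameters = list(mean = 0, sd = 0.30),
                        truncation = list(lower = 0, upper = Inf)),
  priors_bias   = list(
    prior_PET("cauchy",   parameters = list(location = 0, scale = 1),
              truncation = list(lower = 0)),
    prior_PEESE("cauchy", parameters = list(location = 0, scale = 5),
                truncation = list(lower = 0))
  )
)
summary(fit_custom)
```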
There are some references to the papers that we have written, which outline the methodology in more detail and say more about the model specification and the simulation studies that we conducted to verify the methodology. Thank you very much, and I'm looking forward to seeing you in the discussion.