Hi everyone, I'm Stanley Lazic, and welcome to this talk on Bayesian evidence synthesis using the BayesCombo R package. I won't talk much about Bayesian methods themselves; I'll assume you're familiar with the idea of a prior distribution and a posterior distribution. Let me start by motivating what this package does, and the problem it solves, with a particular example. So assume we have a hypothesis: is a new drug effective for depression? Perhaps we've developed this drug and we're now interested in trying it out in clinical trials. We have one study, which is a two-group between-subjects design: a randomized controlled trial where some patients receive a placebo and some receive our new drug. The drug is dosed at 15 milligrams, our patient population consists of moderately depressed patients, and our outcome variable is the Beck Depression Inventory. Our second study is also a two-group study, but it's a crossover design. This means that patients receive both the placebo and the drug, each for a period of time: half receive the placebo first and then the drug, and the other half receive the drug first and then, after a washout period, get the placebo. So this is a within-subjects design, where study one was a between-subjects design. In study two the drug is dosed at 80 milligrams, higher than in the first study, and the patients are severely depressed, so it's a different patient population as well. The outcome is heart rate variability, which is a physiological biomarker of depression: people who are depressed tend to have lower heart rate variability compared to healthy controls. And of course we can imagine these studies differing on other dimensions as well; perhaps study one was conducted in the UK and study two in Japan. We want to combine the results of these studies to test our hypothesis, but they really are quite different, and that's the problem. For the parameters, which are the effect sizes we calculate in these studies, it's hard to argue that they're really measuring the same thing, or the same underlying effect, which is what a standard meta-analysis would assume. And there's always some debate in the meta-analysis literature about how similar the effect sizes or inputs have to be for it to be a viable approach. Here I've used this fictitious example to make things very different, where I think most people would agree that coming up with a common parameter and assuming it's consistent across the studies is not appropriate, even for a random-effects meta-analysis. So what can we do? One solution is based on a paper that came out in 2012, and the idea is to combine the results at the level of the hypothesis, not at the level of the parameter as a normal meta-analysis would do. That paper did come with some R code, but it wasn't very user-friendly, so a colleague and I rewrote all of the code from scratch, made it into functions, and put it into an R package with an easier user interface, so we could try out this approach much more easily. But all of the intellectual work in terms of the methodology came from that paper. The way it works is that we first start with a set of mutually exclusive and exhaustive hypotheses and put a prior probability distribution over them. We have only three hypotheses. The first is that the effect size is less than zero.
This is H<0 here; the second is that the effect size is exactly zero; and the third is that the effect size is greater than zero. We could of course have a different distribution over what we think is initially plausible, but equal probabilities are a good default, especially as we're accumulating multiple studies over time. The second step is to come up with a prior for our parameter. This is a little confusing because we have two priors here: this one is a prior for our effect size (I'll use theta throughout to refer to the effect size), in addition to the prior over the hypotheses. We're modelling this prior as a normal distribution, and it says that effect sizes less than minus three or greater than plus three are very unlikely. Then we take our data and analyse it, and what we get from that is a likelihood. The likelihood is independent of any prior, so nothing amazing about it, but it does tell us, given the data, which values of the parameter, our effect size, are more plausible and which are less plausible. We can see here that the distribution is shifted to the right towards positive values, and that large negative values are very unlikely given the data we've seen. The next step is to combine the likelihood and the prior to calculate a Bayes factor. If you're not familiar with the Bayes factor, you can think of it as something very similar to a likelihood ratio. What it tells us is which hypotheses are better supported by the data. Given that our likelihood says positive values are more plausible, we see that reflected in the Bayes factor: the support for the hypothesis of a positive effect size becomes much higher compared to the hypotheses that assume a zero or negative effect size. Bayes factors take some getting used to and aren't really intuitive for many people, so we can convert back to a probability scale by combining the Bayes factors with our prior probabilities over the hypotheses. And finally what we get, and this is the end of the analysis for a single study, is a posterior probability over our hypotheses. We can see here that the hypothesis of a positive effect size is much better supported than the others, and the y-axis is now on the probability scale: there's about an 80% chance, a probability of 0.8, that given the data our effect size is greater than zero. Now, as I said, this is for one study. To accumulate or combine evidence across multiple studies, we essentially repeat the same procedure, computing a new likelihood from each dataset. The only difference is that when we go around the second time, we don't start with the hypotheses being equally likely: the posterior from the first analysis becomes the prior for the second analysis, the posterior from the second becomes the prior for the third, and so on. You keep iterating until you've analysed all of your datasets, and ideally at the end one hypothesis will clearly dominate the evidence. I've put this slide here to give some intuition for how the Bayes factor is calculated, and it's fairly straightforward; the sketch below shows the same calculation in code, and the slide walks through it geometrically.
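As a hedged illustration, here is what one updating step could look like. This is a minimal sketch, not the package internals: the normal prior, the normal (conjugate) likelihood, and all of the numbers are illustrative assumptions.

```r
## One updating step for a single study -- an illustrative sketch only
prior.mean <- 0; prior.sd <- 1   # prior on the effect size theta
est <- 0.5; se <- 0.6            # estimate and standard error from one study

## Conjugate normal update: precision-weighted combination
post.prec <- 1 / prior.sd^2 + 1 / se^2
post.mean <- (prior.mean / prior.sd^2 + est / se^2) / post.prec
post.sd   <- sqrt(1 / post.prec)

## Bayes factors for H<0, H=0, H>0: posterior area over prior area,
## or, for the point null, the ratio of the densities at zero
BF.less    <- pnorm(0, post.mean, post.sd) / pnorm(0, prior.mean, prior.sd)
BF.zero    <- dnorm(0, post.mean, post.sd) / dnorm(0, prior.mean, prior.sd)
BF.greater <- (1 - pnorm(0, post.mean, post.sd)) /
  (1 - pnorm(0, prior.mean, prior.sd))

## Combine with the prior over the hypotheses and renormalise
H.prior <- c(1, 1, 1) / 3
post.H  <- H.prior * c(BF.less, BF.zero, BF.greater)
post.H / sum(post.H)   # posterior probabilities of the three hypotheses
```

To accumulate evidence over studies, you would feed `post.H` back in as `H.prior` for the next study's update.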
Back to the slide: the distribution at the top is the prior density, which I had on the previous slide. I've divided it into a region of negative values, region A; a region of positive values, region C; and the height of the curve at zero, point B, which isn't a region, just a point. Once we've analysed the data and combined our prior with the likelihood, we get this posterior density. Because our data said the values were likely to be positive, it's shifted a bit towards the right, so area F here gets larger and area D gets smaller. To calculate our Bayes factors, we just take ratios of these areas. For example, the Bayes factor for the hypothesis that the effect size is greater than zero is the area of region F over the area of region C, and likewise for the first hypothesis it's the area of D over A. The slight twist is the Bayes factor for the hypothesis that the effect size is exactly zero: this isn't a ratio of areas, but the height of the posterior curve at point E over the height of the prior curve at point B. So the maths itself is fairly simple once you have these distributions. Let's turn to some data now. This is fictitious data. The first study has the Beck Depression Inventory on the y-axis, and each point is one patient. Lower scores are better here, and we can see that those in the drug condition have slightly lower values on average, so it suggests that there might be an effect. For study two, heart rate variability is plotted on the y-axis, and higher values are better. This is plotted differently because, you may recall, it's a crossover design: each patient was both off the drug, on placebo, and on the drug at another time point. Most of these lines slope upwards, again suggesting improvement on the drug versus off it. The first step in this whole process is just to analyse those two datasets with a regular analysis, which we can do with the lm function. For the first study, we model the Beck Depression Inventory value as a function of drug, which is just an indicator variable. We get an estimate for the drug effect, minus 0.815, and a standard error, and it's these two values that we need for the Bayesian evidence combination; we can just take them and plug them into the next step. The p-value here isn't significant, but it's tending towards the smaller side. For the second study, we model heart rate variability, now including a subject effect to account for the paired nature of the data, and of course the drug effect as well. This is a simple model; in practice we would probably want a slightly more complex model for a crossover design, perhaps with subject as a random effect and so on, but for illustrative purposes I've kept it very simple. Here we get another estimate of the drug effect and its standard error, and this p-value does happen to be just barely significant. So it's these four values, highlighted in blue, that we take forward into further analysis. Now we turn to the functionality in the BayesCombo package. The first function is pph, for the posterior probability of hypotheses, and we just feed in the effect size estimate that we calculated and its standard error.
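Before the package call, here is a sketch of the two standard analyses just described. The data frames, column names, and the fake data that make it runnable are all hypothetical; only the model formulas mirror the talk.

```r
set.seed(1)
## Hypothetical data, just so the sketch runs
study1 <- data.frame(bdi = rnorm(40, 20, 5), drug = rep(0:1, each = 20))
study2 <- data.frame(hrv = rnorm(24, 60, 10),
                     subject = factor(rep(1:12, 2)),
                     drug = rep(0:1, each = 12))

m1 <- lm(bdi ~ drug, data = study1)            # study 1: between-subjects RCT
m2 <- lm(hrv ~ subject + drug, data = study2)  # study 2: crossover, kept simple

## The only values carried forward: the drug effect estimates and their
## standard errors ("drug" assumes a 0/1 indicator variable)
b1  <- coef(summary(m1))["drug", "Estimate"]
se1 <- coef(summary(m1))["drug", "Std. Error"]
b2  <- coef(summary(m2))["drug", "Estimate"]
se2 <- coef(summary(m2))["drug", "Std. Error"]
```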
One slight twist is that the effect size we actually calculated was negative, because being on the drug was associated with a better outcome. One thing I did here, which is also common in meta-analysis, is to make all of the effects point in the same direction, so that a good effect of the drug is a positive value; here I just dropped the minus sign. We run that analysis and can then plot the results. This is our posterior probability over the hypotheses at the end of the first loop through that cycle, and we can see that the probability of a positive effect size is about 75% or so, reading it off the graph. So there's some evidence that the drug works, but it's not super strong. We can also plot the distributions; there's a plot function defined for the output, which gives you this graph. You have the prior in grey, which is the default prior chosen by the function; the likelihood as the dotted line, which is purely driven by the data; and finally the posterior distribution as the black line, which is a compromise between the two. We can extract the mean and standard deviation of this posterior, which are stored within this results object, if you need those values. We can do the same analysis for study two, just feeding in the estimated effect size and standard error. There are a bunch of other options for this function, which I'm not discussing for now. Again, you can plot the results, and not surprisingly the hypothesis that the effect size is positive has the most support, at maybe 90% or so. That's great, but this is analysing one study at a time, and we want to analyse them all together and accumulate these effects. The easiest way to do that is the evidence combination function, where we feed in the estimated effect sizes and standard errors as vectors, and we can then plot the result. Here we have, again, the probability of each hypothesis on the y-axis, and the x-axis is now the number of studies. We start at study zero, which is our prior probability, where all of the hypotheses are equal. We can see that after the first study the hypothesis of a positive effect size, here in red, has increased, and after the second study it has increased further, while the other hypotheses decrease. If you had more studies, the lines would just extend further to the right, and ideally one line would clearly dominate the others, giving very strong support for one hypothesis. Another function you can use is the forest plot; you just call it on the result of that analysis. Here the effect size is on the x-axis, the grey bar is the prior distribution, and the black point is the estimated effect size that we put into the model, with its uncertainty based on the standard error. It gives you a feel for how the prior relates to the effect sizes. There's also an option to scale the effect sizes: if your inputs are on very different measurement units, it might be hard to visualise them in a forest plot, so you can standardise them to make them more comparable. Now, with a Bayesian analysis, one of the big questions is the prior itself: how do you specify a good value for it?
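Putting this together, here is a hedged sketch of the calls just described, reusing the values from the earlier sketch. The argument names follow my reading of the package documentation, so check `?pph` and `?ev.combo`; the `se.mult` and `H0` options at the end are discussed next, and the interval limits there are purely illustrative.

```r
library(BayesCombo)

## One study at a time: posterior probability of each hypothesis.
## The sign of b1 is flipped so that positive = beneficial, as above.
res1 <- pph(beta = -b1, se.beta = se1)
res2 <- pph(beta = b2,  se.beta = se2)
plot(res1)       # prior, likelihood, and posterior densities

## All studies together: each posterior becomes the next prior
res <- ev.combo(beta = c(-b1, b2), se.beta = c(se1, se2))
plot(res)        # P(hypothesis) versus number of studies
forestplot(res)  # priors and effect estimates on the effect-size scale

## Options discussed below: widen the default priors for a
## sensitivity analysis, or use an interval null around zero
res.wide <- ev.combo(beta = c(-b1, b2), se.beta = c(se1, se2),
                     se.mult = 2, H0 = c(-0.5, 0.5))
```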
There is an argument where you can specify a prior for each study, just as another vector of arguments. But what's sometimes easier is to use the standard error multiplier, which takes the default priors, calculated from a rule of thumb in that publication, and scales them; here I've said I want to double them, multiplying by 2. You can see the effect in this graph, where the grey prior distributions are now twice as wide as they were in the previous graph. This is a good way to perform a sensitivity analysis: you can double the multiplier, quadruple it, or halve it, and see how the results change based on your prior. Another option you can consider is that instead of specifying a point null hypothesis, which is what we've been doing so far, with the one hypothesis that the effect size is exactly zero, you can have an interval or range hypothesis, where you say there's some region around zero that for practical purposes is close enough to zero to count as no effect. You specify that with the H0 argument, giving it a vector of two values, the lower and upper bounds around zero, to say that anything within that interval is effectively zero. And you can see that you get slightly different results: with the point null, the probability of the leading hypothesis of a positive effect size is 0.95, whereas with the interval it's 0.99. I would recommend using an interval hypothesis most of the time, partly because the point null is slightly unusual in that you're comparing areas under a curve and also comparing heights of the curve, which I find a bit of an apples-and-oranges comparison, whereas with the interval null you're always comparing areas under the curve. That makes a little more sense, and I think the results are a little more intuitive as well. If you look at the numeric results here, arguably the result of this analysis is closer to a probability of 0.99 than 0.95, because you may recall that the second study was already significant on its own, and the first study was also providing evidence for an effect. So 0.95 is maybe not quite as strong as you would expect intuitively, whereas with the interval null you get a stronger result. It may not happen in every case, but at least with this dataset it conforms to intuition a bit more. I just want to dwell a little more on the point null, because it can be a little dangerous, I would say. Here I'm analysing the results of study one: the graph on the left uses the standard prior, with the standard error multiplier set to one, and the graph on the right is identical except that the standard error multiplier is set to two, so we're using a broader, more uncertain prior. You can see that in the graphs: the grey line on the left is the prior, and the grey line on the right is much wider and less peaked, representing more uncertainty. You may recall that the way we calculated the Bayes factor for the point null was the ratio of the heights of the two curves at an effect size of zero. If you look at the graph on the left, you can see that the height of the blue line, the posterior, after we've conditioned on the data, is lower than the height of the prior at zero. What this tells us is that given the data, an effect size of zero is less plausible than it was before we saw the data.
If we look at the graph on the right, the points are on top of each other, even though the data are the same. The data are still shifting the blue curve towards the right, because they suggested positive effect sizes were more likely, but they haven't changed the plausibility of the null hypothesis at all. This is slightly counterintuitive, because you might think: this prior distribution contained less information, so the posterior should reflect the data more and be shifted further to the right. But it doesn't always work out that way. It's hard to tell from these graphs, but the blue distribution on the right is in fact shifted a little more than the blue distribution on the left; it's also slightly wider. So these results tend to be a little counterintuitive, and the results you get are very dependent on the prior you specify when you have a point null. For that reason I tend not to use point nulls, and we recommend using an interval. And, oh, there's just a reference line here so you can compare the results across the two graphs. So, as a summary: this approach can be used to combine diverse data, but you need to think very carefully about your priors if you're using the point null approach, as they have a big effect on the Bayes factors and then on the subsequent calculations. I've tended not to use the point null myself in the end; it didn't work out for what I initially wanted it for. But if you do have diverse data and you're thinking a meta-analysis may not be appropriate, this might be something you could consider. The package is up on CRAN, and it's also on GitHub if you want to download it. I'd also like to thank Bruno, who helped convert some of that R code and helped create this package. Thank you for your attention.