Hi, my name is Matt Lloyd-Jones and today I'll be talking to you about some ways of dealing with proportion data in meta-analysis. So what is proportion data? Well, proportion data is simply data that is presented as parts of a whole. It could be proportions, percentages or prevalences; they're limited to the range 0 to 1, or 0 to 100%. Some of the research questions you might have relating to proportion data are demonstrated here. Mine was: how does the proportion of resistant bacteria in the faecal microbiome differ between antibiotic-treated and untreated cattle?

How does proportion data typically manifest in the scientific literature? Well, typically we don't have the raw data and we're limited by what studies present. Typically, studies only present proportion data in their figures. Some studies won't even present the proportion data directly; instead they present figures of the numerator and denominator data, from which we have to calculate proportions. And then alongside that, a few studies, after a long hard battle, will provide raw data. So we're left with these different levels of detail that we have to combine into a meta-analysis.

Okay, so with all that in mind, I'm going to take you through three problems that I came across when dealing with proportion data in my meta-analysis, and the solutions I implemented, in the hope that they can help you. I'll be highlighting the R packages and functions I used in blue on my slides, so keep a pen handy for writing those down if you're trying to do something similar.

So perhaps the most obvious problem with proportion data is that they are restricted to a 0-to-1 (or 0-to-100%) range, which means they don't meet the assumptions of normality. Given that, if we're going to use parametric methods for meta-analysis, we need to do something to the data to deal with this.
One common way of dealing with this is to transform the data into a format which more closely approximates a normal distribution. There are many options for transformation, including log, logit, arcsine, and double arcsine transformations, and there's a lot of chatter about their relative merits; it's often confusing to the meta-analyst which type is best to use in their case.

So which one did I use? Well, I used the arcsine transformation, and I implemented it using the transf.arcsin function of the metafor package. The arcsine transformation has benefits over the log and logit transformations in terms of handling sampling variances better, and over the double arcsine transformation in terms of back-transformation and interpretability.

To be fair, the arcsine transformation has its own criticisms, one of which is around interpretability: if you use it in a conventional modelling context, you can get predictions outside of the 0-1 range. But I'd argue that in a meta-analysis context you generate effect sizes first, which you then put into your models, so this problem is implicitly dealt with. And if you go back to some of Cohen's early work, he even provides guidance on how to transform the effect sizes back into approximate differences in proportions between your control and treatment groups. So I argue that's less of an issue in meta-analysis.

The other argument is that GLMMs accounting for a binomial distribution are better at dealing with proportion data. But in a meta-analytical context this is perhaps less relevant because, as I demonstrated earlier, you often don't have the underlying numerator and denominator data; you just have the proportions presented, so you can't even use these methods.
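As a minimal sketch of the transform just described: metafor::transf.arcsin(p) computes the same quantity as asin(sqrt(p)), so base R is used here to keep the snippet self-contained; the proportions and sample sizes are invented for illustration.

```r
# Arcsine-square-root transform of proportions (metafor::transf.arcsin
# is equivalent to asin(sqrt(p)); shown in base R for portability).
p <- c(0.05, 0.50, 0.95)   # observed proportions (made up)
n <- c(40, 40, 40)         # sample sizes behind each proportion (made up)

yi <- asin(sqrt(p))        # arcsine-transformed effect sizes
vi <- 1 / (4 * n)          # approximate sampling variance on this scale

# Back-transform (as metafor::transf.iarcsin does) for interpretability:
p_back <- sin(yi)^2
```

Note how the sampling variance on this scale depends only on the sample size, which is one of the reasons this transform handles sampling variances well.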
And there are other issues with these methods in terms of model fitting with sparse data, which we often have problems with in meta-analysis anyway. So I think the arcsine transformation is a good pragmatic choice for dealing with proportion data.

Problem two: even before you've got to the question of how to handle your proportion data in modelling, you may not have proportions directly presented in the paper you're trying to incorporate into your meta-analysis. While some studies will only report proportions, you'll come across other studies that don't report proportions at all; they just report the numerator and denominator variables that would make up that proportion. In these cases you've almost always only got some measure of the mean and variance of those variables rather than the individual data points, and this makes calculating the proportion's mean and variance quite challenging.

So how do you calculate the mean and variance of the proportion based on this data alone? Well, the only way of dealing with this problem that I can see is to simulate the underlying data based on the means and variances of the numerator and denominator that you do have, and then divide one by the other to estimate the mean and variance of the proportion. This is fairly easy to implement using base functions in R: you plug in the means and standard deviations of your numerator and denominator variables and simulate vectors of data; you then divide one vector by the other to get a simulated vector of proportions; and from that you can calculate a simulated mean and variance of the proportion.

Of course, this is not ideal, but then most things in meta-analysis are not, and you're bound to getting everything into this proportion format by the fact that many studies only report proportions, as I mentioned earlier.
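The simulation approach described above can be sketched in base R as follows, assuming (hypothetically) that a study reports only the means and SDs of the numerator and denominator, and that both are approximately normal; the numbers are invented for illustration.

```r
# Simulate numerator and denominator from their reported means and SDs,
# then divide to estimate the mean and variance of the proportion.
set.seed(42)
nsim <- 1e5

num <- rnorm(nsim, mean = 30, sd = 5)    # simulated numerator values
den <- rnorm(nsim, mean = 100, sd = 10)  # simulated denominator values

prop <- num / den            # simulated vector of proportions

prop_mean <- mean(prop)      # estimated mean of the proportion (~0.30)
prop_var  <- var(prop)       # estimated variance of the proportion
```

A normal distribution is just one assumption you could make here; if the reported variables are counts or clearly skewed, another distribution may be more defensible.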
With this kind of thing, I'd recommend a sensitivity analysis based on removing the simulated data, just to check that it's not having a disproportionate influence on your overall results.

The final problem I want to highlight is the lack of interpretability of proportional differences in meta-analysis. By their very nature, proportions obscure what is happening in the underpinning numerator and denominator data: a proportion can change either because the absolute value of the numerator changes or because that of the corresponding denominator changes. For example, the proportion of resistant bacteria in a population may increase because the resistant bacteria grow and increase in number, or alternatively because the absolute value of the denominator decreases. These have very different implications for interpreting what's going on in our data, and I actually think this is a bigger interpretability problem than some of those I mentioned earlier.

The way I propose to deal with it is to use absolute count data, where available, alongside the proportion data on which your main analysis is built. So you have your effect sizes based on proportion data, which cover all of your studies, and you've plotted and modelled those; but I think you should also plot and model the absolute counts where they are available. This can give you a sense of whether what you're seeing is an actual increase in your numerator or is associated with a change in the denominator. For example, if we plot effect sizes based on the absolute number of resistant bacteria alongside those based on the proportions, we can see whether the resistant bacteria are growing and increasing in number, or whether the susceptible bacteria are just being wiped out. This can really aid the interpretation of your meta-analysis, which can be very abstract, as we know.
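The leave-simulated-out sensitivity check recommended above might look like this as a toy sketch: a simple inverse-variance weighted mean stands in for the full meta-analytic model (in practice you'd refit your metafor model), and the data frame and its values are invented for illustration.

```r
# Pool effect sizes with and without the simulated rows and compare.
pool <- function(yi, vi) sum(yi / vi) / sum(1 / vi)

dat <- data.frame(
  yi        = c(0.55, 0.60, 0.40, 0.70),     # effect sizes (made up)
  vi        = c(0.010, 0.012, 0.015, 0.020), # sampling variances (made up)
  simulated = c(FALSE, FALSE, TRUE, TRUE)    # flag for simulated rows
)

est_all   <- pool(dat$yi, dat$vi)                       # all studies
est_nosim <- with(dat[!dat$simulated, ], pool(yi, vi))  # simulated removed
```

If the two pooled estimates are close, the simulated data is not driving your conclusions; a large shift would warrant reporting both versions.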
So in summary, we often have to deal with outcomes reported as proportions, whether we like it or not, and I've proposed some pragmatic solutions for dealing with this data in the form in which we often encounter it when doing meta-analysis. But of course, the real problem here is that people don't publish their data, and so are not allowing us to make full use of it. And the real solution, of course, is for journals, universities and funders to enforce and/or reward data sharing, so we can do richer analyses.

Finally, I'd like to thank the whole evidence synthesis team that was involved, not just in the meta-analysis but in the wider systematic review project. On the meta-analysis, I'd like to thank Alfredo Sánchez-Tójar, who was my main meta-analysis collaborator. I'd also like to thank Shinichi Nakagawa, who provided some useful email correspondence on the transformations part of this, and Dan Padfield, my colleague, who came across similar problems outside of meta-analysis and influenced my thinking on this a bit. And finally, here's a brief bibliography slide in case you want to follow up on any of the resources I presented. Thanks for listening.