All right, next I'm happy to introduce Tim Parker, who will be talking about "Same data, different analysts: variation in effect sizes due to analytical decisions in ecology and evolutionary biology."

Okay, thanks very much, David. Before I get started, I want to acknowledge my co-authors; there are many. I especially want to acknowledge Elliot Gould, a grad student who has done a huge amount of the work on this project and is here today. I also want to acknowledge the two other co-leaders of this project, Hannah Fraser and Shinichi Nakagawa. And of course, this work wouldn't have been possible without more than 200 different analysts and peer reviewers.

Okay, so biologists, and in particular ecologists and evolutionary biologists, have known for a long time that there's a tremendous amount of variability in results among studies. This was actually quantified a few years ago in a paper that was essentially a meta-analysis of meta-analyses. That paper showed that the heterogeneity within individual meta-analyses in ecology and evolutionary biology, as measured by I squared, a common measure of heterogeneity in meta-analysis, was very high: a median I squared of about 85%, which is a lot of difference among studies that are putatively studying the same topic.

So, like I said, ecologists and evolutionary biologists have known for a long time that there's a lot of variability among studies, and frankly, they are not surprised by it. If you took a look at these four different grasslands from different parts of the world and ran the same experiment in each of them, I think most ecologists would expect you to get different results, because these systems are qualitatively different from each other.
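Since I squared comes up repeatedly in this talk, here is a minimal sketch of how it is computed from a set of effect sizes and their sampling variances. This is illustrative code only, not the project's actual analysis pipeline, and the toy numbers are invented.

```python
def i_squared(effects, variances):
    """Higgins & Thompson's I^2: the percent of total variation across
    effect sizes attributable to heterogeneity rather than sampling error."""
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the pooled estimate
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    return 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0

# Toy data: three effect sizes that disagree far more than their
# sampling variances can explain, giving a high I^2.
print(i_squared([0.1, 0.5, 0.9], [0.01, 0.01, 0.01]))  # → 93.75
```

When the effect sizes agree to within sampling error, Q falls at or below its degrees of freedom and I² is reported as 0%.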
They are noticeably different just from looking at these photos, and they're biologically different from each other. I think most ecologists have assumed that when they go out and get different results, those differences are attributable to what we might call true biological heterogeneity: the biological world is heterogeneous, so of course our results are heterogeneous. But ecologists also differ tremendously in the methods they use, so methodological heterogeneity is another plausible explanation for the heterogeneity in results that they observe. And what we're going to talk about today is analytical heterogeneity: different ecologists analyze their data in different ways, so that too is a potential source of heterogeneity in results.

Since the 2018 Silberzahn et al. paper, which introduced many of us to the concept of many-analyst projects, several such studies have been published. What unites these papers is that a dataset is provided to a large group of analysts, the analysts are invited to analyze the dataset to answer a pre-specified question, and then we see a lot of variability in the results those analysts provide, even though they're using the same data and being asked to answer the same question.

We were inspired to do something similar in ecology and evolutionary biology, so we identified two datasets and asked a particular biological question of each: two datasets, two questions. Our first question was this: to what extent is the growth of nestling blue tits, a small European songbird, influenced by competition with siblings? This is in the realm of what might be called evolutionary ecology. 79 teams submitted analyses, with a total of 132 usable effect sizes. Our second question was from a discipline we might call restoration ecology.
It's related to data from Australia. The question was: how does grass cover influence eucalyptus seedling recruitment? Basically, how does grass cover influence the recovery of woodland communities on former agricultural lands? 68 teams submitted analyses, with a total of 80 usable effect sizes.

Okay, so what do our results look like? The short answer is that, like those studies in social science and neuroscience, our results vary quite a bit. This is a forest plot. Each of those blue dots is a different effect size submitted to us by an analyst. On the X axis is a standardized effect size that's commonly used in ecological and evolutionary meta-analyses: Fisher's Z transformation of the correlation coefficient r. You can pretty much think of it as a correlation coefficient that's just unbounded; zero would be no effect, essentially a correlation of zero between our target variables. And what you can see is that the submitted effect sizes range from quite large (approaching a value that, as a correlation coefficient, would be 0.8, which is very strong in ecology and evolutionary biology), down to null effects, and on to effects in the opposite of the predicted direction. One of the things that quite surprised us is that when we measure heterogeneity with I squared, we get a really high value, even higher than the averages we saw in that meta-analysis of meta-analyses.

Okay, here are our data from the eucalyptus dataset. This distribution looks different. The average effect size, this dashed line, is relatively close to zero. There are effects that appear to be bigger than zero and effects that appear to be smaller than zero, and there are some really striking outliers.
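As an aside, the transformation mentioned above is simple to compute. A quick sketch, again illustrative rather than the study's own code:

```python
import math

def fisher_z(r):
    """Fisher's z-transform of a correlation coefficient r:
    z = 0.5 * ln((1 + r) / (1 - r)), i.e. atanh(r).
    Maps bounded r in (-1, 1) to an unbounded z, which is why
    meta-analyses work on the z scale."""
    return 0.5 * math.log((1 + r) / (1 - r))

def z_to_r(z):
    """Back-transform z to the bounded correlation scale."""
    return math.tanh(z)

# A correlation of 0.8 ("very strong" in ecology and evolution)
# corresponds to a z of about 1.10; zero maps to zero.
print(round(fisher_z(0.8), 2))  # → 1.1
print(fisher_z(0.0))            # → 0.0
```

Near zero the two scales are nearly identical, which is why a Fisher's Z close to zero can be read directly as "essentially no correlation."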
I'm not going to say a lot about those outliers, though I'm happy to talk more about them; they're pretty much all related to folks who chose to analyze subsets of the data rather than the entire dataset. Our I squared value is again quite high. If we remove these substantial outliers, our I squared remains relatively high, but it does come down notably.

Okay, so I think there are two basic questions these data might inspire. One is simply: why do we see such substantial heterogeneity? That's the same question we would ask of other many-analyst studies. The other is: why does the pattern of heterogeneity differ notably between these two datasets?

In trying to answer the first, I'm not going to give you a comprehensive answer; I'm just going to jump through a few things really quickly. One possibility is variation in the quality of the analyses. I don't really think that's happening here. We had peer reviewers rate these analyses, on a scale from "deeply flawed and unpublishable" all the way to "publishable as is", and you can see that effect sizes don't really vary with those ratings. Overall, that I squared, as I told you, was 98%. Each analysis was rated four times; if we cut out any analysis that received at least one rating of "deeply flawed and unpublishable", we're left with an I squared of 75%. If we also cut out "publishable with major revisions", that doesn't really change anything. That was for the tree dataset; when we make the same cuts in the bird dataset, we see no change at all in the I squared. So it looks like variation in analytical quality is not a substantial explanation here.
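The sensitivity check just described boils down to "drop flagged analyses, then recompute I squared." Here is a hypothetical sketch of the filtering step; the records, ratings, and field names are invented for illustration and are not the study's data.

```python
# Hypothetical records: each submitted analysis carries its effect size
# and the four peer-review ratings it received.
records = [
    {"effect": 0.42, "ratings": ["publishable as is"] * 4},
    {"effect": -0.10, "ratings": ["deeply flawed and unpublishable",
                                  "publishable with major revisions",
                                  "publishable with minor revisions",
                                  "publishable as is"]},
]

def drop_flagged(records, flag="deeply flawed and unpublishable"):
    """Keep only analyses that never received the flagged rating."""
    return [r for r in records if flag not in r["ratings"]]

kept = drop_flagged(records)
print([r["effect"] for r in kept])  # → [0.42]
```

The same filter, with a different `flag` value, implements the stricter cut that also removes "publishable with major revisions."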
Another possibility is that a small number of variables have large and divergent effects. That again doesn't really seem to be the case. With the blue tits, there's one set of contrasts that is strongly associated with weak or negative effects, but not so for any of the others, which are spread across a wide array of effects. So, to give a quick summary, I think what we're probably looking at is what some previous authors of these types of studies have argued: it's the accumulation of a lot of different small decisions that leads to these patterns.

Okay, so why does the heterogeneity differ between these two datasets? This answer is going to be very short: I don't know. It's very interesting, and we're following this up and trying to figure out why it might be the case, but I don't have an answer for you right now.

So what do we want to do with this? First, I hope that ecologists take note of this and become aware that their different analytical choices may have big impacts on their outcomes. And I hope it facilitates discussion in these disciplines about whether and how analytical practices should change in response to this sort of information. I'd be happy to take questions if there's time.

Two questions. Regarding your last point about discussion, have you offered these analysts the opportunity to look at what others have done, and would people revise their approaches in that case?

In this study, we did not offer analysts the opportunity to see what other people had done. Certainly at least one previous many-analyst study did that, and people did revise, but that wasn't part of our study.

Yeah, just a short question. There seems to be one effect and one null effect.
And in the case of the effect, there's more heterogeneity than in the null effect. I think that's been observed in social science, or at least in psychology, as well; Olsson-Collentine, for example, observed it with actual direct replications. Do you have any thoughts on the possible connection?

Yeah, I think that is a real possibility, that that could be what we're looking at: there just aren't strong relationships latent anywhere in the data to drive big effect sizes, and so the effect sizes are all going to hang around zero. I think that is really quite plausible. Okay. Thank you. Thanks so much.