Last but certainly not least, we have Moin Syed, who will be continuing the theme on replication: "Replication or Generalization? How Sample Diversity Complicates the Distinction." Great, thank you, David. Hi everyone, Moin Syed, Department of Psychology at the University of Minnesota. As you'll see there on the OSF link, you can access these slides. I also have a full preprint version, so if you want to read over 5,000 words about this topic, you can check that out; it's also available at the OSF link that will be on the bottom of all the slides as they go through. So this is a figure that's often shown to try to make sense of the different terminology that we use in metascience: if we look at whether the data and the analysis are each the same or different, we can distinguish between reproducibility, replication, robustness, and generalizability. However, like all two-by-two tables, there's a lot of complexity hidden within here. And as Beth was just discussing, there are actually multiple forms of replication. So we can talk about direct or conceptual, as well as additional ones. I'm not going to go over this, because Beth just did a really nice job explaining the distinction between those two. But it's unclear what the distinction is between conceptual replications and generalizability, as both are meant to expand the context of the original target study. So this leads to a lot of squabbles about what kind of replication it is that you see in any particular target. And that's because when we're talking about replication, we tend to focus on the procedures or the results; these are the methodological details that distinguish these different types of replications. As Nosek and Errington said, this focus is intuitive, easy to apply, and incorrect. This isn't where we should be focusing our attention; rather, we should focus on the actual claim that's being made.
So a replication is any result that is diagnostic of a specific claim. This has two advantages. One, it helps get us away from methodological minutiae and from determining which type of replication we're talking about. But second, it focuses on the inferences that we draw from evidence, which is more what we should typically be concerned with than the methodology per se. Nosek and Errington also did something a little bit different: they talked about replication and generalizability as really intertwined, as part of the same system. I don't have enough time to go into detail on this, so I highly recommend that you check out the paper. But the important point is that this is a much more nuanced way of thinking about the relation between the two than we typically see. What we often see in the social sciences, especially psychology, economics, and political science, is a push and pull between internal and external validity. If we're thinking more about internal validity, which tends to be our focus in these disciplines, that means we're necessarily sacrificing some external validity, so it's one or the other. Kevin Esterling, who's here at the conference, gave a great talk yesterday about this, and his paper on generalized causal claims, which I highly recommend reading, argues that this is really a false way to think about these constructs: we need all of them, because these forms of validity are part of an integrated, tight system. So with those definitional issues in mind, I want to get to the core point of why I'm here today, which is to throw out four assertions based on my own observations of the evidence. First, when I'm reading through all these discussions about replication, I find that the assumption is that the research is occurring in a vacuum, that researchers aren't actually motivated parties.
In Nosek's talk this morning, he really outlined how that's not the case: we have individual personal motivations, and we saw this quite clearly in the early days of the replication crisis, where people responded in variable ways to whether there was evidence, or a lack thereof, for particular findings. So if we're thinking about replication, generalizability, or any other metascientific concepts, we have to think about these underlying motivations. Second, researchers are highly motivated to maintain claims associated with universalism. Again, this is particularly true in the social sciences, where we want to make universal claims, often based on meager evidence, but we want to be able to make claims about all of humanity. Going back to Robert Guthrie's classic book from the 1970s, Even the Rat Was White, there has been a parade of scholarly articles decrying the lack of sample diversity in our research. And yet, nothing has really changed. We just keep going on, business as usual. Mainstream researchers are perfectly content to make universal claims based on meager evidence. Third, despite this focus on universal claims, failures of replication are seen as a greater epistemic threat than failures of generalizability. There's a reason why the replication crisis was such a big deal, and you can even draw a straight line from the replication crisis to this meeting right here. In contrast, the generalizability crisis really has not become a thing, despite the fact that folks like Tal Yarkoni and others have tried to make it one. It hasn't really caught on; it hasn't had the same emphasis. We can recognize the challenges and the limits of generalizability, but we just see them as limitations. It's a limitation in our work: we acknowledge it and we move on, but we don't really change what we're doing, whereas replication requires a different kind of reckoning, a different kind of attention.
And finally, because of this, researchers are motivated to reframe threats to replicability as limits of generalizability. Because there's fuzziness in these concepts and how we define them, we can decide after the fact what counts as a replication test or a generalizability test. And we've seen this quite clearly with post hoc, degenerative claims such as hidden moderators or contextual sensitivity. These are ways to maintain the original claims in the face of a failed replication by reframing them as failures of generalizability, which are less of a big deal. Now, to provide a little case study of what this looks like in real life, I'm going to briefly go through the GWAS studies of educational attainment. These are genome-wide association studies, which scan the entire genome for genetic polymorphisms linked to some outcome. Ultimately, the hit alleles can be added up to create a linear polygenic score that can be used to predict that outcome. What's unique here is that there have been four successive studies that built on each other, from more or less the same research team using more or less the same methods, so we can see how the claims have been modified over time in light of new evidence. The first study had over 100,000 participants, and it's very typical social science work. It was all European-heritage participants and fully unconstrained claims: the title, the abstract, the discussion, everything was unconstrained. So it was a limited sample, but it was broadly generalized to all humanity; very familiar, we see it all the time. The second iteration had a larger sample of over 300,000 but otherwise looked pretty much exactly the same: again, fully European-ancestry samples, and again, fully unconstrained claims, no difference whatsoever. The turning point was the third study, by Lee et al., which had over a million participants.
Still, the discovery sample was European ancestry, but they did an additional analysis in an African-ancestry sample and saw an 85% reduction in the already small effect size. They did acknowledge in the limitations that perhaps these results are only applicable to European-heritage folks, but all the claims (the title, the abstract, the implications) were fully unconstrained. So despite the fact that there was evidence presented in the paper that this was not a universal finding, they nevertheless held to universal claims. Now, what's interesting here is that this is the third in a series of four papers. So the question, now that there's this evidence (and I should say this is alongside a whole bunch of other evidence in genetics and sociogenomics about the importance of diversity and of thinking about who our samples actually are), is whether they changed the way they made their claims in the fourth iteration of this study. Spoiler alert: they did not. Big surprise, right? So once again, now with over three million participants, once again a European-ancestry discovery sample, once again a massive attenuation when they looked at predictive power in an African-ancestry sample, and once again fully unconstrained claims throughout. What was shocking in this version, though, is that there was no discussion of the limitations either. So it's actually a step backwards from what we saw in the previous version. Now, what we see here is widely known and acknowledged among genetics researchers. Everybody knows this is the case. It's seen as a known problem of generalizability, and perhaps that's one reason why it wasn't even discussed in the final version.
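As a side note, the "adding up hit alleles" step described above can be sketched in a few lines. This is a minimal, hypothetical illustration: the effect weights and genotypes are made up for demonstration, not real educational-attainment GWAS estimates, and a real pipeline involves many more steps (quality control, clumping, thresholding).

```python
import numpy as np

# Hypothetical GWAS effect sizes (betas) for five SNPs, as estimated
# in some discovery sample. These numbers are illustrative only.
betas = np.array([0.02, -0.01, 0.015, 0.005, -0.02])

# Genotypes for three individuals: the count of effect alleles
# (0, 1, or 2) at each of the five SNPs.
genotypes = np.array([
    [0, 1, 2, 1, 0],
    [2, 2, 0, 1, 1],
    [1, 0, 1, 2, 2],
])

# A linear polygenic score is just the weighted sum of effect-allele
# counts, i.e. a matrix-vector product.
scores = genotypes @ betas

print(scores)  # one score per individual
```

The point is simply that a polygenic score is a weighted sum: effect sizes estimated in the discovery sample are applied to each new person's allele counts. If those weights come from a European-ancestry discovery sample, the weights themselves, not just the new data, carry the generalizability problem.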
However, given that the claim is a universal one, and the claim is what should be evaluated, the fact that we don't see the same predictive power in a different ancestry group looks to me like a failure of replication. This is one case study from GWAS, but we see this all over the place in social science research: in the face of evidence of a lack of generalizability, the universal claims just keep moving forward. And that's generally the dynamic I was outlining here: there's a strong motivation to maintain these universal claims, because we just don't see generalizability as particularly important. Okay, so what do we do about that? Two solutions, and then I'll finish up here. The first is quite simple: make calibrated claims. Just alter our claims a bit. Instead of saying "genetic associations in 1.1 million people," you can say "1.1 million European-heritage people." You just add one little clause, which we know is important, to indicate who the sample is. Importantly, this is different from just putting a statement in the limitations section. Beth Clark, who just gave the talk, has a nice study looking at the content and function of limitations sections: when something is identified as a limitation, it's framed as important enough that we should pay attention to it, but not so important that we actually need to change our procedures. We can just identify it as a limitation and move on. It's not a death blow to the study; otherwise the study wouldn't have been published. So this is relatively easy to do. It doesn't cost anything, except that we have to divest from a strong belief in universalism. That's really the only change, no problem there, right? The second solution is much more difficult: building heterogeneity into our theories and methods.
This is more consistent with what Kevin Esterling was talking about yesterday, where you really have to think about the hows and whys of where there's variability, and you actually build that into the system. Importantly, just having diverse samples isn't enough here, because if you have diverse samples you still have to make sense of that diversity. You have to understand why you're seeing that variation, and that's where building it into our theories and methods is really critical. Ironically, taking this approach seriously could probably get us closer to being able to make truly universal claims. But this is hard work. I just want to end by thanking my colleague, behavior geneticist Matt McGue, because we were emailing about these issues back in January when the call for papers for this conference came out, and I said, oh, what the heck, I might as well submit something on this. So once in a while, email can actually be a force for good, not just annoyance. Thanks for listening. There are three seconds for questions if anyone wants. That's okay, you can talk to me after if you want. One question? Okay, one question. Yeah, I have a question about, I think, your first recommendation about calibrated claims. Your example about rewriting the title to be about Europeans presupposes that there's a causal relationship there. So I feel like there's an implied assumption that we have a common causal theory about what the important factors are that mediate what we find in the results. So I'm wondering if you can touch a little bit on whether we can come to a consensus in a discipline about such things. Absolutely. I mean, that's a huge issue. We don't always know what to calibrate on, right? But sometimes we do know quite clearly, and I think my example is one where we do know: there are just mountains of genetics research showing that ancestry is important for understanding GWAS and polygenic scores.
So right there, we know that's clearly one dimension we could put into the titles and the abstracts and our theorizing. We don't always know, but I think the key is that as we accumulate evidence about where we should calibrate, we're not doing that in terms of the claims, and we should. So I guess that's the point. I don't know if that helps. Thanks, everybody. Thank you very much to the speakers, and thank you to the audience for your questions. I'm told that we will be starting at 4:30 for the next session, so a quick break.