Hello and welcome to this symposium session on how biology differs from other sciences and why this matters for meta science. My name is Pamela Reinagel. I'm a professor of biology at the University of California at San Diego, and my co-organizer is David Peterson, who is a postdoctoral fellow in sociology at UCLA. Just to briefly frame what we're going to be doing today: the modern meta science movement arose out of the social sciences, psychology and political science, as a sort of self-examining process, and then has broadened its scope to other sciences, especially biology and biomedical science. And while the intense interest in biomedical science is very understandable, some of us have concerns that there might be an unspoken assumption in meta science that biology research is somehow similar to or comparable to the research that it has been addressing in the social sciences. And this could create a risk that the kinds of assessments or interventions that are undertaken might really be inappropriate for the discipline, because biology really is quite different. Not just in the object of what biologists study, but also the kinds of methods we use, the types of inferences we're drawing, the goals the research is trying to achieve, the constraints that are placed on our research and our theories, and the whole culture of the research enterprise. And even biology is too broad an umbrella: there are many sub-disciplines within biology that are really quite distinct in these regards. And so the premise of this session is that the differences between the sciences really matter. They're substantial. And therefore meta science needs to be grounded in domain-specific understanding, in a lot of knowledge of the specific discipline it's addressing, and not try to have a one-size-fits-all model of meta science. To give you an idea of the scope of what we're going to talk about today, we're going to explore this question for biology in particular, but from a really broad disciplinary spectrum. The format is that each of our four speakers will give a 15-minute talk, with just five minutes for questions and answers. And then after all four of them have spoken, we're going to have an extended period for discussion and questions and answers with all of the speakers. We'd like to briefly introduce who the four speakers are before the talks begin, so that the audience has an idea of all the different angles we're going to be taking on this question. So the first speaker in the session is going to be Dr. Steve Goodman from Stanford School of Medicine. And I think it's relevant and interesting to see that his interdisciplinary training began at the very beginning of his career at Harvard as an undergrad, where he did a dual major in biochemistry and applied math. He then earned his medical degree at New York University and completed his residency and board certification, but apparently is a glutton for punishment, because from there he went to the Johns Hopkins School of Public Health to get a master's in biostatistics and a PhD in epidemiology. So those are the threads of training that he's bringing to this conversation. And just to fast forward to where he is now, Dr. Goodman is the associate dean of clinical and translational research and a professor of epidemiology and population health, and of medicine, at Stanford School of Medicine.
He's the co-founder and co-director of the Meta-Research Innovation Center at Stanford, or METRICS, and also the founder and director of the Stanford Program on Research Rigor and Reproducibility, or SPORR. His own research concerns the proper measurement, conceptualization and synthesis of research evidence, with a special emphasis on Bayesian approaches. The second speaker after Steve will be Dr. John Dupré from the University of Exeter. Dr. Dupré did his PhD in philosophy at Cambridge University and then was a junior research fellow at St. John's College in Oxford. He then moved on to a faculty position in the Department of Philosophy at Stanford University, where he was on the faculty for about 14 years. And this fact, his long stay at Stanford, will sort of permanently label him, because he is known as a member of a small group of philosophers called the Stanford School, on account of the ideas that were being developed in that department at that time. Fast forwarding to today, he is a professor of philosophy of science in the Department of Sociology, Philosophy and Anthropology at Exeter University. He's also the director of Egenis, the Centre for the Study of Life Sciences. He is, or very soon will be, the president of the Philosophy of Science Association. He's a fellow of the Royal Society of Arts and an honorary international member of the American Academy of Arts and Sciences, and his own research in the philosophy of science has focused on the philosophy of biology, where he's written articles or books on pretty much every classic problem in the philosophy of biology. Of note recently, he's been making arguments for understanding biology from a process-centered point of view. I'm going to hand the podium over to my co-organizer, David Peterson, to introduce the other two speakers before we begin the talks.

Our third speaker will be Nicole Nelson. Nicole is an associate professor in the Department of Medical History and Bioethics at the University of Wisconsin, where she also has affiliations with the History Department and the Holtz Center for Science and Technology Studies. Her book, published by the University of Chicago Press, is Model Behavior: Animal Experiments, Complexity, and the Genetics of Psychiatric Disorder. It's a very interesting look at the role of animal behavior models in psychiatry research. From 2018 to 2019, she was a fellow at the Radcliffe Institute for Advanced Study at Harvard. During her period there, she began her current project, which is looking at the reproducibility crisis in biomedical research, and I believe she's going to be presenting some of this research here today. So we look forward to that. Our final speaker is Bob Weinberg. Bob is the Daniel K. Ludwig Professor for Cancer Research at MIT. Among his numerous accomplishments, he discovered the first human oncogene and the first tumor suppressor gene. Bob, with Douglas Hanahan, also published a hugely influential article, The Hallmarks of Cancer, which identified the six biological capabilities of cancer cells. He's received nearly every prize a biologist can receive, including the Wolf Prize in Medicine, the Japan Prize, and the National Medal of Science. He's also an entertaining and insightful writer on issues of science policy; I certainly recommend anything he writes on this topic. And so we're looking forward to what you have to say about this intersection of biology and meta science.
But first off, we will begin with Steve Goodman. Thanks, David. Before we start, I just want to remind the audience that if you have questions for the speakers, please put them in the question and answer box. If you have other comments and discussion points, you're welcome to use the chat for that. And so without further ado, it's my pleasure to introduce Dr. Steven Goodman.

Well, thanks so much. Can you hear me okay? Is that sound calibrated? Okay. So I'm going to share my screen, do the whole slide thing, and we'll just get started. Let's see if this works right, and I'll depend on you to tell me if the logistics are fine. I'm going to share my screen. It should appear. There we go. Okay, everything okay? Okay. So I'm going to talk about research reproducibility in biology, which is a bit of hubris, I will say, because I myself am not a laboratory scientist. I've worked closely with them, actually in a cancer center for many decades at Johns Hopkins. But I have not spent a huge amount of time looking at issues of research reproducibility in biology, because it really hasn't been touched by the meta-research or meta science movement very much until recently. I've been in the business of what is now called meta science, what is now called reproducibility, for about 30 years, and I want to give you a picture of that trajectory. But before we start, I'll do a little thought experiment here and show you a quote. It says: at a 2015 gathering of the most renowned US statisticians concerned about the misuse of statistics in science, the following statement was widely agreed with. "Scientific inference from a set of data is not the formal exercise one finds taught in statistics classrooms. We've learned one has only to determine whether to reject at the 5% level or the 1% level, and then the statistician can grandly draw obvious conclusions about data from any scientific field by proclaiming significance or non-significance. Such nonsense is taught usually by professors who have minimal contact with the application of statistical methods to scientific problems." I want you to think about that. If I were in front of you, I would ask the audience where you think this conference was and who these statisticians were; we won't be able to engage in that right now. But, oh, wait a minute, I made a mistake on this slide. This was 1965. I'm sorry, it wasn't 2015. I just want to show you that a lot of the conversations we're having today that we think are new are very, very, very old. And this was the paper in which it occurred. It was indeed the premier statisticians of the time, and they were dumping all over traditional statistics just like we've been doing in articles for the past 50 years. Now, I want to draw a few points from that, beyond the sort of semi-humorous one that we might not be aware of this history. They made the point that even standard statistics, which is about the most discipline-agnostic of methodologies, cannot be sensibly applied without field-specific knowledge. I want you to think about that. Also, the meta-message: methodologic change and reform, particularly in clinical research, has been going on a very long time; I'm going to go through that in a minute. Pamela mentioned the recent invasion, or application, or translation of meta science to biology. But this is not a recent phenomenon, and we have a lot to learn if we look at what the movement to reform clinical research has achieved.
And we should apply those lessons to science reform in biology. So here is a timeline of just some key moments, I would say, in the reform of clinical research, research on human beings. I'm not going to go through all of these; I'm just going to note the dates. Through the '60s to the '80s, we have the ascendancy of randomized controlled trials; from the '80s through the 2000s, the establishment of systematic reviews as the foundation for knowledge. In 1988 came the first Peer Review Congress, which looks at standards of publication and evidence, really meta science, in published biomedical research; it would have been held this year and will now be held next year. In 1992 the term evidence-based medicine was coined; the next year, the Cochrane Collaboration; three years later, the CONSORT guidelines, which have now been followed by dozens and hundreds of reporting guidelines that are all on the EQUATOR website, of which the most recent, the MDAR guidelines and the ARRIVE guidelines, apply more to biologic research. Alongside these came hundreds and hundreds of articles critiquing standard paradigms of statistics. In 1997, the FDA Modernization Act required trial registration. In 2000 came ClinicalTrials.gov, a form of preregistration; results reporting was added to it in 2008. Prospective registration and results reporting were also added as ethical requirements to the Declaration of Helsinki. In 2010, the Patient-Centered Outcomes Research Institute was established, with legal requirements for methods standards; in 2014, results reporting; and in 2015, PCORI released the first data sharing requirements from a major funder. The most recent FDA legislation mentioned Bayesian trial methodologies as something it wanted the FDA to look at. The ASA p-value statement came in 2016. And in 2020, just last year, NIH added rigor and reproducibility training to its grant requirements. So I just want to give you a sense of how long this has been going on, and it's still ongoing. And it's been accomplished by medical scientists, professional societies, journal editors, regulators, funders and politicians. It was accomplished by people reforming the fields in which they were working, and which they understood; I want to really underscore that. And they've been focused almost entirely on clinical research. And there was a huge amount of preceding meta-research and debate before the policy changes were instituted. But once they came, they came like a flood. Now, before I start to transition over to biology, I want to show just one paper that I started to use in my teaching. And I want to point out that the way I've started to get into the issue of reproducibility in biology is through the creation at Stanford of this entity called SPORR, which was sort of a personal mission of mine, because I've been involved in this for so long, doing research and telling people what to do. But if they asked, how are things different at Stanford, we had to say, well, really, not much. And they were not that much different in many places, in many areas of science. So I decided we should commit ourselves to changing things on the ground, which is a very, very different thing than finding methodologic things to improve. And this included laboratory research.
And I will say that I come in with a tremendous amount of humility. Anyway, let's go through this paper, which is brilliantly presented. It was a multi-center study; I don't know how many people are familiar with it. It's fun, because basically the whole thing is done in pictures, simply looking at the seemingly simple technology of how you establish growth rates of different cancer cell lines in response to various agents. So what they did, and this is all done in pictures, is they had a single center develop and test the protocol: the cells, the media, the standard operating procedure and the drugs they would test, to look at the effect of these various drugs on the growth rate of these different cell lines. So the first center established all the protocols, and they shared the drugs, the cells, the media and the standard operating procedures with five other centers. They then saw whether those centers could establish the same growth response curves. So they did a lot of technical replicates and biological replicates in center one, and then they put them all together in center one. Then they did this looking at different scientists in center one. And then they sent everything out to all the other centers, and you see very different curves, very different curves. Okay. And not only that, but they found that they had different curves depending on what drug was administered. And in particular, and I'm really jumping through a huge number of steps here, but the graphics are so nice, there was an interaction between the drug and the counting method, the counting method being something they hadn't thought was worth attention. And here's a plate map; you see edge effects. And these were the initial results: in the initial experiments, they observed center-to-center variation in GR50 measurements of up to 200-fold. And this was all with established procedures and media. It says irreproducibility arose from a subtle interplay between experimental methods, poorly characterized sources of biologic variation, and differences in data analysis, particularly the image processing algorithms. And they described all the technical sources of variability and the steps they took to eliminate them: two different ways of counting cells; the edge effects and non-uniform cell growth, which they fixed with automated randomized dispensing; different concentration fold ranges of the assays, so they had to do pilot studies to optimize this; different image processing algorithms. And then they display this in this absolutely beautiful way where different colors denote different things: uniform across centers, non-uniform, significant, non-significant. And they do this by experimental design, materials and supplies, method, data analysis. I'm not going to go through all the results; you're getting a sense of it. I want to jump to their conclusions. They said: we ascribe the remaining irreproducibility (there was just a little bit at the end) to various technical things, and to a belief, belied by the final analysis, that counting cells is such a simple procedure that different assays can be substituted for each other without consequence. However, we discovered that most examples of irreproducibility are themselves irreproducible, and that the technical factors responsible are difficult to pin down. And these variables differed from what we expected a priori.
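To make concrete how easily that kind of fold-range can arise, here is a minimal simulation sketch. It is not from the paper, and all the numbers (the between-center and within-center spreads, the replicate counts) are made-up assumptions for illustration only: modest multiplicative noise at the center and replicate level is enough to produce large fold-differences in a GR50-style estimate even when the true value is identical everywhere.

```python
# Illustrative sketch only (assumed, made-up variance components, not the paper's data):
# how center-level and replicate-level noise on the log scale spreads out
# a GR50-style potency estimate across centers.
import random
import math

random.seed(0)

TRUE_LOG_GR50 = math.log(1.0)   # same "true" potency everywhere (1 uM, arbitrary units)
CENTER_SD = 0.8                 # assumed between-center SD on the log scale
REPLICATE_SD = 0.3              # assumed within-center replicate SD on the log scale

def observed_gr50(n_replicates=3):
    """One center's reported GR50: a center-level offset plus replicate noise, averaged."""
    center_offset = random.gauss(0.0, CENTER_SD)
    reps = [TRUE_LOG_GR50 + center_offset + random.gauss(0.0, REPLICATE_SD)
            for _ in range(n_replicates)]
    return math.exp(sum(reps) / n_replicates)

estimates = [observed_gr50() for _ in range(6)]    # six centers, as in the study design
fold_range = max(estimates) / min(estimates)
print([round(e, 2) for e in estimates], "fold range:", round(fold_range, 1))
```

Even with these modest assumed spreads, six centers reporting the "same" measurement can disagree by an order of magnitude; the up-to-200-fold range the paper reported reflects additional, systematic sources of variation such as the counting-method and edge effects described above.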
And it says, with all the above controlled, all preceding elements controlled, they were finally able to repeat the experiment two years later and got consistent results. So it's only after all that work, for the simplest of measurements; obviously, it wasn't a simple measurement, and maybe the biologists will say they were naive to think it was so simple. Now I want to look at a typical paper, or maybe this isn't typical, and I expect Bob and others to tell me how typical this is; this is just the first article in Science from a couple of years ago. I want to make sure that everybody watching understands what a biology paper looks like. You can see the title here, and there are a lot of experiments in this paper. This is the first one: to determine whether local inflammation in the lung could directly, let's see, I'm not reading my whole screen, sorry, could directly drive awakening of disseminated dormant cancer cells, we studied two models of dormancy. And then they described very tersely: we injected luciferase- and mCherry-expressing breast cancer cells, and so on. And then in the next sentence, they say what the results are: tumors did not form; instead, single non-proliferative cancer cells were found in the lungs. And they refer to figures. And this, I've highlighted here, is literally what they said they did, and this is the result. This is what they said they did; this is the result. This might look very shocking to people who do other kinds of science, to have just a one-sentence description. Now, it isn't literally one sentence, because some of the details we might want are in other places in the paper. This is experiment two, and I'm not going to read it, there are a lot of words here, but this is where you start to see some of the details: n equals three. They're showing these curves, but these are based on three animals. Okay, and you only see it in the legend. By the way, they show the standard deviations, but when you have n's of three, that is not the proper representation of uncertainty. And there's a lot of detail on methods, the actual lab procedural methods, in the back of the paper, but nothing with numbers, no numbers at all. This is experiment three, experiment four. And finally, I decided I'm not going to count them all up; you can count the experiments by just looking at the panels and the figures, because every experiment has its results presented. And you can see from these results that some of them look like slam dunks up here, but others are not slam dunks, and it's hard to know. It's hard to know. But here's one set of results. Here's another set of results. A huge number of technical details are in the legend, and each one of these represents the results of a different experiment. And here's another set of results. Again, some of these look like slam dunks, others are harder to say. More. In all, there were 27 experiments in this paper. The key numbers and design details were sparse; almost all of them were in the legends, except the biologic part, not in the text. The numbers were exceedingly small, from three to five, but some effects were extremely big. Each experiment derived from and extended observations and claims made in preceding experiments. There were 10 different measurement and assay technologies reported. So when I showed you that previous paper, that was one technology; this had 10. The conclusions were based on how all these experiments fit together.
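A brief aside on the n-equals-three point above, before returning to how the experiments fit together. Here is a minimal sketch, with purely hypothetical numbers (nothing from the paper), of why a standard deviation computed from three animals badly understates the uncertainty in the group mean: with n = 3, a 95% confidence interval for the mean is roughly 4.3 standard errors wide on each side, considerably wider than the plotted mean plus or minus SD.

```python
# Illustrative sketch (hypothetical values): SD from n = 3 vs. a t-based 95% CI for the mean.
import statistics

values = [2.1, 3.4, 2.8]        # hypothetical measurements from three animals
n = len(values)
mean = statistics.mean(values)
sd = statistics.stdev(values)   # what a figure legend typically reports as mean +/- SD
sem = sd / n ** 0.5
t_crit = 4.303                  # t quantile for a 95% CI with df = n - 1 = 2
ci = (mean - t_crit * sem, mean + t_crit * sem)

print(f"mean = {mean:.2f}, SD = {sd:.2f}, SEM = {sem:.2f}")
print(f"95% CI for the mean = ({ci[0]:.2f}, {ci[1]:.2f})  # much wider than mean +/- SD")
```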
This is what William Whewell in the 1800s called a consilience of inductions. If one experiment had failed, presumably the following experiments would have failed too. Now, what you have here is a combination of factors, which we wrote about four years ago, when we talked about how reproducibility might play out differently in different scientific domains. There are different degrees of determinism, signal-to-noise ratio, complexity of designs, heterogeneity of experimental results, culture of replication, statistical criteria for truth claims, et cetera. And I want to say that with the things we saw in this study, I'm not saying that anything was wrong here; I just want to show you how these papers are constructed, which for anyone who is in a clinical trial, or an epidemiologic study, or a psychology study, is dramatically different. Most of those papers report one study design; even animal studies will just report a few. I just want to say that we did a survey of our own lab science students, and their lowest self-rated efficacy was in identifying biases in study design (almost zero said they could do that), understanding the reasons for randomization, and formulating a data management plan, because these were not on their radar as critical to what they did. They showed, and I found this particularly interesting, more comfort with machine learning than with understanding p-values or tests, and the low-confidence areas mirrored almost exactly the weaknesses that are now being documented in the published literature. So, my concluding thoughts. Most fields focus on a single study or study design as the basis for a claim, and that includes animal research in biology; so that's a different beast than what I just showed you, even though some of those experiments involved animals. And usually that single study or design is the unit of analysis in meta-research. Lab science is not like that. I think we have to be very careful about bringing tools and lessons from one domain and applying them to another without a deep understanding of the knowledge architecture of that field. Even though I could pick apart each one, maybe, of those many experiments they did, it could be that the way the entire constellation of findings fit together made for a very robust conclusion. I'm not an expert in this field, and I can't comment on it specifically. But they put their lines of evidence together in a very, very different way than most other sciences. We live in separate, quite different epistemic communities with different norms and values, even though we all use the word science. I think biology can benefit from meta-research hugely, as that first paper showed. But I think we need to acknowledge what it has succeeded at, be extremely humble, and work very closely with lab scientists to reform with a scalpel and not a chainsaw. We don't want to destroy the science that has given us, you know, vaccines in our arms and all the things that we're living with right now. And finally, I think we need to train a new generation of meta-researchers in biology to help them refine the methods for their own field. I find students get very excited when I tell them they can get papers out of this; this is not just something they do in the back office, they can actually get publications. So this is what we're just starting to do at Stanford. You can check it out.
And in a year, I'll know a lot more about what the problems are and what's needed to fix them. But I also want to honor the successes, as well as the ways we might have to improve things. So I'll stop there. Thank you very much.

Thank you, Steve. That was fantastic. We went into the question period a little bit, so we only have a bit less than a minute. So I'm just going to ask a quick one. You've done some work previously about how people can learn to measure this other kind of epistemic weight, right? We know how to calculate a p-value. But if why we believe the result of a paper has to do with this consilience, is there a way to measure that? Is there a way to approach that statistically?

Oh, I have done some work on that. I think it's very challenging, very, very, very challenging. I think it may be in the realm of John Dupré's work, so I'll let him speak to it. Understanding what we might call the evidential weight of knowledge from this combination of threads of evidence is, I think, very challenging. I'm just going to leave it at that. I don't think there's a simple formula; I don't know if there ever could be a formula. It definitely goes more into the realm of philosophy than statistics. We can try to formalize a concept, but the concept has to be clear, and the conceptualization of how all these pieces fit together is not so clear. And that is a lot of what philosophers think about. So I think it's fantastic you have John on the panel today.

Yes, that's a great segue to our next talk. Just a reminder to people who asked questions that we didn't get to: some speakers may be able to answer simple questions in the Q&A panel, and we'll get back to questions at the end of all of the talks. But first, Steve, you should unshare your screen so that, oh, sorry, John can share his screen. Yes. And then I will be pleased to turn the podium over to John for his remarks. You're muted, John.

Apologies. OK, I should probably begin by saying that until Pamela asked me whether I'd be interested in doing this session, I had barely heard of meta science, I'm ashamed to say perhaps. But at any rate, I certainly didn't know much about what it was. So this is something of an exploration for me. So the first thing I did, obviously, was try and look it up. And I looked it up and found MetaScience.com, which started out, the very first thing it said, with this: Meta Science, the field of research on the scientific process. This includes the history and philosophy of science. What is science? Now, as a philosopher of science for many decades, I was obviously quite surprised by this. I mean, I guess even more embarrassed not to know what this was, because it's a bit like finding that one's company has had a hostile takeover, but one didn't yet know that it had happened, and wondering what's going to happen next. And I guess, just to put a little flesh on that, I did think Metascience was the name of a journal in my field that reviews books in the philosophy and history of science. But I also thought that history and philosophy of science, certainly philosophy of science, had been around possibly since Aristotle, but certainly since Francis Bacon, so at least half a millennium. And when I did this research, I found that Meta Science had been around for maybe a couple of decades, often citing the famous paper by John Ioannidis in 2005 as its founding document.
This all seemed quite strange. So anyway, here's my first reaction. I should perhaps say that this talk is not, certainly I wouldn't claim to have, a kind of articulated assessment, still less an argument, about Meta Science. But my first thought, I suppose, was that we had been doing something like this for rather a long time, and at least it would be nice to know what had sparked the need to start it all over again. OK, and I should now say that I do have a kind of axe to grind, in that the first book I wrote, almost 30 years ago, was specifically devoted to defending the thesis that there is essentially no such thing as science. That's to say, there are lots of sciences. There are lots of different ways of empirically investigating the world, and some of them are much better than others, but they're all really quite different. And it tends to be unhelpful, even an illusion, to suppose that what you can find out about one science is going to be very useful for exploring another science, let alone all science. Now, I should say, of course, that I don't mean to suggest that there is nothing to do here, no problem here. I'm sure there's lots of failure in science. There's lots of bad practice, and even, I guess, elements of bad practice that go across the whole piece, in terms of questions of research ethics, open science, funding mechanisms, the methods of review and so on. These are, you know, very important, but not fundamental to what science is and how it's done. So I wanted to look, from the point of view of this panel, at the more specific question of whether there is some general problem with biology, and, of course, the problem that constantly recurs as motivating all of this concern, the question of replication, which I shall come to shortly. OK, so to pick up a major theme, perhaps one of the round wheels we already have in philosophy: when one starts thinking about sciences more specifically, a good place to start is by thinking not in terms of some theory of scientific theories, but by looking at the specific practices of science. And looking, as Steve just did, at how a scientific paper works in a particular field is an example of just that. To me, the absolute classic illustration of this: probably the one philosopher that most scientists have heard of is Karl Popper, who was no doubt a brilliant philosopher. But I think that in a certain sense he has had very mixed benefits for my field and for science, because he offered the world this general view of falsification, his view that science is a matter of finding principles and theories and hypotheses, general statements, and trying to falsify them. That has its uses. But it turned out, about 50 years ago, and I would particularly mention David Hull, who was the most prominent person in this, that when philosophers started turning away from just focusing on physics and began talking about biology, they found increasingly that this was a very unhelpful way of looking at biology, and in particular that the sort of generalizations, the laws of nature, that Popper had in mind, in some ways critically, but assumed science was about, really weren't very helpful for thinking about biology; that actually biologists tended to be much more interested in concepts, in models and in mechanisms.
And in fact, there are a number of other aspects or dimensions of scientific work one might mention: instruments, which would include tools like statistical methods; particular skills, the ability to get experiments to work in a lab, which is something that Polanyi talked a lot about. The collection of data is, of course, one aspect of this, and in biology we see that in genomics and still in some areas of natural history. And of course, there's a critical aspect, which I guess meta science exemplifies. But when we want to try and understand a field in detail, we might look at all of these, and the importance of any or all of them, and the role they play, varies from science to science and from case to case. And this, I think, is where I was rather disturbed by remarks like this one, which, as I understand it, was trying to make a suggestion of how we should think about science generally. This was to me a very strange general statement of what science is. And in particular, and I actually have quite interesting grounds for reflection here, I was very puzzled by the idea of knowledge accumulation as a picture of what science is. Now, of course, in some ways we know more than we did when we had done less science, so there's some sense in which knowledge accumulation is a fair characterization. But reading around this, there is actually a very specific picture of knowledge accumulation, which relates to a theme in a lot of modern thinking about science: the collection of data. And it sometimes seems as if there are really just two central ideas in science: you get more data, and you try and find out how different bits of data are correlated with one another, what generalizations these data instantiate. Now, rather ironically, the person who is, I guess, most strongly associated with that view is Francis Bacon, 500 years ago, who thought that scientists collected data and tried to find the correlations among bits of them. Now, there are bits of science that I think that still characterizes very well, but I think it's a very poor way of thinking about what happens in a lot of biology, in particular in molecular biology, which I will very briefly talk about in what follows. So, an awful lot of the work that I see, in my admittedly still elementary survey of what's happening in meta science, seems to be thinking of something like this Baconian picture, and looking at bits of science which have this almost Baconian character of data collection and of sifting data for correlations. But look instead here at what I think is a better thought than knowledge accumulation, which is knowledge growth; I'm just trying to give a very brief idea of what I mean. And let me start by looking at one of the most famous experiments in the history of molecular biology, by Alfred Hershey and Martha Chase, in '52 I think, anyhow the 1950s. This is the experiment basically credited with establishing in the scientific community that the hereditary material in living systems was DNA; obviously, they didn't immediately assume it held for all organisms. And prior to that, the orthodoxy was that it was most probably proteins that carried hereditary information. This was the experiment that convinced the scientific community that it was actually nucleic acids.
And basically the experiment worked very, very simply: by labelling the protein and labelling the nucleic acid in different ways in viruses, phages that infect bacteria, and showing that the reproduced bacteria after infection carried nucleic acids, but carried much less protein. That's a very, very crude statement, but more or less what it was about. Now, I compare this with, again, this statement from this classic paper as to what's wrong with so much science. And of course, this is a perfectly correct statement of a certain kind of statistical problem, a Bayesian assessment of the evidence for a general statement. But as far as I can see, there's almost no way of relating this to an experiment of this kind. Now, what's the finding here? Well, actually the finding, as stated in the paper, is that it's very unlikely that proteins carry hereditary information. Now, at this time, there were basically only two options on the table that were taken seriously. So if that's right, this is very strong evidence that DNA, in fact, was the hereditary material, or at least nucleic acids. What's the statistical power of this study? I have no idea where one would look for that, and there's no level of statistical significance mentioned anywhere. Now, this is, it seems to me, a paradigm of the kind of work that happens in molecular biology, and I take it to be such simply because it is generally recognized as being a classic experiment. One thing I think is important to emphasize in looking at this is that this is not something that could possibly happen in any sort of isolation. If you look at the paper, there are all kinds of assumptions that are made, skills that are taken for granted. So Hershey and Chase knew how to label a phage. They knew what a phage was and that a phage would infect a bacterial cell. They had techniques for tracing these marked molecules, or these marked atoms, techniques for separating the virus from the bacteria at the end of the process, and so on. And the crucial point I think to make is that, if you actually look at the context of this experiment, they're not only using this knowledge, they're actually in some sense replicating that knowledge. They're testing the power of this kind of technique by doing something else with it. Now, of course, there can be something wrong with all of this, and things can go badly wrong at the end of it. But in this case, clearly, it went right. And I would say that the finding, such as it is, has been reproduced in literally millions of subsequent experiments. Now, just to get a slightly more theoretical view of what's going on: the first observation, which is very widely shared, is that replication is not something that molecular biologists do very much in the sense that I think is assumed in much of this discussion, which is exactly repeating experiments. But what I've just been alluding to, and my colleague, Stefan Guttinger, has discussed in more detail, is that they do something which he calls micro-replications. What this means is that constantly throughout the experiment they are actually repeating things done by their predecessors, and seeing that these in fact work and do what they are supposed to do.
And I will quote here at the end from the Hershey and Chase paper itself; this is the kind of thing you find throughout a paper like this, where they're referring to ways that this work connects with previous work and, in some respects, indeed replicates that work. Particularly important, and Guttinger talks particularly about this, is the role of positive controls. A lot of the time, if you actually look at the details of what's going on in this kind of paper, it isn't just that they're using a procedure; they're actually verifying that the procedure is working correctly in the context in which they're using it. And this is what I think has been very neglected in my field: the importance of positive controls in this kind of research. And this is all part of the process of tying the work that goes on in a particular experiment to work that's been done before, which then is, in an important sense, actually being replicated in the present work. Something that I think is really telling here, and this is a sort of critical reflection on my own field, is that philosophers, when they read scientific papers, tend to jump immediately to the discussion at the end, which says what's supposed to have been shown. It's certainly well established that what scientists typically do is jump immediately to the methods section. And typically what they're looking for there is that the experiment has been done right, in the sense of using well-validated methods, and also, where appropriate, which is pretty much always, using the right kind of positive and negative controls. And then, just to get to where this all takes me: what I think one sees in molecular biology is the importance of seeing science as a process. And actually, this phrase, Science as a Process, is the title of a book by David Hull about biology. Molecular biology can't be seen as a series of isolated findings that you assess, deciding whether they've been established by looking at the paper and whether it was done in a certain paradigmatic way. Papers fit in with other results and partly replicate other results. And crucially, a lot of bad work just falls out of this process and gets kicked out of the process because it isn't used again. It's ignored. And sometimes you see the observation that most work is never cited presented as an example of inefficiency, as somehow showing why we are doing all this terrible work; but actually, arguably, that is exactly how it works: the process picks up the work that can be built on, used and thereby replicated, and the irreproducible and unreproduced papers are the things that turned out to be dead ends or misguided in one way or another. So what we should look at is not the isolated paper, but the growing articulated body of both practice and knowledge. If we want an illustration of that, we might look at our amazing success at producing COVID vaccines, particularly the mRNA vaccines, which I would say replicated vast amounts of the most important work in half a century of molecular biology and then went on, of course, to some statistical tests that produced fairly decisive results. So with that, I'm afraid I've probably run a couple of minutes over. So apologies.

Great. Thank you very much, John. And we did actually run through the question period.
And so I think what I'll do is hand the podium over to David to moderate the next two speakers and give the speakers and the audience a chance to formulate their questions for the discussion at the end.

Sure. Okay. So next up, we have Nicole Nelson, who's going to be presenting some of her research on the reproducibility crisis and how it's been interpreted in a biomedical context.

All right. So for my talk, what I'm going to do is take as a given the idea that there are differences between fields that we need to take seriously, that biology is going to have some unique characteristics that we need to take into account when designing interventions. And my talk will be focused on what it might look like to develop culturally appropriate meta science interventions for different fields, focusing on biology specifically. I'm going to frame my talk around the idea of culturally appropriate interventions because this is a concept that is somewhat familiar, or at least hopefully should be somewhat familiar, to people in both the biomedical sciences and in psychology. The literature on culturally appropriate interventions gives us three criteria that we can think about in trying to determine whether or not an intervention is culturally appropriate. The first is kind of a high-level assessment: does the intervention align with the values of the group? Does it make sense from within the worldview of the particular group that you're focused on? So for something like biomedicine, we might say that the highest-level value here is something like: do we believe that producing good scientific data is something we should strive for? Pretty much everyone is going to say, yes, good. Then if we go a little bit lower and ask, is the ability to replicate data across laboratories an indicator of high-quality scientific data? There you might start to see some of the agreement fall apart. So point one here is thinking about the high-level values that are held by a particular group. Point two gets a little bit more specific, thinking about the culture, by which we mean here the attitudes, expectations and norms of the group. People might agree, generally speaking, that these things are a good idea, but think that this is not as much of a problem as other kinds of issues that they're facing in their day-to-day life, or they might not expect that the particular issue at hand is actually going to really impede their research. And finally, point three: what we should be thinking about for culturally appropriate interventions is strategies that are aligned with or tailored to the existing patterns of work and life within the group that you're working with. So in clinical research that might be something like: is the particular group used to individual versus group interventions? Do they typically work through online or face-to-face modalities? And that will help you decide what kind of intervention you're going to make. So here in the sciences, we might want to think about what the work practices are, the very practical types of things that scientists do in their laboratory, and how they differ between fields. To ground this, I am going to focus in my talk on the concept of blinding or masking, to look at what it might mean to try and make culturally appropriate interventions in preclinical biomedical research around the concept of masking.
Masking seems to fit our first criterion of being something that is valuable, or seen as valuable, to the group of researchers being intervened upon. The citation that I've got in front of you now is a paper from 2012, the Landis et al. paper, which is an important one in the history of the contemporary discussion around metascience, replication and rigor within preclinical biomedical research. It represents the output of a conference that was hosted by the NINDS, the National Institute of Neurological Disorders and Stroke, to think about rigor in science and what could be done to improve the rigor specifically of preclinical biomedical research. They came up with a set of four criteria that are colloquially known today as the Landis et al. 4, and one of them is blinding or masking. So at first glance, you might look at this and say: all right, masking, that seems to be something that it is culturally appropriate to design interventions around, because it's a value held by biologists; preclinical biologists themselves pointed to this as something they think the research community should do more of. In what follows, I'm going to show you three different ways in which that doesn't actually necessarily work on the ground, and why. They reference criteria two and three, and a fourth one that is not on the list here but that I think is something we need to be considering. So first, you may have general agreement that the idea of doing masking is good, but the particular expectations or norms within a community might mean that masking is not seen as the highest-priority change that they want to make to their work practices. Here I'm going to be drawing on some ethnographic research that I've done with researchers, mostly in cancer biology, asking them about their work practices and why they do or do not employ masking in their own research. So some of the quotes that I have up here are anonymous quotes from those researchers that I've been working with, when asking them about their work practices. So you may have general agreement that masking is in principle a good thing, but that concern may not seem relevant or salient for specific groups because of the expectations that they hold. For some of the cancer biologists that I spoke with, their understanding or assumption was that the effect of many interventions they were making was so large that whatever small amount of bias you might get from not running masked studies wouldn't really matter to the outcome. In the words of one scientist, you'd have to be deceiving yourself pretty hard to see tumors where there are none. Now, not all experiments might look like this, where you have visible results: one group of mice has no tumors, the other has quite visible tumors. And this can be a dangerous position to get into, where scientists assume that an effect is going to be large when in fact that effect is small, and so the risk of bias from not masking is a lot bigger. But we should take seriously the idea that there will be some instances in which the effect of the intervention is so dramatic or obvious that it no longer becomes a really good use of somebody's time to think about controlling other, smaller sources of bias. For example, the mice get tumors or they don't. The cells live or die. You suppress one gene and the frog doesn't end up with a limb. So there are some instances in which this might not make sense.
Moreover, there might be other concerns that are given higher priority within the research community. So when balancing competing concerns against each other, scientists might say: yes, I agree, if I don't mask my studies, I am potentially introducing some risk of bias, but if I mask my studies, I might have bigger problems. As one scientist put it to me, I'm more worried about the horrors of misidentification, meaning that when you've got mice that you're keeping alive for 18 months in a time-to-tumor study, the difficulty of keeping track of who's who, and the chances of somebody losing an ear tag or a mouse maybe even getting out of its cage, is much more of a concern than the bias you might introduce by not masking. So other kinds of concerns may feel more salient to the community. Again, this relates to point two on culturally appropriate interventions: within the culture here, what are the assumptions, the expectations, the norms? It may be the case that those assumptions really work against people's sense that an intervention like masking is going to be valuable. But let's say they do agree that it's valuable. Why wouldn't they do it then? Well, point three asks us to consider what the typical practices of that community look like on a really day-to-day level: how do they get medical care, psychotherapeutic care, et cetera. Here, what I would ask you to consider is what the work practices are like in different areas of science. In clinical research, it's really common for researchers to be working in large teams, so you have a lot of small tasks distributed across a research group, and team research provides natural opportunities for doing masking. So perhaps, for example, you have one collaborator who's collecting samples from patients that he or she is doing surgery on. Then those tumor samples are going to be sent off to a pathologist, and he or she is then going to score them in terms of, say, what the degree of infiltration looks like. Sending the slides over to a collaborator provides a natural opportunity for introducing masking there, because you can simply send them over with a numbering system that that collaborator has no knowledge of. This can be quite different from how a lot of preclinical laboratories are run, where the relationship is more that we have a mentor, a PI, who is working with an individual student on their project, and the expectation is that the student is supposed to take that project through from beginning to end and acquire all the skills necessary to be able to execute that project. In that model, it just becomes, practically speaking, a little bit more difficult to figure out how you're going to do masking. Some students talked about the need to, quote unquote, beg for labor from other laboratory members to, for example, do all the labeling of their tubes or cages for them, because if they executed the system themselves, then they would be essentially unmasking the study before it even began. But since that project was the responsibility of the individual student, asking somebody else for their labor didn't fit well within the culture and work practices of the laboratory.
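As an aside on the coded numbering system mentioned above, here is a minimal sketch of how a masking key might be generated in practice. This is an assumed workflow, not something described in the talk; the file names, sample labels and group names are all hypothetical. The design point is simply that the person who knows the treatment assignments holds the key, and the masked scorer only ever sees arbitrary codes until scoring is complete.

```python
# Illustrative sketch (assumed workflow, hypothetical names): generating a masking key
# so that the person scoring outcomes sees only arbitrary codes.
import csv
import random

# Hypothetical animals and their (already assigned) treatment groups.
samples = [("mouse_01", "treated"), ("mouse_02", "sham"),
           ("mouse_03", "treated"), ("mouse_04", "sham")]

codes = [f"S{i:03d}" for i in range(1, len(samples) + 1)]
random.shuffle(codes)           # shuffle so code order reveals nothing about group

# The key file stays with the unblinded person until scoring is finished.
with open("masking_key.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["code", "animal_id", "group"])
    for code, (animal, group) in zip(codes, samples):
        writer.writerow([code, animal, group])

# The masked scorer receives only the codes and a blank score column.
with open("scoring_sheet.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["code", "score"])
    for code in sorted(codes):
        writer.writerow([code, ""])
```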
Likewise, some students told me about the concern that if they passed off certain experiments to other lab members, those lab members might not do their best work or produce high-quality work, because the data they were collecting was not for their own project. So imagine, in a time-to-tumor study, that what you need is somebody to go down and palpate all of your mice for you to see if a tumor has formed; you have to check individually to see at what time in the life course a tumor has emerged. That can be a time-consuming process, and asking someone to do that over the whole course of your study means a lot of time. And you might well be worried that they would not produce their best-quality work if they're not the one who's getting a publication out of it. And now finally, I'm going to address the fourth factor, which is really not within the three-point schema of what constitutes a culturally appropriate intervention, but I think it's a really important one to consider in thinking about meta science interventions. And that point is that the intervention may just be targeting the wrong population, aiming at people who don't actually have the ability to solve the problem. As an example, one of the things that many scientists talked about, as I've been interviewing them about their masking practices and their daily work practices, is that the core facilities for housing, managing and caring for animals within their institutions have particular rules, regulations and standard procedures that interfere with masking practices. For example, it is common in many core facilities for cage cards to be labeled with genotype. That makes a lot of sense if there is a particular genotype that has particular care needs, or again, if you're worried about misidentification or cross-contamination that can spoil years of selective breeding experiments. But putting the genotype on the cage is the equivalent of unmasking the experiment for certain types of experiments. Likewise, core facilities like to label certain key kinds of information, such as whether an animal has received a mutagen like ENU, so that information is prominently displayed on the card. But that might be something that, again, unmasks the experiment. Finally, in one set of experiments, what seemed to be the problem was that the work practice of the core facility was to arrange all of the cages of mice in a specific order, such that mice that had received a viral vector treatment in the experiments were at one end of the room and the mice who had received a sham treatment were at the other. This physically separated the mice from the two different treatments, effectively unmasking them. But the reason they did that is because cages were always changed from left to right, and so they would change all of the cages of the sham-treated mice first, to reduce the risk of cross-contamination with any viral vectors from the mice that had received the treatment. So my point here is that interventions around masking may potentially be targeting the wrong population, in the sense that it does not make sense to ask researchers to individually negotiate exceptions to these rules.
Some researchers may well kind of bully their way into the core facilities and ask for exceptions to how cages are labeled, but it would make much more sense to target an intervention at the core facilities themselves, or at the IACUCs, where it would become easy or routine for researchers to implement masking practices in their work, by virtue of the labor practices of the core facility being aligned with this particular goal. All right. So to sum up, I wanted to point back to a recent op-ed that David Peterson published in Nature, where he talked about some of the interviews he has conducted, looking at which interventions scientists appreciated and lauded, and which ones they thought were misguided. And what he found was that interviewees praised cases of overhaul related to rigor and reproducibility that originated in the community that they applied to. Now, that's a message that's been reiterated a couple of times already in this panel. And I would put forward that that is because when interventions originate in the community they apply to, they're almost, by definition, going to be culturally appropriate. But not necessarily: as you can see in the case of the Landis 4, those criteria may have come from preclinical biologists, but they don't necessarily apply to all areas of preclinical biology. This means that basic knowledge of the values, norms, labor structures, and needs of the target communities is really information that you need to have before you start designing meta-science interventions. And I would add a caution here that meta-scientists should be careful not to treat their own experiences as representative of the community as a whole. Many meta-scientists have been trained in a particular discipline and have maybe moved into meta-science, or are doing meta-science part-time in addition to their own work. There is some tendency, or some risk, I would say, for meta-scientists to draw on their own background of research, how I did it when I was a graduate student, what my lab looks like and how it is organized, and to use that knowledge as a substitute for more comprehensive knowledge that could be gathered about cultures and communities. One person's experience should not be taken as representative of the whole. And then finally I'll say that qualitative research techniques like ethnography and semi-structured interviewing are the things that are designed for trying to access culture, subjective understandings, unarticulated assumptions. So I just want to put in a plug to encourage people to think about finding collaborators, or getting some training, in qualitative techniques that are especially helpful and appropriate for getting at the kind of information that would allow you to make culturally appropriate interventions. I will leave it there.

All right. Thanks so much, Nicole. That was terrific. So we are running a little long on time, but I'm just going to ask one quick question, which is: it seemed like the people that you spoke to framed their resistance to masking in terms of how much it cost, or that it was less important than other things, or that it required significant alterations of current practice. And I can see, from the perspective of a meta-science activist, that all of these things are kind of just foot-dragging, and that, you know, we want to produce the best science we can.
And so I'm wondering if you got a sense that it was just kind of cultural inertia, or if you got a sense that there genuinely were real reasons why there was not, you know, greater movement to embrace masking behavior. Yeah, this is where I think we might need some data to test those assumptions, basically, because if it is the case that the risk of bias introduced by not masking in particular kinds of experimental systems is much larger than researchers assume, then great, write a paper to show them that; that might actually really help move the needle on some attitudes. But keep open to the possibility that it may also be the case that the effect sizes in some areas are going to be so large that it actually doesn't make sense to push this as being the one issue, because there are always many kinds of risks of bias competing for people's attention. And that idea of misidentification or getting things wrong we've seen as well in other fields, you know, Excel errors, like Excel workbook errors, being one of them, but those are also things that introduce bias and incorrect information into the literature. So I would say it may be the case that they're wrong about those things. If so, put some data on it. It may be the case that they're actually right, in which case we need to be thinking about the best use of resources rather than just dogmatically pushing a couple of criteria that always need to be satisfied, like the Landis 4. Terrific. All right, so with that, we will move on to our final speaker, Bob Weinberg. Oh, you're still muted. I will talk at a much different level. I'm much closer to the ground than some of the previous speakers and many people at this meeting. I'm a very practical person. I've been at MIT on and off for 60 years, and the MIT mascot is a beaver, which builds things, not a mountain lion or any other heroic animal. To give you a view of how my brain works, I built a whole house with these hands. I did the plumbing, the electricity, the roofing, the sheetrock, and even built rock walls. I'm telling you that because that's exactly the way my brain works when I do biology. It's the same part of the brain. And therefore I view things in a very concrete way, a way which has no philosophical pretensions and a way which actually doesn't use that much statistics. To echo something Steve Goodman said, I teach people in my lab that if a result is so marginal that it requires statistics to prove its credibility, then one might look at it askance. I also tell people that when they develop scientific arguments, they should think of the four letters K-I-S-S, as they use in the Army: keep it simple, stupid. Either something is logically intuitive and develops itself or it doesn't. Given that background on how my mind works, I'll give you two anecdotes from my own existence. I'm currently in the middle of finishing the third edition of a cancer biology textbook that I began writing around 2005, which appeared in one edition, let's say in 2007, and another in 2014, and now it's going to come out again. It's used very widely to teach students the basics of cancer biology. Why am I mentioning that? Because I saw, not so long ago, three or four or five years ago, claims that there's a reproducibility crisis in biology and that 90%, as some people claimed, of what's published in the literature is not credible and reproducible. So I calibrated that against my own experience in writing these textbook editions, approximately 700 pages in each edition.
And how did it work out? Well, I can tell you that, without exception, everything that was in the edition of 14 or 15 years ago is still regarded as being true. It's still accepted. It's not as if a significant proportion of the results has been proven to be nonsensical and been dismissed by the community of cancer biologists. Many of these results may be thought of as less interesting, maybe as minor distractions, just not worth focusing on, but not untrue. And so when I hear that 90% of cancer research, of biological research, is irreproducible, I ask myself, what's going on there? And maybe that 90% number, as uttered by some, I believe in the pharmaceutical industry, is more a reflection of them than it is of the people who actually conducted the experiments and published the papers. I don't know, but it's simply untrue that the foundations of science don't change with time. They do. But it's also untrue that they're built on foundations of sand. They're rock solid, and as time evolves, one develops more and more degrees of consensus about certain fundamental and accepted truths, and work that is irreproducible simply falls by the wayside, as was mentioned before. That's one of my anecdotes, to give you an insight into my profound skepticism about the reproducibility crisis in biology, because I really question whether it exists, and I question the motivations of those who are proclaiming that it's a major problem in modern biology. My second anecdote comes from the fact that I was involved indirectly in an effort about four or five years ago where a group of people undertook to reproduce a series of 50 papers, largely, I believe, in cancer biology, in order to test whether they were really reproducible or not, or whether these were papers that were simply figments of the authors' imagination, or were dry-labbed, or for whatever reason reported irreproducible and unreliable results. And I was most flattered to have two of my research papers, i.e., papers coming from my own research group, included among the 50. And I thought, gee, this was a very interesting idea, until I began to look at the details of the scheme these people wanted to pursue in order to test whether these different papers of mine and of other laboratories were actually reproducible. I will tell you that in my laboratory, people come into the laboratory and spend several years developing biological reagents and expertise; they assimilate a complex conceptual structure; they develop implicit methods of logic and rigor and an understanding of what control experiments and negative controls are. And by the time they've been in the lab for two or three or four years, they then become highly competent to conduct experiments and may actually publish a paper, God willing. What were these people who wanted to reproduce the work in my laboratory going to do? Well, they were going to come in from the outside; they were going to deputize a delegation of biologists from God knows where. And these people would not have the training that I subject the people in my lab to, they would not have access to the instrumentation, they would not have the subtle experience with how to use different kinds of experimental procedures. They would be, in one sense, oblivious to many of the subtle details that are required in order to make the most modern molecular and genetic and cellular experiments work. Moreover, what were these people doing in their real lives?
Why did they have so much spare time that they could devote a lot of it to trying to check on what was going on in my own laboratory? Were they just sitting around waiting for something interesting to do? And finally and most importantly, were they actually motivated to find that a paper was reproducible or irreproducible? Where did their motivations come from? And so I came to conclude that this entire effort was an act of lunacy, of cluelessness. I had no idea where these people were coming from. How on earth could they presume to try to reproduce work from my own laboratory with people who had none of the expertise, in facilities which were totally different from those in my own laboratory? Mind you, I invited people from that effort to come into my laboratory and spend six months trying to reproduce the experiments, which to my mind would be a fair test of their reproducibility. But of course I never heard anything. That experience has instilled in me a profound skepticism about those who would look at science from the outside, or biology from the outside, and attempt for one or another reason to clean the Augean stables, to remove all of the filth and the irreproducibility from the corpus of scientific information by trying to show that most of the published literature simply is not reproducible and is highly fallible. I'm very skeptical of those people because of the two anecdotes I've just told you. And to echo something which I believe Nicole Nelson just mentioned, at least obliquely: to the extent that there needs to be reform in my discipline, and every discipline can stand reform, it should come from within, from people who are practitioners of experimental biology, who know the difficulties of carrying it out, who understand the challenges in reaching reproducible conclusions, rather than from people who would come in from the outside bringing in, if I may use the word, philosophical issues about how science should be carried out, rather than wrestling, as people like I do, with the nuts and bolts of everyday experimental science, how it's carried out, how it's limited, and what kinds of improvements it could stand to have. So you may sense from all this that I'm a little skeptical of meta-science, not because I think I live in a perfect universe, but because I live in an environment, a scientific environment, of very practical people, nuts-and-bolts people: does this work or does it not work? Does this result hit you in the face or is it borderline credible? And I have yet to see how some of the philosophical incursions into my discipline are about to improve some of its fundamental flaws. So how's that for being a counterculture to what has been discussed here in depth at this symposium? Pam, was that adequately controversial for you? Provocative? It was great. I don't know if the other members of the panel disagree with you as much as you may think, but David, why don't you go ahead and segue into the panel discussion? Sure, let me ask just one quick question for Bob and then we'll move into the general discussion for everybody. I have sort of been playing the role of devil's advocate for this panel, and so I could see a critique, one of the arguments that meta-scientists make, which is that the internal evaluation mechanisms in some areas of science have broken down in a way that has let irreproducible findings fester and proliferate.
And so I'm wondering, from your perspective, whether you think the internal control mechanisms in biology have proven effective, and whether you think there's something unique about biology that has allowed them to root out negative findings, whereas in other fields, for instance psychology, I think there is quite a legitimate issue there. Well, David, just to clarify, when you talk about internal evaluation mechanisms, I presume you're not talking about the evaluation of people's careers as much as the evaluation of their scientific opus, their output; I presume that's the case. And the fact of the matter is that results get published, people read them, and if they're really interesting, then other people follow them up. If they appear to be irreproducible, then they just kind of disappear. And after a matter of several years, nobody ever cites the paper. They move off the radar, and they don't necessarily accrue to the good reputation of the people who published them to begin with. And by the way, Steve Goodman may be amused to know that the paper he mentioned, by bizarre coincidence, is a paper I happen to be intimately familiar with. My lab works in exactly the same micro-area of cancer research. And there are questions there about the reproducibility of that particular paper. Interesting, interesting. So it's not as if I think scientists are infallible. It's simply that with the passage of time, a consensus evolves that one result is interesting and another one cannot be reproduced by others, whether or not it's been identically replicated, and it simply is not worth one's attention. And by the way, one final thing: when I wrote this textbook, and I am finishing it now, I didn't necessarily spend a lot of time reading scientific papers. What did I do? I got on email and on the telephone and asked people what their opinion was about the evolving consensus on this issue and that issue. And that, in my mind, is the way an evolving consensus works. It may not be as methodical and logical as some people would like, but the fact of the matter is, it's extraordinarily useful and interesting to get people's opinions as to what is credible science, what is borderline credible, and what is frankly flawed. But again, it's not as if everything in my 2007 textbook edition has gone out the window. In fact, nothing has. Some of it may be uninteresting and trivial or distracting, but it's not wrong. Okay, so on that note, I will open it up to the panel. In the last few minutes, I want to move over to audience questions, so if the audience wants to get in some questions right now, I see we already have some good ones. But first I want to give the other panelists a chance to talk to each other. We've come at this intersection of biology and meta-science from statistics and philosophy and biology and social science. And so I'm just wondering if you have any thoughts or comments or questions on each other's presentations before we go to the audience Q&A. I'd like to just make a few comments slash questions. One is, I think sometimes we don't make a distinction between an individual experiment and what we might call settled knowledge. And the processes by which we come to what we might call settled knowledge are sometimes profoundly different than the standards we apply to any one paper.
For example, just a really trivial one: the big move that I talked about that happened in the 80s and 90s in clinical research was moving from reliance on just mechanistic reasoning to clinical trials, but then not looking to any one clinical trial as the answer, but saying that it's the amalgamation of many. And while we know absolutely for sure that any one study can be wrong just for reasons of chance, the meta-analytic average, in those questions that can be meta-analyzed with a lot of those studies, is a consensus that's much harder to move. Now, maybe it's tautologic, maybe it's because we believe in the sum, but we really have to be careful about distinguishing between the components of what we might call settled knowledge and the final conclusions of science. And I think in many sciences those units are very visible, like in psychology. Maybe it's less true now, but for quite a long time, and David can comment on this, one study often established a theory, and it became sort of branded on the basis of just one study, or maybe very, very few. And it was immediately accessible to the community and to the public. That's a very, very different paradigm than what's going on in clinical research. In fact, when I first heard about it, I almost laughed out loud. I said, who would think that this is a sensible way to go? Because obviously those are going to be quite frequently wrong, particularly if you depend on statistical significance. So this distinction between settled knowledge and individual components, and how they're all put together to create what we call a body of knowledge, I think is super important. One other anecdote, if we're talking about anecdotes, and this relates to Nicole's presentation: there is a faculty member at Stanford who was head of research at a major pharma company and then moved over and became an academic again. And I asked him, what's the biggest difference between how you operated in pharma and how you're operating now? And he said, and this relates directly to your presentation, he said, nothing was done in pharma without blinding, without masking. He said, you just didn't even talk about it. He said, here we don't do that. And I said, why? He said, well, in a sense, and this gets to your point, the infrastructure made it very difficult. And sometimes the infrastructure, and you made this point about the way the animals are housed and everything is organized, the infrastructure is organized around the belief that something makes a difference or doesn't. And it makes it very, very hard to go against that. So I said, do you use it? And he said, interestingly, not much, even though it was his life before and he actually believed it to be quite important with animals. I'm going to pass it on to John in a second. I saw you had a question, but I just wanted to ask one quick thing, one quick response to that, which is that I think the interesting thing about the comparison you made, the single paper versus the growth of knowledge in a kind of meta-analytic fashion, is that both the article that you cited at the beginning and what Bob talks about suggest that there can be serious qualitative differences across these laboratories, these biology laboratories. And this would seem to undermine a kind of meta-analytic move toward truth, in that essentially one very skilled experimenter or one particular setup could accomplish a goal that no other lab could.
And meta-analytically, it would look like a false positive, but in fact, that's the innovation, that's the finding that actually changes the field. And so I'm wondering how this makes you conceptualize this intersection of statistics and experimental biology. Are you posing that to John? No, sorry, I was posing it back to you; John's question in a second. I just wanted to respond to yours really quickly and ask you, as a statistician, how do you think about that? So I think, even in the areas that I know best, the definition of what is quality is very elusive. It's very elusive. Even with as rigid a structure as randomized controlled trials, we still do not have complete consensus on exactly how to distinguish the trials that in some way should be included in systematic reviews from the ones that shouldn't. Every study is different from every other study, and which differences matter and which differences don't are often very contextual. And there's been no scoring system or metric that is completely accepted. I think this is the core issue. This is what science is: figuring out what is the highest-quality evidence. And even when we have as restricted a technology as we have in our RCTs, there are some components on which there is consensus, but in context, the deviation from those ideals, how far you allow the randomization or the masking to deviate, all these things, as Nicole was saying, are always subject to debate, we find. So I can't answer your question simply, except to say that this is the central challenge not only of clinical research but of all biology. So it obviously can't be answered in a few sentences. John, you had a comment or question? Yeah, I just wanted to make a quick comment on Robert's presentation, which I guess I agreed with 90-plus percent of entirely. But I guess a couple of things I do want to say. I mean, I suppose I think you perhaps used the word philosophical in a way that was perhaps derogatory, that was the word I was looking for. And I would just say, in defense of philosophers, that philosophical critique of science, in the technical sense of critique rather than just criticism, doesn't actually have to be ignorant. And I think philosophers nowadays do tend to spend a lot of time trying to understand the science they're talking about. Now, of course, I would say there's two different things here. The expertise that a scientist, a practicing scientist, has who does good work in a laboratory is quite different from the kind of expertise somebody has who can engage with what they do externally. But I do think external engagement is possible without becoming a competent or skilled experimenter. And I think that's quite important. So I would disagree with your point that all criticism has to come internally. I would say it has to be informed, and that that's quite hard to be. I'd also say, and this is partly why I agree with so much of what you're saying, that molecular biology is a field that really no sensible commentator could think was in deep trouble. Indeed, I mean, obviously this is why it's the sort of field that philosophers look to to try and understand what works and what makes a successful scientific field. But I do think in this kind of project one has to be quite fine-grained in talking about different bits of science. Biology is a huge array of different things.
So, for example, if I were to move to the other end, to something that I think is a much more appropriate target for some of the kind of criticism I think is going on in this meta-science movement, I would look at the kind of GWAS epidemic. And the problems there, I think, again, one has to not just look at the sometimes bad statistics but look at some of the details. So it sometimes seems an awful lot of the GWAS movement, you know, the genome-wide association studies, is based on a largely discredited idea that there are interesting correlations between any phenotype and a bunch of genes, even when you've got to the point of recognizing that that might be a tiny effect for several thousand genes. I think we can explain why those aren't replicable: because there's just not a phenomenon there. The contextual sensitivity of the phenomena is such that you're just not going to get samples that are appropriately replicating one another. So I mean, there are bits of biology that are very unlike the bits that you were talking about and that I was talking about, which I would say should count for most of us as a paradigm. And the people who don't think it's a good paradigm, I guess, are the people who are in hospital unvaccinated at the moment. I cannot disagree with anything that John Dupre said. So there. That has silenced me. Thank you. Do the panelists have any more comments or questions for each other, or should I move on to some audience questions? I'll just say, in general, most of these things are not black and white. One can say that biology can improve without saying it is all wrong or it is all right. I think there's a tremendous amount of ideas and tools that are discussed in this conference and that have been brought to bear that can improve it. But I think the point has been made that it's very helpful to do that in alliance with the researchers in the field. Well, let me use that as a good transition to ask a kind of general question about a specific interventional technology, which is mass replication. So the Center for Open Science conducted a mass replication of psychology before they moved on to the cancer biology effort of which Bob spoke earlier. The goal of this was obviously to do what they did in psychology, which is to find a general level of reproducibility, to find out if, as a whole, the field is generally producing findings that can be reproduced. And I think statistically there are some issues with this, and there are some practical issues, which Bob brought up, about actually having the skills and technologies to do it. But I'm curious, as a group, how people feel about this kind of technique for evaluating the status of fields: whether this is a valuable technology, whether it is more appropriate to certain fields than others, and what the characteristics are that determine its usefulness. I would just say, just to jump in, David, that for me, being a very pragmatic, practical person, the proof is in the pudding. Is there a concept that with the passage of time gains traction, or that with the passage of time has become increasingly questionable? Of course, that raises the issue of what the scale of time is, because there are obviously concepts in psychology that have lasted decades before they ultimately were undermined by countervailing points of view. But again, the proof is in the pudding: do lots of people increasingly embrace it because their own experience supports this by now widely embraced paradigm? Either it does or it doesn't.
I think, as a critic of psychology, I could point to something like priming effects, which was considered a very hot area for maybe 10 or 15 years and has been deeply undermined. Those things became hot, I think, because you could embrace them and you could do research in the area without actually having to replicate the initial study. And so this was one thing I was wondering about in biology, because it seems like, as you said, the proof is in the pudding. To continue a line of research requires that you adopt that technology and extend it in some way. And that seems to be lacking in some other fields. I think Pamela wants to jump in here. Yeah, if I can dip into that one. I wonder if some of the discussion gets stuck on an equivocation about what the product of knowledge is that the research is aiming to produce. And I think there may be some fields where simply finding a statistical association between two variables is the final product. That's the final claim that they care about. Whereas in a lot of biology research, what they're trying to do is refine a mechanistic model for how things work. And any one experiment that gets whatever effect size at whatever p-value is just like one loop of Velcro out of thousands that are holding some model together. And it's the model that Robert is saying seems to be durable and reliable, rather than the individual research finding, which might not work in some other lab if they have a different mouse strain or something. Sort of, at what level is the knowledge evaluated? Could I just ask you, could you give an example of a field that you think is legitimate and worthwhile for which there are just findings of the kind where all they aim to do is find associations between variables? I didn't say legitimate and worthwhile. But I'll try. It's a leading question. Yeah, Steve was saying that he was surprised that in some social sciences people could do one experiment and find an effect and that becomes a textbook fact. I'm not familiar with those examples, and maybe Steve could mention one he had in mind. But I could imagine that in some fields, like, I don't know, epidemiology, where you can't do an intervention experiment, the situation will never be repeated. And so the best you might be able to do is figure out, I just need to know what the probability of getting this disease is if I take this vaccine, as accurately as possible. And that is the end product that I want to know. Like, I don't care how it works. I just ultimately need to base a decision on some estimate of the chance of something, and that is a different kind of knowledge than the sort of model building that a lot of basic research is oriented around. But yeah, a lot of that requires training and methodology, knowing stuff about vaccines and so on. Sorry, Nicole was... Well, I was going to make a Steve Goodman point for him, just to point out that the cancer project, the Reproducibility Project: Cancer Biology, is again not something that is new per se. It's new in the institutional format that it took. But two of the key papers that contributed to the current conversation about reproducibility and rigor in biomedicine both came from pharmaceutical companies whose routine and regular practice was to replicate results in-house before deciding whether or not to move forward with them. Now, what exactly those practices are testing is open to question. I wouldn't say necessarily that they're testing the truth of a result.
They may be testing the generalizability of a result, the ability for them to make it work in their own hands. It is still possible that a result involves a highly specialized skill that an industry scientist doesn't have. But this is a practice that has been routinely used in many cases by people who are deciding whether or not they want to build on those results. And your own research has really nicely shown that too: while these institutional projects of replication are new slash rare, everyday replications are happening all the time. I think we need to move to our audience. That's a really good point, but there are so many good comments coming through the chat, and I want to make sure that they get heard. Can I nominate one that was answered by Nicole in typing, but that I thought was worth having aired in the live session? Somebody was concerned: well, if every discipline gets to be its own police and decide what is appropriate for it, doesn't that give a bad discipline sort of a free pass to just get itself off the hook? And I just thought that was a point that's probably on a lot of people's minds. Like, is this just a cultural relativism argument? Well, that point ignores the dynamics of what operates within a discipline or a sub-discipline, because there are lots of motivations for a discipline to police itself. And when people come up with a certain claim of a result, there are other people who attempt in one way or another to disprove it. And so it's not as if a sub-discipline doesn't have its own internal dynamics, which continually probe the correctness of certain well-accepted truths and either support them or further erode them. It doesn't really mean that that discipline veers off into some space of unreality. That's just not the way the real world operates. I want to say something about that, and this relates to motivations. When folks are coming in from the outside, they don't live with the consequences of destroying the credibility of the science. And the reformers from the inside are often the most motivated to improve their science, because they live with it and they want to make it stronger. The reformers, the young Turks that made up the Cochrane Collaboration in the 90s, were incredibly idealistic. They wanted to reform their own discipline because they felt their own careers, their field, were at stake. So I think you will find that it's often young scientists within their own disciplines who are highly, highly motivated not just to find fault but to make their own science as solid as it possibly can be. So I think that we learn a lot; the thing is, there's information sharing here. We do learn a lot from the methodologic struggles and faults of other disciplines. But how that knowledge is applied within your own discipline, I think you have to work out in partnership. I really think this is a partnership of people who have a stake in creating reliable knowledge within their own discipline and not just finding error. A lot of meta-scientists feel a sense of failure if they haven't found a weakness or a problem in the science they're studying. And I want my science to be successful. That's what I'm saying, and that's shared by virtually all scientists. We're all in this together, and we want to make our science the strongest it can be. Which is why I think the partnership is so important.
There is a lot learned from other sciences. I think this is really the most important part of the meta-science movement: there are common issues, but exactly what form they take in different sciences needs to be worked out carefully and in partnership. Yeah, I'll add that I think it's really helpful actually to assume good intentions and good commitments on the part of the scientific communities under question here. And I think that's one thing in the meta-science conversation which has maybe been up for grabs, with the discussion of incentive structures in science and how scientists are disincentivized to use these core practices, disincentivized to do good science. I think that might be an overly skeptical view of what most scientists wake up in the morning to do. Most scientists are probably not thinking, well, this is pretty bad data, but whatever, I just want to get it through because, cynically, I need the promotion, the tenure, whatever grant. There's probably some edge of that in places, but by and large there's a sort of general commitment to the idea that you're producing something worthwhile, something good. So in the answer that I typed out in the text box, I said I think it is worth making the analogy to health or health care here, to think about most patients, generally speaking, as valuing their health. And so if an intervention isn't being followed, probably the first place to go is not to say, ah, that patient is making excuses. It's to say, okay, well, why isn't this working? Is there a practical reason? Is their definition of what it means to be healthy just different, and you're trying to achieve an aim which is not actually relevant for them? So I think while a lot of meta-scientists are maybe wary of assuming good intentions, because they assume that then nobody's going to be motivated to change, we have seen an awful lot of change once people see data that demonstrates, for example, that some common practices that have been used in various fields actually do incur a substantial risk of bias. And in psychology we've seen quite dramatic changes, with people responding to that data because they do ultimately want to produce good-quality work. I think also, I mean, there are different kinds of criticism, and not all of them are going to be attractive even to the young Turks in a science. I mean, if you look at the history of science, there have been a lot of scientific projects that have just been misguided, sometimes drastically so. And it seems to me we do want to acknowledge the possibility of external criticism that says a science is not actually going anywhere good, and maybe going somewhere very bad. And I don't think there's always motivation for practitioners to make that criticism themselves. I could mention some current scientific projects that I think may well be on that kind of trajectory, but probably it wouldn't be appropriate here. And I think it's certainly an appropriate point of discussion whether some particular science is in that predicament; as I said, there are some that I think nobody would seriously think are, but some that might very well be. David, you're muted. Yeah, sorry, we have a couple of questions that are related, along a similar line.
So anybody in the audience can look at the anonymous attendee's and Monya Baker's questions, which I think are essentially this: both Bob and John have laid out a version of scientific progress which says that the cream will rise to the top, that work will be produced and the most effective, the most insightful, the most robust findings will be what continues, and other work will simply die out. And I think a big critique from the meta-science movement is, one, that this isn't necessarily true, that there are irreproducible or bad findings that manage to be cited in the literature decades after they come out. And two, that even if that's the case, shouldn't we be looking for a more efficient, effective way to self-regulate, to self-police? And so, yeah, I'll put it to the panel whether that's more dangerous, or whether there actually are some positives to thinking about more explicit mechanisms of self-regulating scientific production and evaluation. Could I just say the very word police sends shivers down my spine, as if there should be authorities within each discipline to determine what's acceptable and not acceptable. I don't think that every result that is accepted over an extended period of time is necessarily going to be accepted forever, including results which, with the passage of time, are viewed as having been foolish and misguided. But in my view, in general, there is marked forward progress, let's say in preclinical cancer research. Yeah, but I think, I want to say, I'm sorry, I'm just going to say quickly: what I've known from every field that I've looked at and that others have looked at is that the efficiency argument does, I think, have weight. And there are many things that we accept that, when you look at the evidence from the beginning, probably shouldn't have been accepted. Those theories themselves then inhibit further forms of inquiry. And this is why it's not black and white, exactly how to measure the pace and efficiency of progress. I think in biology it is particularly difficult because of this complex, dense network of building. I know in clinical research, we know absolutely that there are probably quicker ways to get to more reliable truths through the application of many of the things that have been learned. I think, you know, this is why meta-research matters; I mean, we're talking mostly opinions here. And I'm parceling off animal research; I'm really looking at the basic science, the mechanistic research. I think we need to do a lot more meta-research in this field to figure this out. We're really at the beginning. That's why I put up that timeline: in the clinical research realm, where the designs, I would argue, are much simpler, although the judgment is still difficult, it's been 30 and 40 years over which we've built up an evidence base to have confidence about what makes a difference and what doesn't. And we're still arguing. I think we're at the very beginning in biology. So we can assert things; I'm not so sure they're necessarily true. I do think a lot of bad results don't get built on and they fade away, but exactly how efficiently that happens, whether it could be done a little better or a little faster, whether that would make a material difference and wouldn't be a tax on the field, so it wouldn't have unintended effects, I don't really know. I think we're at the beginning of this, and we should encourage these sorts of inquiries, done in partnership and with a deep knowledge of what's happening. That's all I'll say.
I just wanted to add to that, with which I am largely in agreement. But I mean, efficiency is an almost indefinitely slippery concept, and we only have to look at economics to see how dangerous it can be. And I guess the thing is that you really have to understand the process very well to decide whether it's efficient or not. I mean, you have to understand how it works, what the outcomes are supposed to be, and what the valuable outcomes are. And it's so easy to do this in very naive ways, because the objective of science, contrary to the views of some university administrators, is not to generate papers or even citations. It's to generate understanding and ultimately technologies and so on. Now, I don't think we have a deep understanding of what the most effective way of producing the sort of outputs we ultimately want may be, and if it turns out that vast quantities of uncited papers are a byproduct of an efficient process, there would be nothing surprising about that. It may not be true, but certainly I don't see why it might not be true. I have a response to what Steve and John were just saying, which is that if there is an inefficiency, I'm not sure it's all on the part of the people conducting inefficient research projects. Maybe there's more inefficiency going on with the people who are consuming that research, not using sufficiently critical judgment in deciding how confident they should be of a claim on the basis of the evidence that they've seen, and maybe jumping too fast to, let's say, trying a technology when the evidence in the literature is not sufficient to support that. So I guess where I'm heading with this, which I know at least one or two of you has something to say about, is whether there's room for improvement in just the education process, in how people understand what they can conclude from the data they've seen. If an expert like Bob, reading the literature, or the experts he consults when he writes his book, is able to ferret out which are the things we really should believe, is that the skill that we need to be teaching better? I do think there's a difference between papers that are meant to be communications among scientists and findings that can go immediately into practice and out to the broader community. That's one big issue in clinical research: some of these results go straight out, they're in the headlines the next day. And that can be a huge problem. Papers that go from scientist to scientist, where there's more critical acumen and the purpose is really just to plan the next experiment, I think sit in a different class. We should sometimes distinguish between those two. I do think that we have to be very, very careful, though, of thinking that our scientific methods are the best they can be right now and that the filters we have built in can't be improved, even if they have the appearance of garnering consensus. Because that consensus may itself be a reflection of belief in methods that may not be that reliable. I mean, just that little example I gave about the growth rates: you can just imagine the inefficiency in the literature built on those assays, with people not understanding the interactions between the drugs, the cells, the counting methods. Again, they said it themselves: they thought that these measurement technologies were so exchangeable, so simple, that it didn't even occur to them that they needed to be systematized in the way they did.
And I think that kind of paper, if that was done on a widespread basis, it's hard to imagine that it wouldn't increase the efficiency of scientist-to-scientist communication. Because otherwise you have this welter of results, and it takes years to sort those differences out, because you don't know that it's just the imaging technology. A lot of this is just about measurement technologies, not just study designs. I think that's a good segue into Catherine Axford's question, who says: I appreciate the emphasis on how constructive reform work can be if it comes from within a field. But she asks, how should fields empower people who want to spearhead those efforts? If people within the field actually want to embrace some of these meta-research findings, actually want to intervene, what needs to be done? Are there sufficient opportunities right now for them to engage in this, and if not, what needs to be done? I think that biology is at such an early stage. The focus has to be partly on training and partly on inspiration, meetings like this. If they're attended by biologists, hopefully they'll go back to their labs and start thinking; training, absolutely; examining your own methods and figuring out how to probe them in the way that one paper did and I think many other papers do. There are many, many models for doing it, maybe fewer in biology, I'm not sure, I just don't know. But I'd like to hear others. Yeah, I think this is a really good question and one that really requires some sort of sociological thought. David, I'm actually interested to hear what you have to say about it. But I think that this is where the incentive structures really do become relevant, because how much value is placed on this kind of methodological work really varies between fields. And this is something that we've seen science confront in other ways. For example, in the early days of the Human Genome Project, researchers realized, institutionally speaking, that if there was going to be a whole lot of day-to-day sequencing work that seemed like a grind and didn't produce anything intellectually, they were going to have a hard time allocating credit and getting researchers to actually build careers off of doing this stuff. So insofar as methodological work continues to be seen as sort of service work, grunt work, not interesting, it's going to be hard for people to make these kinds of reforms. That's a little bit separate from the point that Catherine is making here with respect to people not liking their individual findings challenged. I think that one is a little bit more tricky. What I would suggest is to try and find ways to make the culture more amenable, not necessarily to attacking particular papers, but to doing methodological work, which means thinking about measurement tools and techniques as valuable stuff and not just the stuff that gets you to the flashy findings. If I could make a totally disinterested point, I think that one suggestion would be that scientific training should include a little bit more of the history and philosophy of science, where people acquire a little bit of a tradition that existed, as I said, even before meta-science, of reflecting on what science is and how it works. And maybe, to seem a little more disinterested, I mean, just at least some history of science, to realize that people didn't always know everything that they know now.
And in fact, they believed lots of false things that were useful in getting them to where they are now. And some very, very simple messages come just from knowing a little bit of the history of their discipline and then, almost inevitably, reflecting a little bit about what it means, which is to say, getting into a bit more philosophy, and generally encouraging scientists perhaps to include thinking about what they're doing as a more integral part of what they do, or at least some of them, some of the time. Sometimes. I'll follow up on John's comment. One of the things that I think would be very helpful, and I think would generally help the spearheading of some of these reforms, is for meta-science and for people in general to have a slightly more nuanced take on what things like reproducibility mean. I think the media has done kind of a disservice to the meta-science movement by breathlessly reporting a lot of the irreproducibility as basically being false findings. And I think that really sets a lot of disciplines on edge and creates a kind of antagonism that someone like Bob can speak to, where it didn't feel like a collective, positive effort, more like an attempt to sort of undermine something. And some of the people that I've interviewed have spoken about that. So I think we need to have a more collaborative orientation and a richer sense that non-reproducibility isn't necessarily a thumbs down on your work but could actually be a potential to discover some of the kind of hidden variables that the paper Steve showed earlier can reveal to us. Indeed, the media deserves much of the credit for undermining the credibility of biology, since science reporters will run off with the most minor result, reported as a big breakthrough, and then only weeks and months later it is realized that in fact it really wasn't so consequential. And that breeds an enormous amount of skepticism about what we as scientists do in the mind of the public. And obviously there are such dramatic swings in media coverage of different aspects of science that, in fact, the laity generally believe that scientists don't know what they're doing and where they're going. But I do want to point out that reproducibility in the sense you're talking about makes a lot of sense for deterministic systems, but not for ones with a lot of noise. And in fact, as even the psychology replication project showed, a lot of so-called non-reproducible findings were completely consistent with each other, statistically consistent. So to your point, there's a lot more consistency; usually what they're showing is that the effect sizes are a little bit more modest than initially indicated, but a lot of so-called non-reproducible results, at least in a statistical sense, are completely consistent with each other. As you said, David, they're not falsifying each other. It's time for me to make one final wrap-up remark. I just want to thank all of you for coming to talk, and I think this is the kind of conversation we need to be having more often, because there are people in all these different disciplines who are all on the same page, trying to achieve the same thing, which is to improve science, defend the credibility of good science, and help each other with our methods. So hopefully these conversations will happen more often. And thanks again to all the panelists. Thank you.