Well, good morning, good afternoon, good evening and welcome to this first session of the second day of Meta Research 2021. My name is Malcolm Macleod from the University of Edinburgh. I'll be moderating this session, Lessons for the Lab: How Meta-Research is Shaping Basic Biomedical Research, and I'm grateful to all of our speakers for agreeing to participate, and particularly to Manoj, who has corralled us and brought us all together; you'll hear from Manoj later. So there's myself, Alexandra Bannach-Brown, Manoj, Takuji Usui from UBC and Tracy Weissgerber, and I'm just going to say a few words by way of introduction before they go into their talks. We'll have questions after each talk and then, I hope, time for questions and discussion at the end; if you could put your questions into the Q&A slot, and we'll try and keep an eye on the chat as well. So here are my conflicts of interest. I'm academic coordinator of the European Quality In Preclinical Data (EQIPD) Innovative Medicines Initiative, which is an industry-academia partnership. I'm a member of the UK drugs regulatory body, the Commission on Human Medicines, which is our equivalent of the FDA, and I'm academic lead for research improvement and research integrity at the University of Edinburgh. So you'll be pleased to hear that I believe that everything in Edinburgh is brilliant compared with everywhere else. I don't actually, but if you detect bias, that may be where it's coming from. So let me take you back to the beginning of my journey with preclinical meta-analysis. It's 2003. At that time there are around 5.6 million strokes happening globally every year, with a mortality of 20%. The only treatment we've got, clot-busting treatment with tissue plasminogen activator, is time-critical, and to get to that treatment you need diagnosis, you need to be taken to hospital, you need brain imaging, all to allow the infusion to start within 180 minutes. So perhaps unsurprisingly there wasn't much thrombolysis going on. And the question we had was: could we identify a safe treatment, one where it wouldn't matter if the diagnosis was wrong, that we could give to patients in the back of an ambulance? And here's an example of something that might be safe. This is low-dose glutamate, a homeopathic preparation, tested here in an animal model of stroke where the blood vessel on one side of the animal's brain is occluded. Some of the animals get treatment, some of the animals don't, and what you hope to see for an effective treatment is that the volume of brain injury, shown here as infarct volume, decreases as you give more of the drug. Of course, those of you who've seen some of my talks before may recall that half-dose low-dose glutamate is 10 to the minus 120 molar and full-dose is 10 to the minus 60 molar. So this wasn't a treatment that we were going to take into the ambulance, given that Avogadro's constant is around 10 to the 23. There isn't any glutamate in either of these. So there might be something wrong with this kind of research, and that might also be something that systematic review and meta-analysis could tell you about. So we looked at all interventions tested in experimental stroke: over a thousand compounds tested in the lab, over 600 tested in those animal models that we were interested in, remarkably almost 400 improving outcome in those animal models, of which 97 had gone on to be tested in human clinical trial, of which one, that clot-busting treatment with tPA, was the only one that worked.
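To make the dilution arithmetic just mentioned concrete, here is a back-of-the-envelope check (the one-litre volume is an assumption chosen purely for illustration): the expected number of molecules in a volume $V$ of a solution at concentration $C$ is $N = C\,V\,N_A$, so at the "full dose" of $10^{-60}$ molar,

$$N \approx 10^{-60}\,\mathrm{mol\,L^{-1}} \times 1\,\mathrm{L} \times 6.0\times10^{23}\,\mathrm{mol^{-1}} \approx 6\times10^{-37}\ \text{molecules},$$

which is effectively zero, and at $10^{-120}$ molar fewer still.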
And it had been tested in human stroke not because it worked in animal models of stroke (although it does; Emily Sena has shown in meta-analysis that it does), but because it works in a very similar human condition where the blood supply to a critical organ, in that case the heart, was occluded. And we had treatments, stroke unit care, decompressive hemicraniectomy and aspirin, which hadn't been tested through animal studies at all. So we were starting, through meta-research I suppose, although that's not what we called it at the time, to understand that there might be something wrong with drug development. And here's some work we did a few years later, looking at the studies supporting the effectiveness of a drug called NXY-059 that had been taken to clinical trial in stroke by AstraZeneca. That clinical trial, unfortunately, was neutral; the drug didn't do anything. And with the benefit of hindsight, we went back to look at the animal data supporting the trial. Overall, the treatment looked pretty effective; these are the 95% confidence limits of the effect in these gray bars. But when we look at those studies that reported measures to reduce the risk of bias, like randomization or blinding the assessment of outcome, they gave substantially and significantly lower estimates of drug efficacy in these animal models. And importantly, I think, not one of these studies did all three of these things. And the studies used by the company to persuade doctors to put patients into the trial, those with the most positive findings, had actually done none of these things. So at this stage in my career I was relatively junior, and I was told that everyone knew that stroke researchers were stupid and didn't know how to do things properly, and that we should look more generally at high quality research. We're having some sessions, I think, on measuring research outputs, but here's the British example, the Research Assessment Exercise from 2008, which was astonished, really, by how good everything is in the United Kingdom, which is often the UK way. So we identified the five institutions which were at the top of their game for in vivo research in that period, and then drew out about a thousand publications from the subsequent two years involving animal research. And then we asked the question: how good are these studies at reporting randomization, blinding, exclusions and sample size calculations? They're color coded to protect my future career prospects, such as they are; lovely blue Scottish color here. But randomization in only about 14%, blinding in less than 20%, reporting animals excluded from analysis in only 10%, and only one in 50 studies reporting a power calculation. And 68% of those 1,173 publications from leading institutions reported not one of those things, and only one paper out of over a thousand did all of those things. So it's not just stroke research that's wonky; everything is a bit wonky. So the purposes of evidence synthesis, as we saw them in 2004, were firstly: could we use it to choose drugs for clinical trials? So here's a systematic review of hypothermia in animal models of stroke with Bart van der Worp, and that led to the design of our clinical trial of hypothermia in acute ischemic stroke, the EuroHYP-1 trial, recently completed, which unfortunately didn't recruit enough patients to say things one way or the other. But there's this parallel track of using systematic review to inform research improvement.
So here's our original study on nicotinamide showing study design bias and publication bias, leading to the good laboratory practice guidelines in Stroke in 2009, which have been reasonably influential in moving the field forward. And fast forward to now: what sort of things can we do with evidence synthesis? You're going to hear a lot more about these from the other speakers, but here's how we choose drugs for clinical trial now. This is MND-SMART, a multi-arm adaptive randomisation trial, where the choice of the drugs is informed by a living systematic review, which is organized and conducted by Dr. Charis Wong, a physician doing a PhD with us. And as one drug meets the futility criteria of the trial, it's replaced immediately by the most promising next drug. And there's work that we're doing with Xiangying Wang, just now a PhD student, to inform research improvement, looking at things like the Landis reporting guidelines, the MDAR checklist and the new ARRIVE guidelines, automating the evaluation of risks of bias in some 500,000 in vivo publications using natural language processing. There are other groups doing similar things, and I think it's a very powerful way in which we can try and inform research improvement activities. And it's not just in terms of clinical medicine. There are other groups that are using preclinical evidence to inform regulatory decision making where clinical trials would not be feasible or ethical: environmental toxicology, with the Navigation Guide group at UCSF and OHAT; food safety, where it's used as a tool by EFSA; plant protection, with SPRINT, a Horizon 2020 program looking at pesticides and their effects on the environment and human health. And over the years systematic reviews have evolved, I think. I don't think anyone else in the panel is old enough to remember what it was like in the bad old days, when one person sat at one computer with a stack of PDFs, going through them. And then we thought, well, maybe it would be good to get two people to look at things around the same computer; this is when Emily joined us. And then we started to use distributed software to be able to do it at a distance. And now there are systems like our systematic review platform and others where people can collaborate over the cloud in the conduct of systematic reviews, which has really enabled much larger systematic reviews to be done. We're moving towards the idea that the information we create and collect in the context of one systematic review might then go into a central data store and be used for others. And perhaps one day that process of automation might happen at the point that the research artifact is produced, that's to say a publication or a bioRxiv deposition, rather than only when it pops up in a systematic review. And that's all I wanted to say by way of introduction. I'd be happy to take any questions now if anyone has any in the chat, but I just then need to see the chat. And Cylene says good morning. Good morning, Cylene. Let's see if there's anything in the Q&A, and there are no open questions in the Q&A. So what I would propose to do now is to go to our first proper speaker, as it were, who is Alexandra Bannach-Brown from the Berlin Institute of Health QUEST Center in Germany. Alex, I don't know if you want to try and get your slides up just now; you should be able to share them. Alex actually came to Edinburgh to do a PhD and some of my colleagues who knew her previously were very jealous.
They said, you've got an excellent PhD candidate there. And so it turned out, and she's now gone on to better things, firstly with Paul Glasziou and his group at Bond University in Australia and now at the QUEST Center for Responsible Research at the Charité. So Alex, over to you. Thank you very much, Malcolm, for the introduction. Are you able to see the full slides? Yes, perfect. Great. So, yes, systematic review is a useful tool in preclinical research to get an overview of the literature, and in this talk I'm just going to describe a case study, an example of how systematic review was used to identify gaps in the literature and how it was used to inform the design of a primary animal experiment. I don't think it's any surprise that there is a growing amount of literature out there, but with that also comes increasing interest in reviewing larger and more extensive research areas. So the number of records identified by systematic searches is also increasing, as exponentially as the amount of literature being published. This highlights the need for automation tools to support meta-research in this area. I conducted a systematic review of animal models of depression, with over 70,000 potentially relevant records being retrieved. It was not feasible to screen them manually in a timely manner, so we trained a machine learning algorithm to include articles in the review if they were relevant, which reached a high level of performance and included over 18,000 studies. We then used text mining approaches to group and visualize the articles to get an overview of this really broad body of literature, which was then developed into a web app. Using this machine-assisted approach, we conducted a systematic review to understand the efficacy of interventions targeting the gut microbiota in animal models of depression. We aimed to get an overview of the literature, to identify any gaps and to understand the quality of this evidence. So the studies were identified using the text mining approach and added to the systematic review platform SyRF for data extraction. Using the standard systematic review process, 15 publications were included. And interestingly, we identified studies that were using microbial interventions to rescue a depressive-like phenotype, as well as studies that were using a microbial intervention, such as antibiotics, to induce a depressive-like phenotype. In this literature, the reporting of conflicts of interest and of compliance with animal welfare regulations was high, thankfully. About 40 to 50 percent of the studies reported randomization or blinding, but reporting of sample size calculations was very poor. The results of the meta-analysis were kind of as expected. For the studies that used a microbial intervention to try to induce a depressive-like phenotype, the pooled effect size did show a worsening in the behavior, and the studies that used a microbial intervention to rescue a depressive-like phenotype unsurprisingly showed an improvement in behavior. What was interesting about the systematic review, though, was the heterogeneity. No two studies investigated the same intervention; no two studies used the same strain of probiotic. And there was also a very wide range of outcomes. We had hoped to pool data from other biological outcomes, such as immune and inflammatory responses, but there was simply too much heterogeneity in the types of biomarkers that people were interested in. Further, it highlighted issues of quality.
Only one study reported a sample size calculation. And approximately 40 to 50 percent of the studies reported randomization or blinding, but no study reported all three of these measures. Further, the median group size was 10 animals per group. In clinical studies, the effect of probiotics on depression or depressive symptoms is very small, and this was worrying because animal studies with 10 animals per group are likely to be underpowered to detect an effect size this small. And further, the systematic review showed a gap in the literature. There was an interest in the clinical space in prebiotics, sorry, not probiotics, to assist with mild depressive symptoms, but there was a lack of preclinical evidence on this intervention. So to address some of these concerns, we designed a pre-registered primary animal experiment to test the effects of prebiotics on depressive-like phenotypes. The experiment was designed and conducted so that animals were randomly allocated to groups, group allocation was concealed throughout, and all outcomes were assessed in a blinded manner. Further, we did an a priori sample size calculation where the unit of analysis was the smallest unit the intervention can independently be administered to, which in this case was the cage, because the animals are co-housed. When they're co-housed, they share a bacterial profile, and the intervention is thought to act through the gut microbiota. The results of the experiment showed no significant differences between vehicle and prebiotic groups; there was no rescuing of depressive-like behavior. However, when we redid the analysis, redid the statistics, with the unit of analysis as the individual animal rather than the cage, this showed a significant effect of prebiotics on the primary outcome of interest. And often results are presented in the literature where the statistics are conducted on the animal, and the reporting of the housing conditions is so variable that it's often not possible to understand whether the animals were, in fact, co-housed or single-housed. So just to highlight this point with a hypothetical example: if you take two groups of animals with mean values of 10 and 15 in an outcome, and the group size is reported as n equals 10, you get a nice significant result. But if the animals are actually housed two to a cage, then the real n is five. And when n is five, the confidence intervals increase to the point where your outcome is no longer significant; there is no significant difference between your two groups. So without accounting for the correct unit of analysis, this can lead to an overestimation of efficacy. Reporting of the unit of analysis is therefore not only important for primary studies, but also has implications for meta-research on animal studies and preclinical biomedicine. So just to sum up, systematic review can be used to identify gaps in the literature in preclinical studies, and it can also very nicely be used to highlight the quality of reporting in a body of evidence. There's a big opportunity to improve the reporting of primary animal experiments, so this is a really useful tool for that. And further, evidence synthesis is a hypothesis-generating tool, which can be used to inform the design of primary animal experiments. I'd just like to acknowledge the funding for this work and the support of colleagues in the studies described here. I do believe we have time for some questions. Yes, thanks. Thanks very much. I have a couple of questions coming in in the chat; they can go into either the chat or the Q&A.
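As a minimal sketch of the unit-of-analysis point in the hypothetical example above (the group means, cage structure and variances below are invented purely for illustration, and this is not the analysis from the study described), one way to see how a naive animal-level analysis understates uncertainty:

```python
# Minimal sketch (hypothetical data): why the unit of analysis matters when
# animals are co-housed and the intervention is applied per cage.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

n_cages = 5          # cages per group
animals_per_cage = 2 # co-housed animals share a cage (and a bacterial profile)

def simulate_group(group_mean):
    # Cage-level "true" means vary between cages; cage-mates are correlated
    # because they share the same cage effect.
    cage_effects = rng.normal(group_mean, 2.0, size=n_cages)
    animals = cage_effects[:, None] + rng.normal(0, 1.0, size=(n_cages, animals_per_cage))
    return animals

vehicle = simulate_group(10.0)
prebiotic = simulate_group(15.0)

# (a) Naive analysis: every animal treated as independent (n = 10 per group).
t_animal, p_animal = stats.ttest_ind(vehicle.ravel(), prebiotic.ravel())

# (b) Unit of analysis = cage (n = 5 per group), appropriate here because the
# intervention can only be administered independently per cage.
t_cage, p_cage = stats.ttest_ind(vehicle.mean(axis=1), prebiotic.mean(axis=1))

print(f"animal-level (pseudo-replication): t = {t_animal:.2f}, p = {p_animal:.4f}")
print(f"cage-level (unit = cage):          t = {t_cage:.2f}, p = {p_cage:.4f}")
```

The cage-level test has fewer degrees of freedom and typically wider uncertainty, which is the appropriate price to pay when the intervention can only be administered per cage.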
And so, first of all, from Shambhavi Chidambaram: when you showed the difference between the group housing and the pseudo-replication, were those 95% credibility intervals or 95% confidence intervals? Confidence intervals. Okay. A question from Silene: do you think it's also possible to apply systematic review and meta-analysis to synthesize data from studies performed before animal studies? So could you use systematic review for bioinformatics or for in vitro studies or whatever? Thank you for that question; that's really important. Systematic review of the in vitro literature is a growing field, and I believe there are several efforts to translate the methodology of systematic review to be able to accurately synthesize in vitro studies, studies from cells, as well as bioinformatics. So yeah, that is a growing field, and hopefully we'll see some more concrete methodological advice in this area in the future. Good. And a final question about the prebiotic effect, wondering whether the reason that you saw it with pseudo-replication but not without was that the effect was so small that the experiment was underpowered at the cage level. Is that possible? So, given that there was no preclinical evidence in this field before, the a priori sample size calculation was based on effect sizes seen with probiotics; looking at the clinical literature, we could see that there was a similar, slightly lower, effect. It's definitely possible that it could have been underpowered, although this primary experiment was considerably higher powered than the existing literature that was out there. And I think we've got time for some more, from Shona Gordon-McKeon. She asks, sorry if I missed this, are the tools of your analysis documented and available anywhere, like in a Jupyter notebook or on OSF or GitHub, et cetera? Most of the information is available on OSF and/or GitHub, along with accompanying publications; I'd be happy to send any links. And in terms of tools to support preclinical systematic reviews, there is of course the SyRF platform, which we run with funding from the NC3Rs in the UK, which allows a lot of the process of systematic review to be done, and we're gradually embedding some of these automation tools into it as they become available. So that's in the chat now. So, there was another thing that occurred to me, and I've forgotten what it was, which I think means that you're off the hook for the time being, Alex. So thank you very much. And now on to Manoj, who is an anesthetist really, I think, well, a critical care doctor, but in a department of anesthesiology. He's another clinician who's come into the world of systematic review of preclinical data in an attempt to better inform the clinical research which he has been doing. And he's going to talk with us today about one particular approach that may help refine the validity of those inputs. So Manoj, over to you. Thanks Malcolm. Can you see my slides right now? Is everything okay? Yes, they're perfect. Fantastic. Okay, thanks for the kind introduction there. And as Malcolm mentioned, I'll be talking about one approach, and that's multi-center preclinical lab studies. And I've put a bit of a provocative title here, suggesting that they may be better in some ways than single-lab approaches.
So in terms of what these are, I'm sure many of you have thought about or heard about multi-center studies: they're studies in which multiple independent laboratories perform the same experiment using a shared protocol. And the reason I was particularly interested in this approach, again, is that I actually have basic science training; my PhD is from a wet lab, in wet lab research. But as Malcolm mentioned, I'm a clinician now, and I have both a wet lab and a primary appointment in clinical epidemiology. So I'm always really interested to see if we can apply some of the methods that have been highly successful in clinical research back into the wet lab setting. And as many of you might be aware, multi-center studies are a gold standard in clinical trials. So here's just a little schematic: you have one lab here with one protocol, and this lab has now shared this protocol amongst three other labs. So now you have four labs all using a common protocol, all performing the same or very similar experiments. So why would you want to do this? There are a few reasons. First of all, this study design, excuse me, inherently tests reproducibility and generalizability. These studies can be potentially more robust than single-center studies, and there are a number of reasons for that. Perhaps one of those is that you have a number of different experts: as opposed to one principal investigator leading a lab, you now have multiple principal investigators, multiple senior investigators, sitting around a table and hopefully a priori identifying pitfalls that might occur. Finally, there is this thought that this sort of design might increase the efficiency of preclinical translation. By that, what I mean is basically: if you had an intervention, for instance, which was highly successful in multiple single-center studies, and then you brought it towards a more rigorously designed multi-laboratory study and it failed, that might give you pause and make you start to think about, well, do we need to refine this therapy before we continue its development pathway? Or on the other hand, if you had an intervention that was successful in single-center laboratory studies and you brought it towards a multi-center laboratory approach and it was also successful there, well, that might give you fodder to think, okay, we can probably continue on this translational pathway, going into larger animal models, for instance, or maybe even a first-in-human study. So why, on the flip side, would you not want to do this sort of study? I've talked to many people about this over the last few years and I've heard a number of different reasons; these are just some highlights. First of all, different labs have distinct pressures, different resources available, and many labs, even ones that study the same sort of disease state, have far different behavioral norms as well. And that can make it very difficult, potentially, to collaborate with each other. Secondly, I think a major issue is the publish-or-perish culture. So when you're thinking about a multi-center study: in a clinical trial, for instance, you often have tens of authors; however, if it's a well-conducted trial published in a reputable journal, even being a middle author there can give you credit, or credit is given towards you for that. However, in basic science or basic biomedical research, first, second, or last authors usually are given the most credit. I have to say the same thing is true in clinical research as well.
However, being a middle author in a long list of authors on a basic biomedical research study is still not appreciated as much as being a middle author, for instance, on a large multi-center clinical trial. So this can obviously be a disincentive to participate in this sort of study. There's also first-mover advantage: trying to be first past the post on things makes you potentially resistant to collaboration as well. And then finally, one issue that my group and others around the world have encountered is that there are very few organizations actually funding this type of work. So that can make it very difficult to conduct these studies, which can be very expensive as well. So my group was really interested in trying to see what the landscape of these multi-center preclinical studies, these multi-laboratory studies, looked like. So we conducted a systematic review, and what I'll be showing you in the next few slides is actually unpublished work. So if you happen to be an editor interested in this sort of area, feel free to reach out to me afterwards, as we are looking for a venue to publish this. For our systematic review, we registered our protocol a priori in PROSPERO. We conducted a systematic search of two databases, Embase and MEDLINE. And what we were looking for were in vivo studies, so live animal studies, where they had tested an intervention, typically a therapy, and of course where they had this multi-center or multi-laboratory design. And we used best practices in terms of duplicate screening and duplicate extraction. And in the end, what we found was that there were only, let me try to make my slide work here, 16 of these studies that have ever been published to date. However, what's interesting is that 12 of those have been published since 2015, demonstrating that there has probably been a bit more interest in this recently. And in terms of where these studies were conducted, the majority of them were in the United States, with a smattering of other countries involved. Diverse clinical spheres, though, were being investigated: for instance, stroke, brain injury, traumatic brain injury, myocardial infarction. And next, what we were thinking about were the outcomes of these studies, and we wanted to compare quantitatively the outcomes of these multi-center studies to previous single-center studies. So again, looking to the clinical trial research literature for inspiration, we found these landmark studies where, in clinical trials, they had compared effect sizes from single-center clinical trials to multi-center clinical trials. And what they found, I'll just point down to the bottom here, is that single-center trials show larger treatment effects than multi-center trials. That might seem obvious, but these were some of the first studies to actually quantify this sort of effect. In other words, these single-center clinical trials inflated effect sizes, and the multi-center trials showed smaller effect sizes, and the effect sizes for those multi-center trials were closer to, or paralleled more closely, what the real-world effects of some of those treatments were. So we were very interested to see: is there a similar effect in preclinical studies? To do this, we paired our 16 multi-center studies with 100 single-center studies.
These studies were paired on the interventions that were tested, as well as on the disease states that were being investigated. You can see here a comparison of sample sizes, with a much larger sample size in our multi-center versus single-center studies; however, the total number of animals being used was approximately the same. The publication range, or date range, excuse me, was around the same as well. And you can see that, not surprisingly, most of these studies were using rodent models. What was quite different, though, were the methods to potentially reduce risk of bias, methods like randomization and blinding. You can see here again a comparison of our multi-center and single-center studies, with our multi-center studies having a much larger percentage addressing methods to reduce risk of bias. So overall, multi-center studies, sorry, my watch is talking to me, addressed measures to reduce bias more often than single-center studies. We then looked at this quantitatively. This is a forest plot comparing the difference in standardized mean difference between the pairs that I've shown you in the previous slide; each row represents one multi-center study paired with the single-center studies that we could find. You can see overall, and I'll point your eye down to the bottom here, to our pooled analysis, that there was in fact a larger effect size seen in our single-center studies compared to our multi-center studies. In other words, this parallels exactly what had previously been shown in the clinical trial research literature: there was a larger effect size in the single-center studies than in the multi-center studies, and there were notable differences in sample sizes and methodological rigor. I just have two more slides, Malcolm, so I'll just take another minute here. We then moved on to a separate study. This was a qualitative study where we conducted semi-structured interviews with the investigators who had actually led, or had been on the ground conducting, these multi-center preclinical studies. I'll just give you one slide here of highlights of what they told us. In terms of barriers, they said, as I mentioned previously, that funding was a big barrier; these studies were very expensive. The culture and climate of the science community made these studies difficult as well, as this approach wasn't valued. More practically, protocol harmonization between the centers was difficult in some cases, and differences in the lab resources available and differences in the local animal ethics committees also made it very difficult to conduct these studies in some ways. However, on the flip side, there were lots of facilitators: first of all, open-mindedness and trust between investigators, having regular meetings and very regular engagement between the centers, and then finally, and again this is actually something we do in clinical trials, site visits and on-the-ground training were very valuable for those studies that were successful. Last slide here. I just wanted to show that this wasn't meta-research for the sake of meta-research. We've actually used some of the learning from the systematic review that I've shown you, as well as the interview study, to set up our own multi-center preclinical collaborative. This is funded by Sepsis Canada as well as another federal funder, so we're lucky to get funding from a couple of sources.
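As a rough sketch of the kind of comparison described above, with entirely invented numbers rather than the data from the review itself, one might compute a standardized mean difference for a multi-center study and for its paired single-center studies, and then take the difference:

```python
# Minimal sketch (hypothetical numbers): comparing effect sizes from a
# multi-center study with those from paired single-center studies.
import numpy as np

def hedges_g(m_t, m_c, sd_t, sd_c, n_t, n_c):
    """Standardized mean difference with a small-sample (Hedges) correction."""
    sd_pooled = np.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2))
    d = (m_t - m_c) / sd_pooled
    j = 1 - 3 / (4 * (n_t + n_c) - 9)           # small-sample correction factor
    var = (n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c))
    return j * d, j**2 * var

# One multi-center study (larger n; in this toy example, a smaller effect).
g_multi, v_multi = hedges_g(12.0, 10.0, 4.0, 4.0, n_t=60, n_c=60)

# Several paired single-center studies of the same intervention / disease model.
singles = [hedges_g(15.0, 10.0, 4.0, 4.0, 10, 10),
           hedges_g(14.0, 10.0, 3.5, 3.5, 8, 8),
           hedges_g(16.0, 10.0, 5.0, 5.0, 12, 12)]

# Inverse-variance pooling of the single-center estimates (fixed-effect,
# for brevity; a random-effects model would be the more usual choice).
w = np.array([1 / v for _, v in singles])
g_single_pooled = np.sum(w * np.array([g for g, _ in singles])) / w.sum()

print(f"multi-center g  = {g_multi:.2f}")
print(f"single-center g = {g_single_pooled:.2f} (pooled)")
print(f"difference      = {g_single_pooled - g_multi:.2f}")
```

In the review itself, these pairwise differences would then be pooled across all 16 multi-center studies with a meta-analytic model; the simple inverse-variance pool here is only for brevity.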
And we're actually moving towards multi-center studies of preclinical sepsis models. So that's it. These are my collaborators for the different aspects of this project, and I should mention Victoria Hunniford, a student and research assistant who helped lead some of the systematic review work that I've shown. Thank you. Thanks, Manoj. There are a couple of questions coming in, and I'm going to bring the first two together into one because I think it'll be easier. So Michael Andrade asks if there's a known source of funding committed to multi-lab research initiatives; has any country or institution been leading that path? And Jeffrey Mogil, a pain researcher, says it seems to him that incentives, you talked about incentives a lot earlier, will never be permissive for this, and he's not sure that the incentive system will change. Do you think that the solution might be to have funding agencies solicit groups to perform multi-center replication studies around identified important findings, using a contract system rather than a granting mechanism? Yeah. So on the first one, in terms of the funders I know about, and Malcolm will know of others, BMBF, for instance, in Germany funded a number of multi-center preclinical studies a couple of years ago now. I think the timing wasn't the best with the pandemic, but I think we'll be hearing about that in the next few years, as they've funded a number of those studies. And then Jeffrey's question, I think, is a good one. Jeffrey's Canadian as well. So I went to CIHR several times with, you know, very good questions, and I was unable to get funding over several cycles. And then we found a different mechanism, Sepsis Canada, which is also CIHR funded, our federal funder. And then we presented it as a new idea to the New Frontiers in Research Fund as well, which Jeffrey will be familiar with, which funds high-risk, high-reward research. So we have been successful; but that being said, I think your idea is great, I think that's exactly what should be happening. Once we have single-laboratory studies, as you've described in your previous editorials in Nature, I think with Malcolm, talking about confirmatory versus exploratory research, once we have a number of different exploratory studies demonstrating some potential, I think this is a mechanism that should be looked at to think about, again, the development pathway before we move into first-in-human or even larger animal studies. And I think also NIH, NINDS, had a similar program for stroke research, where they envisaged the creation of a network of preclinical researchers who would come together, but quite how that would work I'm not aware. And the spinal cord community did something similar many, many years ago, Os Steward and others, although again not quite the multi-center study, each addressing the identical question, as you've suggested. So a comment from Wolf in the chat: the BMBF project is still ongoing in Germany; most projects have been funded for three years. There's a question from Olavo Amaral, which I'm going to segue into a question from Sarine. So Olavo says: have you tried controlling for the methodological confounders in this study, the differences in those between the single-center studies and the multi-center studies, to see whether that's what drives the difference in effect sizes, for example single-center studies not being randomized?
And Sarine wonders whether the smaller sample size in the single-center studies means that you're getting inflation of effect sizes, because you're getting reporting of only some of the larger effects, a sort of publication bias. And then Daniele Fanelli has a related comment, which I'll come to after you've had a shot at those. Yeah, so I'll take Sarine's question first. Undoubtedly that plays a role; it's exactly what we've seen in the clinical trial literature, where that's been looked at very carefully. So I have no doubt that the inflation of effects is related to that, and that goes to Olavo's comments as well. In terms of that, you know, we should probably do meta-regression around the different aspects; that's a good point, and maybe we could start to pick out what factors are actually driving some of those differences. That being said, even if there were no differences, and I was prepared for that, obviously we didn't know what we were going to find in the end, I think there's still inherent value in testing the same protocol in multiple centers, and that inherent value comes, again, from the generalizability and external validity that you're testing or assessing with this sort of approach. There might be other ways to do it within a lab, but putting the experiment in different environments, different microbiomes, different equipment, et cetera, really speaks to the generalizability of your findings. And then finally, Daniele Fanelli's is a good one. It's an observation-cum-question, wondering how surprised you were, and hopefully you weren't surprised, because one should never ask a research question without at least some sense that you're going to get something interesting at the end of it, at this difference between single-center and multi-center studies. But if you consider that the single-center studies might be regarded epistemologically as exploratory and hypothesis-generating, even though that might not be how their findings were presented in the ensuing publications, and the multi-center studies as confirmatory, then the former are biased in favor of signal detection and the latter in favor of generalizability. Do you think that's a thing, and does that make more of a case for multi-center studies? Yeah, so I wasn't overly surprised, looking back at your work, Malcolm, and others', which has clearly demonstrated that risk of bias, or a lack of measures to address those issues, is associated with larger effect sizes. We weren't overly surprised with the results in the end, because, as I've demonstrated, those multi-center studies do seem to adhere to those methods to reduce bias, right? So I think that's one piece there. And in terms of comparing those two study types, I think there's still a role for this multi-center approach. And one thing I didn't actually say very explicitly there was that most of these multi-center studies did not demonstrate any effect. So I showed the difference in effect sizes there, but most of them actually showed no effect of the therapy they were looking at, right? So that's one piece I should make a bit more explicit.
So again, I believe that shows some value in at least thinking about, maybe not ditching that intervention, but starting to think about, okay, how can I refine this so that in a rigorously conducted study we can still see an effect there as well. So I think that speaks to where these should be done, and it goes back to Jeffrey's comment at the very beginning as well: I 100% agree, we still need exploratory studies. You need room as a basic scientist to tinker. I've been there myself; I know that things aren't a very straight line from A to B, right? You often take a very circuitous pathway to get to some exciting findings. But then, to start to move that into humans or into larger animals, I think that's where we need to start thinking about more rigorous approaches before we do that. And can I take chairman's prerogative and ask a final question, which is: when you're powering those multi-centre studies, because Alex made a point earlier on in her talk that she powered hers on the basis of what probiotics did, there's this question about whether you should be powering on what you might reasonably expect to find or on what you would consider to be a minimum effect size of interest. And if you're powering a multi-centre animal study on a minimum effect size of interest, what is the effect size that makes you interested enough to think about doing a clinical trial? I have no answer for that. Do you have an answer for that? I think that's a good question, and I think there's a nice study in the making there. I think we could look at quantitatively comparing some of these studies that led to successful clinical trials and seeing what those differences actually were, right? So there might be something there in the future to look at. It's a good idea, but unless you have an answer, I don't actually have a firm answer for that. I'm still at the stage of life where I've got questions, but not so many answers. So thanks, Manoj. Next, we've got Koji. Now, when I first met Koji, he was a PhD student with Shinichi Nakagawa at the University of New South Wales, and he comes from a very different background, in ecology and evolution. And what he's done with Shinichi and others is to apply their approach to some of the literature from systematic reviews of preclinical studies. So Koji, the floor is yours. Thanks, Malcolm. Yeah, I'm still a PhD student, and I'm now at the Biodiversity Research Centre at the University of British Columbia. And yeah, as Malcolm said, I'm going to be talking to you today about a project I did in Australia on how we can embrace heterogeneity to improve replicability. So just before I dive in, a quick acknowledgement of my co-authors for this project. This project was made possible through the collaboration of an interdisciplinary team of researchers, as Malcolm mentioned: those that specialize in evolutionary biology and meta-analysis, such as myself, as well as experts in preclinical science and biomedicine over at the CAMARADES team. So we've been talking a little bit about replicability and generalizability already, and there are a number of things that we need to consider when designing such experiments in biomedicine.
So traditionally, we've tended to focus our efforts on our ability to detect treatment effects, in other words ensuring high internal validity, such that we have high confidence that any change we detect is due to our treatment of interest. And along with that, obviously, we've tended to want to minimize any confounding variables by reducing within-study variability, essentially treating it as statistical noise. And of course, when designing any experiment, we also have to consider ethical and financial concerns, for example by reducing, or trying to reduce, the number of animals used. And so to meet these goals, we've traditionally followed the dogma of standardization, which of course is to minimize variability within studies as a means of reducing the number of animals needed to obtain information of a given amount and precision. But my question is: does standardization actually lead to replicable and generalizable experiments? Well, some theory has shown that perhaps not. And the reason for this is that whilst standardization increases the internal validity and the power to detect treatment effects within studies, it also reduces the external validity, or the consistency of results across studies. And to explain this very briefly, we can make use of the basic biological concepts of reaction norms and plasticity. Studies and labs, even if they're highly standardized and controlled, often vary in environmental context across a whole suite of different unmeasured variables, whether that be husbandry procedures or feeding times or lighting or what have you, meaning that different studies can fall on different parts of this reaction norm and we end up with different phenotypic outcomes. And then what standardization does on top of that is, by reducing the within-study variation, it narrows the confidence interval under each study, such that outcomes become more distinct and idiosyncratic between studies. In other words, results become less replicable across studies. This has been dubbed the standardization fallacy by some. So an alternative approach, then, instead of minimizing variability, is to embrace within-study variability; this is known as heterogenization. And we've already talked a little bit, in a way, with Manoj about how we can account for some of these differences by conducting multi-lab trials. Another way, which my talk focuses on, is to purposefully and in a systematic manner introduce variability within a single study. So how do we actually do that? How do we identify ways to embrace variability as an outcome of interest in its own right to improve replicability? In our paper, we identify and argue that there are at least two ways in which we can embrace variability in preclinical studies, and we illustrate these two approaches through a meta-analysis of variability in animal models of stroke; this data set was obtained from the CAMARADES database. So first we'll demonstrate how we can, through meta-analysis, quantify and embrace variability generated by different experimental procedures. And second, we'll demonstrate how we can, through meta-analysis, quantify and embrace variability generated by different drug treatment interventions.
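As a minimal sketch of the kind of variability measures used in this sort of meta-analysis of variability (the summary numbers below are invented, and the small-sample correction follows one common formulation rather than necessarily the exact effect size used in the paper discussed):

```python
# Minimal sketch (hypothetical summary data): two variability measures often
# used in meta-analyses of variability. The numbers are invented.
import numpy as np

def coefficient_of_variation(sd, mean):
    """Within-group inter-individual variability, scaled to the group mean."""
    return sd / mean

def ln_cvr(sd_t, m_t, n_t, sd_c, m_c, n_c):
    """Log coefficient-of-variation ratio (treatment vs control).

    Positive values: the treatment group is relatively more variable than
    control; negative values: relatively less variable. A small-sample
    correction term is included, following common practice.
    """
    cvr = (sd_t / m_t) / (sd_c / m_c)
    correction = 1 / (2 * (n_t - 1)) - 1 / (2 * (n_c - 1))
    return np.log(cvr) + correction

# Hypothetical infarct-volume summaries (mean, SD, n) for one comparison.
treated = dict(mean=40.0, sd=18.0, n=12)
control = dict(mean=60.0, sd=15.0, n=12)

print("CV (treated):", round(coefficient_of_variation(treated["sd"], treated["mean"]), 2))
print("CV (control):", round(coefficient_of_variation(control["sd"], control["mean"]), 2))
print("lnCVR:", round(ln_cvr(treated["sd"], treated["mean"], treated["n"],
                             control["sd"], control["mean"], control["n"]), 2))
```

Quantities like these, summarised per comparison and then meta-analysed, are essentially what the talk goes on to plot against mean efficacy.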
So, embracing variability in experimental procedures: what we wanted to do was to assess how much variability could be generated across a whole suite of different experimental procedures, anything from, say, the genetic aspects of the study or the sex of the animals used, to, for example, different occlusion methods for inducing stroke. And we did this by meta-analyzing the inter-individual variability as measured by the coefficient of variation, in particular the variability in infarct volume, the amount of brain damage. Through this, we can begin to identify procedures that generate different amounts of variability in our baseline disease states, from those that generate high variability to procedures that are more consistent in their outcomes. And so, for example, here, amongst the different occlusion methods to induce stroke in animal models, those that use spontaneous occlusion procedures generated the greatest amount of variability, for example in infarct volume here. So whereas under the standardization framework we would maybe have recommended the use of, say, a filament approach to stroke induction, which generates relatively more consistent baseline disease states, we argue instead that we should be quantifying variability to identify methods that generate the most variability in disease states, and that by using such methods where possible, obviously it depends sometimes on the questions that you're asking in the experiment, we can create a more diverse and representative distribution of baseline disease states against which treatment efficacy can then be assessed. So then, moving on to quantifying variability in our drug treatment interventions. Again, we can take a meta-analytic approach, this time quantifying both the mean effect of treatments, so the efficacy of treatment outcomes, as well as the variability or stability of drug interventions. And by doing this, we can start to identify different groups of treatments based on where they fall in this 2D quadrant plot. So everything to the left of this vertical line is a treatment that has, on average, a beneficial effect. But then over on the top left here, we have treatments that also increase the variability in outcomes, such that the results are inconsistent across individuals, whilst on the bottom left here, we have drugs that are on average beneficial and also consistently beneficial across individuals. In doing this exercise, we can begin to identify drug treatments with a wide range of efficacy and stability. And importantly here, we highlight in green the treatments that have, on average, significant efficacy whilst not significantly increasing the variability. And we argue that for these treatments, due to the lower variability observed among individuals, the average effectiveness may be more generalizable to the population level. We also, here in blue and pink, identify some treatments that are significantly effective on average but also significantly increase the among-individual variability. And for these treatments, we argue that, due to the higher among-individual variability, translation or application to a clinical setting may require slightly more nuance and may be more individual-specific. And for those who care specifically about stroke interventions, I've highlighted here the thrombolytics group and hypothermia.
Hypothermia, interestingly, had the greatest efficacy in the meta-analysis, but also the greatest inter-individual variability. So taken together, we argue that the current failures in replicability may be at least in part due to the way these experiments are designed and assessed, which is to minimize or ignore variability. So instead, we first recommend that we should be creating a more heterogeneous, more broadly representative backdrop of disease states in order to avoid context-dependent outcomes. And then secondly, we also advocate embracing and assessing variability in our treatment interventions in order to identify potentially generalizable interventions. And then just a quick final slide to say that we tried to quantify the relationship between the amount of variability induced by methodology and the consistency of the outcomes we observed in our treatment effects. We were a little bit limited in our project by the data structure and the number of data points available, but we would love, in future work, to do a more formal and rigorous meta-analysis, a second-order meta-analysis, linking these two variables together. And I'm happy to chat about that with you more. Yeah, thank you for listening, and you can find our paper here. Thanks. Thanks, Takuji. Sorry about promoting you to post-doc; it's always very embarrassing when you judge people by the quality of their work and they turn out to be much more junior than you assumed. I also very much like the Saltire hanging on the wall behind you, and the Lion Rampant. We have got our first question in, if you're able to take questions. Someone is asking how it is that the use of males and females together leads to lower variability rather than higher variability, and how do you interpret that? Yeah, that's a really good question that we also struggled to interpret. We tried to assess this in multiple ways, but we don't have a good answer for that one, other than to say that, obviously, when conducting meta-analysis there are many confounding variables, and perhaps there are some things that we haven't quite accounted for that could explain that difference. Yeah, I know that's not a satisfying answer, but we were also not satisfied with that answer. A second question from Jeff Mogil. His expectation: don't more effective treatments have to have higher variability, almost as a statistical fact? The bigger they are, the more room there is for a difference between zero and the effect, and therefore they tend to have more variability, because presumably the less effective treatments with higher variability never get published, because they never get off the ground. Yeah, that's a great point, and I would agree. In our analysis, we do try to account for publication bias, for more effective treatments being published, and as well, the effect size that we used, the lnCVR, should account for mean differences and the effect that the mean differences actually have on the amount of variability. I hope that kind of answers your question. Okay, and then Shambhavi again raises the issue of the difference between those things that vary between labs that are within the gift of the investigator to control, and those things which they may be completely unaware of, latent differences like the lux in the animal house or the heating or the noise.
How do you account for those factors which might be driving variability, of which you've got no knowledge that they even exist? As Donald Rumsfeld had it, it's the unknown unknowns. Yeah, again, that's a great point. I think maybe the point is that we can't account for these kinds of things individually, and so the point is to try several things. The first, as Manoj mentioned earlier, is doing multi-lab trials, for example, so we can try to account for some of this variability between the different labs that we can't really measure. And the other thing is to try to actually systematically induce this variability in our experiments. There are further comments which are related, which I'm going to comment on before handing back over to you. Daniele Fanelli: isn't this because mixed sex is lower standardization, thus reduced variability between labs? Romain-Daniel Gosselin: isn't heterogeneity in models a good thing, because our patients are heterogeneous? And my own comment to add is that in EQIPD we've taken three different paradigms, open field testing, safety testing and some EEG recording, and deliberately introduced heterogeneity within labs, and found that the results between labs are then more similar than if you don't have that heterogeneity within labs. So is there a trick here about building heterogeneity deliberately into our experiments, such that if a treatment survives that, it's more likely to be generalizable? Does that make sense from an ecology and evolution point of view, or not? Yeah, I think it does make sense. I don't really have anything to add to that. Yeah, so that reminds me: back when I was trying to think about drugs taken to clinical trial, I'd want to see that a drug works in a whole range of places, in the hope that it might work in a whole range of patients. The problem being that some human clinical trials have inclusion criteria which are so narrow that the results aren't generalizable from the trial population to the human population. So maybe we need to... With which, thank you very much; we're out of time there. Our final speaker, before we have 15 minutes of time for discussion at the end, is Tracy Weissgerber. Now, Tracy is by background a vascular physiologist and preeclampsia researcher who's become a meta-researcher with a particular interest in transparency and reporting, and also meta-research around data visualization. And if you haven't seen her materials to teach and train people in appropriate data visualization, which are on the Open Science Framework as well as described in a couple of papers, they're well worth a visit. But she's going to talk to us today about how complex claims, supported by multiple lines of evidence obtained using very different methodologies, influence the direction of basic biomedical research. So, Tracy. Hey, thank you for that very kind introduction, Malcolm, and also to Manoj for setting up and organizing this session, which I've really been enjoying listening to. This talk is going to be a little bit different from some of the others, because while the others have data and studies and results to show you, I have a problem and a proposed solution that I'm not sure will work yet, because I haven't tried it. And therefore I'm very interested in getting your input on it before I start trying it and find out that it doesn't work. So the question I'd like to raise today is: how can we evaluate complex claims?
So many of you who are attending this session will already know that systematic reviews and meta-analyses are a very valuable tool for evaluating the quality of evidence, as well as the quantity of evidence, for both clinical and preclinical research claims, and certainly we've heard confirmation of that from a number of our speakers today, and some innovative ideas about how those results can be used. However, one of the limitations of the current systematic review and meta-analysis methods that we have is that they're really designed for simple claims that are supported by a single line of evidence. They require us to define our research question and the types of studies that we're interested in according to the population and problem, the intervention or exposure, the comparison, the outcomes, and the study design. And this often means that we need to have a single clear research question in mind, and that research question needs to be supported by a single line of evidence, or a very consistent study type, that we can assess to determine whether the claims are founded. Unfortunately, a lot of the time that's just not actually what happens and not what we're faced with. So what do we do if a claim is supported by multiple lines of evidence? Maybe there are some human studies, some animal work, possibly some in vitro work as well, and maybe multiple types of evidence within each of those different study types. And again, those lines of evidence come from very different methodologies, maybe done in very different systems. How do we go about evaluating these types of claims? So let's take a look at what a complex claim might look like. The reality is that complex claims are very common in many different fields. One example of a type of complex claim that is particularly relevant to this session is complex claims surrounding the pathophysiology of disease and the mechanisms of disease: why do we think a disease occurs? These types of claims are often based on an accumulation of evidence that's built up over decades from both human and animal studies, as well as potentially in vitro work. Why does this matter? Well, I'll start off with an example from my previous field, preeclampsia. One of the claims that you will very commonly find in the introduction of many papers is that preeclampsia is caused by shallow trophoblast invasion of the uterine spiral arteries. And this is a claim that's so widely accepted that it may not be cited, it may not have a citation at all; and if it does have a citation, it may simply be a review article. It is simply a baseline claim that everyone accepts as being widely true. I attended a workshop sponsored by the NIH a couple of years ago for preeclampsia researchers, and one of the things that came up was that there are multiple lines of evidence for this claim, but among people who had gone back and looked particularly at the human studies, the evidence for this claim is actually shockingly weak. There are a very small number of studies based on a small number of patients; they're biopsy studies with uncertain tissue types, and the diagnostic criteria are also uncertain. They are also older studies, hence the diagnostic criteria may not match what we would use today. And so the consensus in the room was very clearly that additional research is needed on this topic, using the modern imaging methodologies that we have available today, and that this would be very insightful.
However, everyone in the room also felt that no granting agency would fund that type of study, because it's a claim that's so widely accepted as fact that if you come in and say, I want to re-evaluate this using modern methods, they would tell you it's a waste of funding. And this isn't an idle problem, because these types of complex claims influence the types of studies that we perform. They influence the direction of human studies investigating pathophysiological mechanisms. They influence which animal models we develop and select as being relevant to disease states. And they also influence which therapeutic agents we select for development and testing. So the proliferation of claims that may not be well supported has an impact on the direction of an entire field and could potentially be sending us along the wrong pathway for years to come. So what happens when we look back at the evidence for complex claims? Well, we have a couple of different problems. The first is that we may have difficulty identifying all of the lines of evidence that are relevant. Again, if things are being cited through reviews and the literature supporting the claim is older, then it can be difficult to find out exactly which pieces of evidence led to this particular claim being accepted. We also have this issue of the evidence building over years and decades, which I've mentioned previously. And this is a problem because evidence that was convincing at the time it was published, 10 or 20 or 30 years ago, might be judged very differently by today's standards. One of the strengths of systematic review and meta-analysis methodologies, and one of the reasons that they're so powerful, is that they allow us to evaluate all of the evidence systematically using a very clearly defined set of standards. We review everything through the lens of what we know is important today. And this is a really critical tool that it would be nice to be able to apply to these complex claims. So how can we go about applying systematic review methodology to evaluate these complex claims? I'm going to propose that we might start with a process that involves five steps. First, we need to select a claim to evaluate, then identify the lines of evidence, and then prioritize and select which lines of evidence we want to include and how important we think each line of evidence might be to supporting the claim. We would then conduct a systematic review of the evidence for each line of evidence supporting the complex claim. And finally, we would pool all of that information together, according to our earlier prioritization, in order to conduct a pooled assessment of the evidence. So if we break this down further, what might that look like? The first thing we need to do is select a complex claim to evaluate; we need a topic to study. And I propose that that topic should meet two criteria. The first is that knowing whether the claim is supported would have major implications for the field. So if we find that the claim is not supported, it would have a major impact on the direction of research, on the animal models that we're using, on the treatments that we are thinking about, and on the types of human pathophysiological studies that we are conducting. This impact criterion is important to make sure that the question is worth the effort that this type of intensive review process is going to take.
The second criterion is that a traditional systematic review isn't possible, often because there will be multiple lines of supporting evidence, and that evidence may come from studies with very different methodologies or based in very different biological systems. Our second step is to identify the lines of evidence. Here I think it's important to plan and pre-register a protocol that specifies how we're going to go about identifying the lines of evidence, and there are three strategies that I think we should consider. The first is examining the citations that are used to support the claim. Oftentimes this is going to involve a lot of backtracking through chains of citations to get us back to the original research that was done to support the claim, particularly if the claim is so well accepted that it's no longer being cited, or if it's being cited to reviews, which in turn cite other reviews. The second strategy is expert interviews: what do people in the field think underlies this claim? What studies can they point us to? What lines of evidence do they know of? And then there's also investigator knowledge, which is simply another form of expert advice. We need a pre-registered protocol that specifies how each strategy will be implemented. Our third step is to select which lines of evidence are relevant, and then, amongst those selected, to prioritize: which lines of evidence are most relevant to supporting the claim and may provide the strongest evidence for it, and which provide weaker or more indirect evidence. Again, I think having a pre-registered protocol here is important, as well as, once the ranking is done, very clearly registering what the ranking was, because that will become important in the evaluation of pooled evidence at the end. In terms of defining selection criteria, one approach would be to evaluate each line of evidence according to Hill's criteria for causation, and then also to think about how important each line of evidence is to the claim being evaluated. Is it direct evidence or indirect evidence? How tangential, or how directly relevant, is it to the claim or to the human condition that we're looking at? We would then need a ranking of each included line of evidence based on the criteria described above. The fourth step is perhaps the most straightforward, perhaps not, and that is simply to complete a systematic review for each included line of evidence, which would involve preparing and pre-registering protocols and then completing the reviews and meta-analyses, if that's appropriate. And the last step is a pooled assessment of the evidence. This final assessment needs to be based on the strength of each line of evidence, as well as the relative importance that we assigned to each line of evidence in stage three. This assessment should also be conducted in accordance with a predefined protocol, to prevent our knowledge of the evidence, and of what we might like the answer to be, from interfering with our weighting of the evidence once we have the full details and a full assessment.
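To make the pooled assessment step a little more concrete, here is a minimal sketch of how the pre-registered weighting and the per-line results might be recorded and combined. Nothing in it is part of the proposal as presented: the field names, the 0-to-1 scales, the example numbers and the weighted-average pooling rule are all illustrative assumptions, intended only to show that the importance ranking from step three and the strength of evidence from step four can be pooled in a transparent, predefined way.

```python
# A hypothetical sketch only: one way the pre-registered importance ranking
# (step three) and the per-line strength of evidence (step four) might be
# recorded and pooled (step five). The names, 0-1 scales and weighted-average
# rule are illustrative assumptions, not part of the proposal in the talk.
from dataclasses import dataclass


@dataclass
class LineOfEvidence:
    name: str          # e.g. "human placental bed biopsy studies"
    importance: float  # pre-registered weight (0-1), fixed before the reviews start
    strength: float    # strength of evidence (0-1) from that line's systematic review


def pooled_assessment(lines):
    """Weighted average of per-line strength, using the pre-registered weights."""
    total_weight = sum(line.importance for line in lines)
    return sum(line.importance * line.strength for line in lines) / total_weight


# Invented numbers, purely to illustrate the shape of the pooled assessment.
evidence = [
    LineOfEvidence("human placental bed biopsies", importance=1.0, strength=0.3),
    LineOfEvidence("animal models of preeclampsia", importance=0.6, strength=0.5),
    LineOfEvidence("in vitro trophoblast studies", importance=0.4, strength=0.6),
]
print(f"Pooled strength of evidence: {pooled_assessment(evidence):.2f}")
```

In practice, the weights, the scales and the pooling rule would themselves need to be justified and fixed in the pre-registered protocol before any of the individual reviews are completed, precisely so that knowledge of the results cannot influence the weighting.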
When we're reaching conclusions from this process, we're going to want to address the following points. The first is the strength of evidence for the complex claim. The second, and perhaps more important, is an evaluation of the strengths and limitations of the body of evidence: what types of additional studies do we need, if any? That should be assessed both with regard to existing lines of evidence and to potentially new lines of evidence. Are there newer methodologies that would help to fill some of the gaps identified in the more historical evidence that we found? And then finally, what are the implications for the field of the previous two sets of findings? This is simply a summary table of everything that I've gone through before, outlining the five steps and some of the key things for each stage. And that is the end of my talk. So I'm happy to answer questions, if there are questions and if I have answers. Fantastic, and really interesting. There are no questions yet. Oh, my chat just disappeared; let me see if I can get the chat back up again. So there are other domains in which this comes up. There are two approaches to this. One, if you like, is pull validation, where there's a claim or a belief system and we're asking what the evidence for it is. And the other you might think of as push validation, which is to say, can we get a claim from this literature? Is there a knowledge claim we can make? There are fields, like the Navigation Guide work that Tracy Woodruff has done, or the OHAT approach, that have a similar challenge: for a regulatory risk assessment of environmental chemicals, they're looking at mechanistic data, data from experimental exposures in animals, and data from human epidemiology. And the way that they combine it, taking a GRADE-type approach, is to rate each body of evidence as high, medium or low, and then ask what happens when you put it all together. And then the other approach, I suppose, and Manoj might know a bit about this, is could you do a network meta-analysis of the different claims to get a quantitative answer? I don't know, though. Manoj, what do you think? Or I don't know if Tracy has any thoughts first? Yeah, so part of the issue for me, and I think it's one of the larger ones, is just identifying the claims and the evidence to begin with. I think the challenge of that process cannot be overstated, especially for things that are more historically based. And I've also been thinking about whether it would be possible to develop software tools to go into what's being cited and trace things backwards as well. So I think there are really two pieces there: how do you identify and prioritize the evidence, and then what do you do with it once you have it? And I agree with you that there's probably a lot more work done on the what-do-you-do-with-it-once-you-have-it side, and there are more options that one could consider there. Yeah, and there's a lot of stuff in the chat. So let me come to that. Dan Fanelli mentions that he proposed a metric in 2019 that might help to combine multiple sources. I think this was the K index, Dan? He's put a link in the chat where he describes it, and he's working on a method to quantify all of these terms exactly and to combine qualitative and quantitative knowledge. Jeffrey Mogil's got a couple of open questions. Very interesting idea. It strikes me that it's all going to be a bit dependent on your weightings of the different evidence streams: do you prioritize in vitro findings over in vivo findings, or whatever? But don't you think your preeclampsia example is a rare one, and the much more frequent situation is that something's already known, brain area X is involved in trait Y.
He hasn't put 'known' in inverted commas, but I will. And people waste money showing it again and again with increasingly sophisticated modern techniques. And then he makes a little comment on your phrase 'complex claims': he argues that it's not the claim that's complex, it's the evidence which supports the claim. The claim is simple, but some claims have simple evidence behind them and some have complex evidence. I don't know if you've got a comment on those. Yeah, so I agree with that point about the definition: it's more about the complexity of the evidence supporting the claim than the complexity of the claim itself, so that could certainly be clarified. Regarding the question of whether the problem is more that people are re-proving the same thing over and over again with newer technologies, or more that claims are not being evaluated at all: in my field, I certainly think it's more an issue of claims not being evaluated. I can't speak for other fields, but the problem that you've raised is at least visible; you can see it if studies are consistently coming out with newer methodologies proving the same claim. Whereas if something is so widely accepted that it's not being evaluated anymore, and the original evidence isn't being cited anymore, then it essentially becomes an invisible problem. And so I could see those types of things in your field not rising to the same level of consciousness, because you're not being reminded of them; they're not visible. In my field there are multiple things like this, and they're all just lines that you read in the introduction of most papers, and most people just read past them. We've had a couple of examples of people starting to question some of those claims, really leading to very transformative ways of thinking for the field, and I think we need more of that. And that means we need more critical evaluation of these claims where we really aren't even looking at the evidence anymore, because they're so widely accepted and the evidence is so old that it's not even cited. Yeah, and that touches on something I heard from my colleague earlier this week, about a group that were seeking to replicate a claim using a different methodology. And the way that they knew their experiment had worked was if it confirmed the claim; and if it didn't confirm the claim, well, clearly there was something wrong with the methodology. And so they would exclude those data points, or exclude those experiments, or keep doing it again until they got the right answer. That's great, Tracy. We've got about 10 minutes left, so what I'd like to do just now is open it up to general questions. And the first general question is for you, Tracy: wouldn't a systematic ontology for describing claims in a standardized language in particular fields be a necessary step, or at least a very facilitating step, for the sort of thing that you're proposing? I think that's something that could be quite interesting, and I would be interested in discussing and learning more about that. Okay, yeah. Shambhavi again: when citing something in a field that's been historically accepted as true for decades, would it make more sense to cite the original source of the evidence? Yeah, well, I can take that. There's a lovely book that Cassidy Sugimoto, who's one of the co-hosts of this meeting, has written with a colleague whose name I'm afraid I've forgotten, that goes into all of these citation practices.
And there is this problem that the original citation gets lost in subsequent citations, and so the originator of the idea doesn't get the credit. Okay, lots of people are happy, particularly with your talk, Tracy. I'm not jealous; I thought all the talks were excellent. Do any of the other panelists have any questions for each other, or any comments on what they've heard from the other talks? Alex, you've got your hand up. Yeah, thank you, Tracy, for your talk, a really interesting concept. I can speak to some of the very exploratory work that we're doing with the European Space Agency, which will be presented at the International Astronautical Congress next week. Dr. Mona Nasser has been working really extensively to help prioritize different streams of evidence and to help answer questions that are relevant for astronaut health, where often we've got clinical data with relevant health outcomes, but the way that the study has been conducted is not direct enough, so then we have to go back and look at some of the in vitro studies. So yes, it's about prioritizing the strength of the evidence and prioritizing the outcomes. But we're at a very, very exploratory stage, and hopefully I'll be able to forward her talk to you. And someone asked the title of the book; I've actually got the book, that's how keen I am on it, it's within arm's length. And now let's see if there are any other questions. Karis Wong, who I talked about in my talk, says thanks for your talk, and asks: how can we improve and evaluate the methods that we use for prioritizing lines of evidence for questions where we don't know what the ground truth is? And I suspect that would include, for instance, areas where we're still trying to develop treatments and we don't know what an effective treatment looks like, so we can't focus our approach on delivering that answer, because if we knew the answer, we wouldn't have to do the research. Yeah. And I think, unfortunately, the unsatisfying answer to a lot of these questions is that we just need to start trying stuff and see what works. This is the limitation of a presentation without data: once you have examples to work with, you start seeing where the problems are, where the strengths are, and what additional considerations come up. We've certainly seen this a lot in the automated screening working group that I've been working with for a while. So the way that we would want to test and start examining this approach would ideally be to do a couple of different claims in different fields, to get a sense of what obstacles we run into, what challenges we haven't anticipated, and whether there are steps or stages that we've missed. It's the type of problem that you just have to learn by doing; unfortunately, you can only think your way through the experiment so far. And for any youngsters listening in, there's nothing wrong with learning by doing, and there's nothing wrong with learning by doing when you've got a protocol. You just have to say, when you get to the end and report what you did: we did it slightly differently from how we were going to do it, this is why, and this is what we found.
The best advice I ever had about research came when I was explaining to my boss at the time, Jeff Dunn in Australia, how I couldn't possibly try to do meta-research and systematic review, because my first one would be lousy. He said, well, your second one won't be any better until you've done your first one. Best bit of advice I ever had. Manoj, you've got your hand up. Yeah, it's following up on Tracy's project there. I'd be interested in talking to you later, Tracy, because we're doing something somewhat similar, but in an area very specific to my clinical expertise, which is anesthesiology. So I was looking at heart attacks, myocardial infarctions, that occur perioperatively. And there's this thought that they actually occur in different ways from a typical heart attack that a person has when they're out doing the gardening or whatever. So we're running into really similar, parallel issues. We're actually using Hill's criteria as well, which was interesting to see on your slide. So if you are interested in looking at other disease states or other complex claims, I could send you the protocol for what we're trying to do. The issue right now is that we ran our search and there were 25,000 citations, which my students are going through right now. And unfortunately, there's no great way to go through them that's automated using available tools, because a lot of the data is qualitative, so you're often searching for a few lines within a larger paper. So it's very complicated, and I'd be interested in hearing more thoughts about that offline as well. There's a question for you in a moment, Manoj. But with my sister, who's an educationalist, we've done a systematic review of master's education provision in Europe, or something like that, and used a citation screening algorithm, just fed the decisions into it, and it worked. It's magic, the James Thomas magic box. It's fantastic. But the question for you, from Mark Avie, is: could the experience that you've got in building up the sepsis project in Canada be applied to other disease models, either in Canada or elsewhere? Mark's based in Canada as well, isn't he? So I think he's wondering whether you could share that. No, absolutely, and it's a great question. We actually just got two meeting grants, right before the pandemic, from our federal funders, basically to convene a meeting to raise awareness and start discussing multi-center studies in Canada. So obviously there's a Canadian focus, but especially in the current setting where everything is virtual, there'd be no problem at all having other international people who are interested come to the meeting. We're hoping to do that in the spring. So obviously, if you're Canadian, we might be able to bring you in person, if in-person is allowed. But certainly, if international folks are interested, please contact me; I'll put my Twitter handle in the chat box here, plus my email. We've already talked to mainly cardiovascular and respiratory researchers in Canada and some international folks. But, you know, Jeffrey, if you're interested in pain, certainly contact me. And Mark, I know Mark is the new director of standards for our animal welfare regulatory agency in Canada, so certainly probably some interest there as well. So yes is the bottom-line answer. Great. So we're coming to the end of our time.
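As a brief aside on the screening bottleneck raised here, the sketch below illustrates the general idea behind machine-assisted priority screening: screening decisions already made by hand are used to train a simple text classifier, which then ranks the remaining citations so that likely-relevant records surface first. It is a toy illustration under stated assumptions, not the specific tool referred to in the discussion, and the example abstracts and labels are invented.

```python
# A toy sketch of machine-assisted priority screening of the kind mentioned
# above: decisions already made by hand train a simple classifier, which then
# ranks the unscreened citations. The abstracts and labels are invented; this
# is not the specific tool referred to in the discussion.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Titles/abstracts already screened by hand (1 = include, 0 = exclude).
screened_texts = [
    "perioperative myocardial infarction after non-cardiac surgery",
    "troponin elevation and myocardial ischaemia in the postoperative period",
    "questionnaire study of nursing workload on surgical wards",
    "cost analysis of outpatient clinic scheduling",
]
screened_labels = [1, 1, 0, 0]

# Records still waiting to be screened.
unscreened_texts = [
    "mechanisms of perioperative myocardial injury",
    "survey of patient satisfaction with hospital car parking",
]

vectorizer = TfidfVectorizer()
model = LogisticRegression()
model.fit(vectorizer.fit_transform(screened_texts), screened_labels)

# Rank the unscreened records by predicted probability of inclusion,
# so that likely-relevant citations are read first.
scores = model.predict_proba(vectorizer.transform(unscreened_texts))[:, 1]
for score, text in sorted(zip(scores, unscreened_texts), reverse=True):
    print(f"{score:.2f}  {text}")
```

Even a simple ranker of this kind only reorders the work, and, as noted above, it helps much less when the relevant information is a few qualitative lines buried inside a larger paper rather than something visible in the title and abstract.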
One of the things that we've not touched on, and the mention of Mark brings it out, is that, of course, when we do animal research in particular, there's an ethical cost to doing the research. And so anything that we can do that improves the quality of information that we're able to get from the research that is done improves the benefit while keeping the harms at the same level, and so improves the ethical position of animal research. I think that's critically important. So I'd just like to thank Manoj again for getting us all in the same virtual room, and all of the speakers for what I've found to be a fascinating set of talks. We're all going to be able to head over to Remo for at least 20 minutes or half an hour; some of us will have to drop off after that. So if people want to carry on the conversation there, do come across. If not, thank you all for attending, and I think we're about ready to start the next session. So thank you.