Hello, my name is Philip Cohen. I believe you're looking at my shared screen now. Welcome to our session, which is called Science to Policy: What do we know about the science that becomes law? I'm Philip Cohen from the University of Maryland Department of Sociology. I will be moderating the panel today, which basically means staying out of the way, letting the panelists present as much as possible, and then facilitating questions and comments at the end. I'll give a brief introduction of the speakers now, at the beginning, and then we'll just go from one to the other once we get started.

We'll be starting with Rob MacCoun, a social psychologist and public policy analyst who is the James and Patricia Cowell Professor of Law at Stanford Law School. His work concerns drug policy, perceptions of fairness in the justice system, social influence and bias, the use of scientific evidence, and many other topics. So Rob will get us started. Our second speaker will be Kathryn Zeiler. Kathy is the Nancy Barton Scholar and Professor of Law at Boston University School of Law, where she applies economic theory and empirical methods to the study of legal issues and research questions. She is an advocate of empirical legal studies, and she studies, among other things, the use of experimental and behavioral economics in legal scholarship and issues around medical malpractice insurance. Our third guest is the President of the Social Science Research Council, Anna Harvey, who is also a professor of politics at NYU, where she is the director of the Public Safety Lab and co-director of the Criminal Justice Expert Panel. We're delighted to have Professor Harvey here. And our last presenter will be Sean Grant. Sean is an assistant professor of social and behavioral sciences at the Indiana University School of Public Health. His work advances evidence-based practice and uses the tools of metascience. He is especially focused on where improvements can be made in the areas of open science, research synthesis methods, and stakeholder engagement.

So we're going to have a lively panel. Like I said, I'll stay out of the way. I'll keep a little clock running, but I think each of our panelists is going to stay to 15 minutes. And then if you have questions, you can put them in the chat or Q&A, and we'll have a round of questions and interaction at the end. So with no further ado, I will turn it over to our first speaker, Rob MacCoun.

Okay, hello, everyone. Should I be sharing my own slides? I'm going to put my slides up on the screen now. I'm going to be talking about fact-finding in law and comparing it with science, but also with empirical research on the law. I'll start out with what's probably a silly question, just to frame things: are the laws of physics in any way comparable to the laws of the state of California? We use the same word, law. Clearly these are two very different notions of what law might mean, although both the rule of law and the rule of facts are important forms of social ordering. So I want to talk about the intersections of these. You're all familiar with Venn diagrams; I've got one here with scientific evidence and law, and social science and law. I should say it's not drawn to scale; obviously, social science and law is not as large as scientific evidence and law.
I'm also not making a dig at social science and law when I leave some of it outside the circle of scientific evidence. All I really mean by that is that there's a lot of work in social science and law and empirical legal studies that is qualitative, and today I'll mostly be focusing on statistical science, so I won't talk about qualitative research.

So I want to compare and contrast some features that I think are important in comparing science and law. There are some important parallels between the two, but also some very important differences. The first two rows really refer to two different branches of epistemology, two different theories of truth: truth as coherence and truth as correspondence. They're not really competing theories; I think both play a role in both science and law. In science, truth as coherence involves proper use of deductive logic, in some sciences mathematical proofs to derive theorems, and of course coherence in terms of probability norms and statistical norms, especially the Bayesian framework for thinking coherently about updating probabilities under uncertainty. Now, the law also relies on truth as coherence. A lot of lawmaking, particularly by judges, involves judgments of the coherence of a fact pattern with written laws and with precedents. But increasingly, the law does take seriously notions of probability and statistical theory, and in fact there's quite a lively literature on Bayesian interpretations of legal fact-finding and on whether laws are coherent as representations of proper Bayesian updating. Truth as correspondence is more the empirical side. Of course, a big part of what we do in science is trying to establish the reliability and validity of our empirical evidence, both descriptively and in our causal inferences. But truth as correspondence, empirical evidence, is also a very important part of litigation and negotiation, as represented by a large army of experts who testify in cases. And increasingly, our law students at Stanford, and I'm guessing at Boston and other places, are taking classes on empirical legal studies, statistics and the law, and so on. So again, there are some parallels there.

Now, where we really get to the big differences is the methodology of truth-seeking. There are two different theories of how to get at the truth that are very much in tension, and here I'm talking about aspirations, not necessarily the reality. The aspiration of the scientific approach is what lawyers would call inquisitorial. Inquisitorial legal systems are systems in which some neutral arbiter, like a judge, actually collects all the evidence, weighs all the evidence, and enormous power is given to this third-party decision maker, who is supposed to be acting in a neutral way. That's the way most of us are trained to think scientists are supposed to behave, as neutral fact finders. And as this whole conference is illustrating in myriad ways, science does not always live up to that. I'll talk about that more in a minute. Now, the American legal system is at the extreme end of a continuum from inquisitorial to adversarial. What that means is that lawyers are supposed to be tenacious advocates for the side they're representing, for their client. Their proper role is to be biased. Now, there are some important boundaries on that. They're not supposed to misrepresent evidence. They're not supposed to fudge data. But they aren't expected to be balanced.
If opposing counsel fails to bring up some important fact, so much the worse for opposing counsel. But of course, we all know scientists often behave in an adversarial way. Now, there's a well-known theory in the law, put forth in 1978 by John Thibaut and Laurens Walker, in which they argued that the proper domain of science is conflicts of truth, and the proper domain of law is conflicts of interest. Their suggestion is that an adversarial system is a good system for dealing with conflicts of interest, and science is a better system for dealing with truth conflicts. I think that's probably a little too glib. Every piece of litigation involves both truth conflicts and conflicts of interest, and increasingly, if anyone is working on a scientific project that has any political implications at all, there are going to be conflicts of interest involved there. So these are sort of Platonic ideal types, and the reality is much more blurred.

Now, some years ago I offered reasons why adversarialism might work better in the law than in science, and I just want to quickly go over that. The most important row of this table is the explicit adversarial role. When a lawyer behaves as an advocate, everyone knows they're doing what they're supposed to be doing; only the most naive person would think that lawyers are being completely unbiased and disinterested. Also, in just about any legal dispute, there are at least two sides represented. There is an explicit standard of proof in legal cases; there are several different standards in use, and it's a separate question whether jurors and judges actually comply with them. There's an explicit third-party decision maker who needs to produce closure; they need to reach a decision. And in most cases in the law, one of the two sides is right, or there's some grain of truth to both sides, but usually the truth is somewhere within the domain of the dispute.

Science is quite different on these dimensions. First of all, when you represent yourself as a scientist, you are saying: don't look at me as an advocate, don't look at me as an adversary; trust me, I'm a scientist. In scientific disputes, it's not always the case that there are at least two sides represented; sometimes everyone in the research community is on board with a single pet hypothesis. We have explicit standards of proof for statistical significance, but that's a very narrow part of science, and there are all sorts of scientific judgments made without a clear, explicit standard of proof. We can get away with that because in most cases there is no explicit third-party decision maker who has to decide once and for all: is string theory the proper theory, or are we going to go with some other theory? It's an ongoing conversation that spans decades, until a policy decision has to be made, and that's something I'll come back to. And finally, we know from the history of science that scientific disputes often fail to bound the truth. It's often the case that none of the parties are correct; they're all wrong.

Okay. So now, science plays a big role in the law, but it's mostly mediated through experts. I don't need to say a lot about this because I think you all know a lot about it; if you watch TV, you've seen countless examples. But in the adversarial system, there's a marketplace for experts.
A lot of my own research specifically manipulates expert confidence while holding testimony constant, and jurors rely very heavily on expert confidence to judge the validity of the science. Now, the problem is that the market for experts often rewards bias. The lawyer is biased, of course, but they are often looking for an expert who's also biased. And the market, even for unbiased experts, rewards overconfidence. I can speak from personal experience about the pressures I've been under from lawyers not to hedge, to be more categorical in my statements. And one of the problems we have is, frankly, an enormous arrogance of PhDs in the law. We got away with that for a long time, but those days are over, and there's a great deal more cynicism. So we really have to pay attention to what makes us credible and to earning the trust of the people we want to influence with our research.

I want to say a couple of things specifically about social science and law. I have a long-standing interest in bias in the interpretation of research results, going back to a 1998 Annual Review piece, and the literature on that topic has just exploded in the past decade or so. As you all know, there are a variety of problems that lead people to abuse or misuse statistical evidence. On the basis of these concerns, there's been really exciting progress in trying to up our game and minimize these problems. We had traditional practices, methodology, peer review, occasional replication, but the open science movement has brought a whole new toolbox: pre-registration of hypotheses, sharing of data, many-labs replications, blinded data analysis. And I just want to mention how difficult it is to implement some of these in empirical legal studies, for a number of reasons. First, the data are quite often confidential and we're not allowed to post or share them, so that's a big obstacle. Direct replication is often infeasible because researchers are often studying a historically situated event that happened one time, like the effect of a particular change in law at a particular time; you can't go out and replicate that change in law again. Even when that's not the case, field work is considerably more expensive than laboratory research in the social sciences (that's probably not true in the natural sciences, but in the social sciences it is), and it requires getting a lot of stakeholder permissions lined up to do the research. It can be very difficult to pre-register your hypotheses, because you often don't know what your measured variables will be until you're given access to case files and start finding out what is measurable. And finally, trying to achieve statistical significance with a rigorous alpha level is daunting because our statistical power is often capped by the available units of study; there may only be 29 judges in a jurisdiction to study. So those are all challenges that we face.

Some of these challenges can be addressed by blinded data analysis. This is just an advertisement for an article I published in Nature with Saul Perlmutter, a physicist, describing some of the techniques that physicists use to reduce bias, which, by the way, have nothing to do with physics; it's just that they happened to stumble on these ideas first. It mostly involves perturbing the data with some combination of noise and bias so that while you're analyzing the data, you can't be influenced by the results.
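To make the blinding idea concrete, here is a minimal sketch, using simulated data, of the kind of perturbation Rob describes. The setup and variable names are purely illustrative and are not taken from the MacCoun and Perlmutter article.

```python
import numpy as np

rng = np.random.default_rng(seed=2021)

# Simulated data: a binary "reform" indicator and an outcome (e.g., award size).
n = 200
reform = rng.integers(0, 2, size=n)
outcome = 10.0 - 2.0 * reform + rng.normal(scale=3.0, size=n)  # true effect = -2.0

# --- Blinding (done by someone other than the analyst) ---
# Shift the treatment effect by a hidden amount, so the analyst cannot tell,
# while making modeling choices, whether the data favor a preferred conclusion.
hidden_shift = rng.normal(scale=5.0)               # kept secret from the analyst
outcome_blinded = outcome + hidden_shift * reform

def estimate_effect(y, t):
    """Difference in means between reform and no-reform observations."""
    return y[t == 1].mean() - y[t == 0].mean()

# --- Blind analysis: the analyst finalizes code using only blinded data ---
effect_blinded = estimate_effect(outcome_blinded, reform)

# --- Unblinding: rerun the frozen code on the real data ---
effect_final = estimate_effect(outcome, reform)

print(f"effect seen during blind analysis: {effect_blinded:.2f}")
print(f"effect after lifting the blind:    {effect_final:.2f}")
```

The key design point is that all modeling choices are frozen before the hidden shift is removed, so those choices cannot be steered, even unconsciously, toward a preferred result.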
And this is something I think could be done with expert witnesses: they could be given blinded data sets, apply their preferred method, and not lift the blind until after they've already submitted their code.

Now, the final thing I want to talk about is an emerging issue at the intersection of open science and the law, what might be called the weaponization of open science. Quick quiz: who have been the biggest champions of open science in the federal government in the past decade? Scott Pruitt, the Trump administration's EPA administrator, and Paul Ryan, the former Speaker of the House, both seized upon open science methods to try to influence regulatory policy, for example by raising the standards so that EPA could not use data unless the data were procured through open science methods and shared, and so on. There are a couple of examples. The Foundations for Evidence-Based Policymaking Act: I think most of us, reading this act on its own terms, would say this is all leading to better-quality science, and the call for greater openness is something we should foster. The Strengthening Transparency in Regulatory Science rule is another attempt to bring open science to bear on the EPA's work. Now, I do not want to maintain that if a conservative endorses open science, that's suspect. That's not what I'm arguing. Rather, I just want to suggest, and I'm about to close here, that we can scrutinize political calls for open science, that there are some criteria we can use to see how seriously to take these claims. One of the issues is whether it is being argued in good faith. I would suggest it's in good faith if the advocate of open science recognizes that there is a trade-off between rigor and responsiveness. Policy decisions have to be made; if we had waited for the best vaccine evidence, none of us would have had the vaccine by now. A corollary of that is trading off the cost of wrong action against the cost of inaction. Another thing that can make a call for open science bad faith is if it's mostly applied only to findings we don't like. One of the things I would ask about these recent efforts in the Trump administration and by Paul Ryan is why they focused on EPA regulation but did not propose open science for criminal justice data, for military data, for economic research, all of which could arguably benefit just as much. For these reasons, I believe these calls for open science were not in good faith. But that's not to say that government should not move toward open science; it's just that we need to do so in a transparent and fair way. So I'll stop at this point. Thank you.

Thank you, Rob. Okay, we'll turn it over to Kathy Zeiler.

All right, let me share my screen. I hope you can now see my screen. Yes? Okay. So first, thank you to Jason Chin for inviting me to participate, and to Philip also for moderating. Thank you so much. It's an honor to be among the distinguished panelists today. I'm relatively new to the field of metascience, and I'm at the same time horrified and optimistic. I'm probably not alone, I hope I'm not alone, in either of those sentiments at this point. So I'm going to talk a little bit about what might be considered a case study for this panel, and I latched on to this piece of the panel description to be the focus of my talk.
So I'm going to walk through what, again, might be considered a case study related to how research gets selected and summarized. I don't have much to say about how we might present it to policymakers, but I will say a word at the end about how research is used by policymakers, and the news is not good, just to give you a heads up.

I'll start with this book chapter. In 2013, I published with Lorian Hardcastle a literature review of the quantitative empirical studies that aim to estimate the effects of tort reform on medical malpractice insurance prices, something I've been obsessed with for over a decade. In particular, we tried to draw inferences from 16 studies that we located that purport to estimate how state statutory damages caps, that is, limits on how much an injured patient can recover in a tort suit against a doctor, affect medical malpractice insurance premiums. That includes a study I produced as part of my own PhD dissertation, so one of the studies is actually my own, and I don't feel so bad bashing all of the studies because one belongs to me as an author. At this point, roughly half of all US states cap the amount that patients can recover in medical malpractice claims against health care providers. The caps generally focus on punitive damages and on non-economic damages that compensate for pain and suffering, as opposed to economic harm. Researchers, including myself, have capitalized on the variation in damages caps across states, as well as variation over time, to develop identification strategies.

So I'll argue it's an important policy question. The obvious question to begin with is whether tort reforms meet their intended goal of reducing or stabilizing insurance premiums. This has potential implications for physician supply and the cost of health care, although as a side note, given that so few injured patients sue, medical malpractice insurance is a tiny fraction of the total cost of health care. Deterrence theory suggests that the effect of damages caps on premiums is ambiguous. While caps might reduce the average per-claim payout, the reduced pressure on providers might encourage those on the margin to reduce effort enough to actually increase the number of lawsuits. If the increase in the number of suits swamps the reduction in per-claim payouts, then we might expect malpractice prices to increase, because they reflect a higher overall payout from the tort system. We might also be interested in research questions related to trade-offs. For example, for a particular set of injured patients, especially those who endure great physical pain but suffer little in the way of economic losses because they don't incur additional medical expenses and can't claim lost wages, for example, because they don't work, caps on non-economic and punitive damages might make it impossible to find attorneys willing to pursue these claims. These cases are quite expensive; Rob mentioned expert testimony, and in medical malpractice cases those experts are quite expensive to employ and to get to work on the case. So the theory does not just go in one direction. The predictions are ambiguous, and it's important to look at data to try to figure out what might be going on. And so we have these 16 studies, at least as of 2013. This is a bit dated, but my guess is nothing much has changed. I could be wrong about that, but I would be surprised.
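To make the theoretical ambiguity concrete, here is a back-of-the-envelope illustration with made-up numbers, not estimates from any of the 16 studies: a cap can lower the average payout per claim and still raise total payouts, and hence premiums, if claim frequency rises enough.

```python
# Illustrative only: hypothetical numbers, not estimates from the literature.
claims_per_100_docs_before, avg_payout_before = 10, 500_000
claims_per_100_docs_after,  avg_payout_after  = 13, 400_000   # cap lowers payouts, but suits rise

total_before = claims_per_100_docs_before * avg_payout_before   # $5.0M per 100 doctors
total_after  = claims_per_100_docs_after  * avg_payout_after    # $5.2M per 100 doctors

print(f"Total payouts before cap: ${total_before:,}")
print(f"Total payouts after cap:  ${total_after:,}")
# Even though the per-claim payout fell 20%, total payouts (and the premiums
# needed to cover them) rose, because claim frequency rose 30%.
```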
We started our review by summarizing the estimates by statistical significance, which I now regret. If you look at this table, you'll see the list of 16 studies, and I'm in there somewhere in the middle, along with the publication types. More than half of these studies were published in peer-reviewed journals; there are a couple of book chapters; some were published by the Brookings Institution, which is a think tank; my study was in a dissertation; and at the time, at least one study was unpublished. You can see from the covered periods that the studies cover a variety of different periods; there's no standard period covered, either for premiums or for the years in which tort reforms were passed. So there's a lot of variation in the data these researchers are using to estimate the effects of caps on insurance prices. You can also see that each study doesn't produce just one estimate; in fact, several studies produce several dozen estimates using different models, different controls, and so on. Across these 16 studies, we actually get 197 estimates of the impact of damages caps on premiums.

So we did a fine-grained look at the variation across estimates, and you can see from the table here that whether researchers measure the impact by premiums paid per doctor or by aggregate premiums, we see a pretty wide variety of results. The literature is mixed; we'd look at this and say the literature is mixed. Our goal was to figure out what inferences we could draw from this pile of studies, and we started by trying to uncover explanations for the mixed results. What explains them? Obviously, different time periods and different data sources were used. Not all researchers use the same data sources to identify damages caps; not all researchers use the same data sources to gather premiums. They also, of course, use different identification methods. These are non-experimental, observational studies, which pose a different type of challenge when it comes to replication and so on. For these reasons and others, what we had hoped to do, a formal meta-analysis, was impossible. Given the nature of the studies and this variability, we could not use them to produce one estimate. So our approach in the face of this data and methods variation was to try to identify which studies deserve the most weight when it comes to drawing inferences, for example, which employed the best identification strategies and used the best data sources. I'll explain our approach in the face of that difficulty and also say some closing words about the use of the research by policymakers.

The conclusion was pretty grim. And again, I'll remind the audience that one of my own studies is in the pile. We ended up concluding that, in fact, none of the studies deserves any weight at all, for a number of reasons. Just to provide some examples of why we were reluctant to place any weight on the inferences that might be gleaned from these studies: first, generalizability of the results is very limited due to the nature of the premiums data employed by the researchers. The premiums data, of course, are very difficult to get; it's very hard to collect these data.
I know because I've spent the last decade trying to collect a random sample from the entire population of prices from state departments of insurance, which regulate insurance prices and hold data on insurers. But most of the researchers, if not all of them, used data that are not a random sample from the entire population. For example, one very popular data source is a voluntary survey of insurance companies, who can choose whether or not to report. So of course we might wonder whether the insurance companies willing to report to this data set are actually like the ones choosing not to report. In addition, the premiums data are limited in a lot of cases to policies with particular coverage limits. For example, we might expect that a damages cap of, say, $250,000 on non-economic damages would affect prices of policies with per-occurrence limits of $100,000, very skimpy policies, differently than policies with limits of $1 million per occurrence, and many studies used policies of one particular type. So we have reason to believe that caps will affect the prices of policies with differing generosity differently, and also that doctors, after caps are passed, might move from one policy type to another, and that this sorting of doctors among policy types might impact prices as well. These studies are not going to pick that up.

In addition to that, damages caps, of course, are not randomly assigned. We're not conducting an experiment here, so in these observational studies we need to control for selection. That's absolutely crucial. But we found that the controls were not well implemented in any one of the studies, including my own. One example is that many of the studies use a technique called difference-in-differences models, and that approach assumes parallel pretreatment trends. I'm not going to go into the details; the most important point is that not one study discussed, checked, or reported on whether the pretreatment trends were parallel. And we know from studies of premiums outside this literature, one actually done by Dan Ho at Stanford Law, that this is a material issue. So we know from other studies that the pretreatment trends are probably not parallel, which is a cause for concern. Finally, there are a number of other theoretical assumptions of the models that were not verified or adequately addressed, basic assumptions even for studies that just use ordinary least squares, very basic models: there was no check of whether the error terms are normally distributed, and in panel data studies the authors did not verify the absence of serial correlation in the error terms, among other very basic assumptions.
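As an illustration of the parallel-pretrends check Kathy says was missing, here is a minimal sketch using simulated premium data. It is not drawn from any of the reviewed studies, and a full event-study analysis would be considerably more involved.

```python
import numpy as np

rng = np.random.default_rng(seed=7)

# Simulated average premiums for "capped" and "uncapped" states, 5 pre-treatment years.
years = np.arange(2000, 2005)
premiums_capped   = 100 + 6.0 * (years - 2000) + rng.normal(scale=1.0, size=5)
premiums_uncapped = 100 + 2.0 * (years - 2000) + rng.normal(scale=1.0, size=5)

# A crude pre-trend check: compare pre-treatment slopes in the two groups.
slope_capped   = np.polyfit(years, premiums_capped, 1)[0]
slope_uncapped = np.polyfit(years, premiums_uncapped, 1)[0]

print(f"pre-treatment trend, states that later adopted caps: {slope_capped:.2f} per year")
print(f"pre-treatment trend, comparison states:              {slope_uncapped:.2f} per year")
# If these slopes diverge (as they do here by construction), the parallel-trends
# assumption behind a simple difference-in-differences estimate is suspect.
```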
Currently, I'm working with a co-author, Mike Frakes, who's at Duke Law, and we're hoping to conduct the 17th study using better data and better methods, to add to the knowledge we have, which of course we don't think is very good at this point.

So one open question remains: does the quality of these studies matter? We might not care much about this messy literature if policymakers are not using it. Last week I asked my research assistant to try to figure out whether any of these studies had been cited in government documents. There's nothing systematic about the search that we did; it was just a very basic search. And we found that at least half of the studies had been cited in legal documents. Just to give you a flavor of where these studies show up in law: courts cite them. The Seventh Circuit Court of Appeals cited one of the studies in a medical malpractice case; the Wisconsin Supreme Court cited one in a decision about whether statutory damages caps violate the Equal Protection Clause; an Alaska court decided a similar issue and cited one of the studies. So, reflecting back on Rob's talk, it seems like the adversarial system, at least in this case study, is imperfectly effective at weeding out weak science. Legislatures are also citing these studies. For example, across several years, as Congress was trying to push a federal tort reform bill, we see congressional records citing the studies we looked at here: the House of Representatives is doing this, the Senate cites the work in its records, and Washington State cites it in the legislative history of its tort reform act. The Congressional Budget Office, which scores legislation, telling us how expensive legislation is going to be or what its effects will be, cites some of these studies in its report on the effects of state tort reform. And we also found a 2000 letter from the New York State Trial Lawyers Association to the New York governor at the time, urging the signing of a statute eliminating some tort reform. So these studies are also cited in lobbying efforts to get government actors to do one thing or another. I'll end there, and I look forward to the other presentations and to the questions. Thanks so much.

Thank you very much, Kathy, and for sticking to your time, I appreciate it. Let me just remind our audience that you can put questions in the Q&A, and I will lob them in the appropriate direction when we get to that portion of the program. And so without further ado, Anna Harvey.

Great. Well, thank you, Philip, and thanks to Philip and Jason and to the conference for inviting me. It's great to be here. Kathy, that was a great segue into what I wanted to talk about, namely the last part of your talk: how do policymakers respond to evidence? One way to think about this question of what science gets adopted into law is to treat it as a research question and ask: can we use the tools of rigorous science to study how and whether policymakers respond to rigorous science? That means using policymakers as subjects in experimental designs, which is hard to do, and Rob talked about some of the challenges. But within the last few years, we've seen a few really nicely done studies emerge, and I thought it might be useful just to talk about those. This isn't work that I've done, but it's work I've been drawing on at the Social Science Research Council as we think about how we can best encourage policymakers to adopt the findings of rigorous research.

The first study I want to talk about, and I didn't make slides for this, is a couple of survey experiments that a team of World Bank economists ran on policymakers, governmental employees who were invited to impact evaluation workshops hosted by the Bank. They ran a couple of experiments on invitees before they attended the evaluation workshop.
There are also some post-workshop results, but let me start with the pre-workshop results. The first experiment they did was a choice experiment. They gave the invitees descriptions of two programs and said: imagine that you have to recommend one of these two programs to some counterpart agency in your country, and you have information from a study that was done on each program, which we summarize below. And they varied the attributes of the studies. One attribute they varied was the method, and this gets to one of Kathy's points: the studies were either purely correlational observational studies, quasi-experimental studies, or full RCT experimental studies. They varied the location of the studies: was it in your country, in the same region, in a different country, or in a different region altogether? They varied the size of the estimated impacts; both programs were attempting to move the same outcome, call it kids' school enrollment rates. And finally, they varied the margin of error for a 95% confidence interval, which, given the size of the impact, would tell you whether the estimate was significant. They also offered one final piece of information, which is that, quote unquote, a local expert tells you that they believe program A or program B, which they varied, would perform better in your context. And then they asked the policymakers: which program would you recommend?

And, consistent with Kathy's grim finding, policymakers were completely unresponsive to design. Whether the study was correlational, quasi-experimental, or an RCT had no impact on their recommendations. They were also unresponsive to precision, to significance, to the size of the confidence interval. The results on location were kind of in between: it looked like they were more likely to recommend programs studied in their own country. But what really popped out were bigger estimated treatment effects, regardless of the design of the study or the precision of the estimation, just big treatment effects, the point estimate, and whether programs were, quote unquote, recommended by a local expert; and the magnitudes of these effects were pretty large. They also ran the same experiment on the Social Science Prediction Platform with researchers, to see how a community of researchers would respond. Not surprisingly, because they're running the experiment on people like us, for researchers, methods and tight confidence intervals were highly important in addition to the size of impact, while location and local recommendation were not important, and method went in the direction you would think, favoring anything that was not correlational. So this is part of our concern: here we are doing all this rigorous science, but maybe the rigor of the science has no relationship to policy choices.
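For readers unfamiliar with this kind of design, here is a rough sketch of how attribute effects in such a choice experiment might be estimated. The data are simulated to mimic the pattern Anna describes; the variable names, coefficients, and estimator are illustrative assumptions, not the World Bank team's actual design or code.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
n = 5_000  # hypothetical respondents, each choosing between program A and B

# Attribute *differences* between program A and program B for each respondent.
rct_diff    = rng.integers(-1, 2, size=n)      # +1 if A is an RCT and B isn't, etc.
effect_diff = rng.normal(size=n)               # standardized difference in point estimates
local_diff  = rng.integers(-1, 2, size=n)      # local expert favors A (+1) or B (-1)

# Simulated behavior mimicking the finding described: choices driven by effect
# size and the local expert, with essentially no weight on study design.
utility = 0.0 * rct_diff + 0.8 * effect_diff + 0.6 * local_diff + rng.logistic(size=n)
chose_A = (utility > 0).astype(float)

# Linear probability model: regress the choice on the attribute differences.
X = np.column_stack([np.ones(n), rct_diff, effect_diff, local_diff])
coefs, *_ = np.linalg.lstsq(X, chose_A, rcond=None)
for name, b in zip(["intercept", "RCT design", "effect size", "local expert"], coefs):
    print(f"{name:>12}: {b:+.3f}")
# A near-zero coefficient on "RCT design" is the pattern Anna describes:
# recommendations track effect sizes and local endorsements, not rigor.
```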
They also did a second experiment about searching for information. Now they're holding the program constant and asking policymakers, essentially, which of two studies they would rather see in order to make a decision about the likely impact of a program, and again they varied the same attributes of the studies. Here, there was a little more evidence that policymakers would rather see experimental or quasi-experimental evidence than observational evidence, if they have to see something, and some evidence that sample size matters. But estimated effect size still popped out as very important, suggesting that when policymakers search for information, there's going to be a bias toward wanting to see studies with large estimated treatment effects, which will skew their sense of the effect a program might have when they're thinking about a single program and looking for studies about it.

And just one other study in this vein, a recent one by Daniel Chen and Sultan Mehmood that came out a little while ago, using deputy ministers in Pakistan. There was an intervention, which I'll talk about, but outside of the intervention they ran a choice experiment on the ministers in which they elicited the ministers' beliefs about the impact of a deworming program on the long-run earnings potential of kids who would participate. They got a baseline estimate of about a 5% increase in earnings. They also asked the ministers whether they would rather implement a deworming program or build computer labs for kids, and about 40% of the sample, if I'm remembering this right, said they'd rather pick the deworming program, while about 60% said build computer labs. Then, in the experiment, they gave the policymakers a signal: a summary of a Michael Kremer paper, an RCT on the long-run impacts of deworming, the results of which suggested a much larger increase in long-run earnings, a 13% increase. They gave them the figure and said there's a recent randomized evaluation and here's what it found. Then they looked at whether the ministers updated. Similar to the World Bank experiments, there was no updating on the estimated effect. Even though they'd just been told that there was a recent randomized experiment and here's what it found, there wasn't any movement to increase their estimate of the effect, and there was no movement in their choice of program. They were just completely unresponsive to the provision of the evidence in the survey experiment.

So that's the bad news; that's the dismal, grim news. But then there's the question of whether anybody has found interventions that seem to increase trust in and uptake of rigorous science. And actually, all of these experiments, plus one I haven't talked about yet, did use an intervention that seemed to increase uptake, and I'll just go over them quickly. In the World Bank experiment using policymakers invited to impact evaluation workshops, they also ran the same choice experiment post-workshop; these were week-long workshops conducted by academics and World Bank staff on basically the basics of rigorous impact evaluation.
They did find that although in the pre-workshop survey experiment there wasn't any weight placed on a study having deployed an RCT design, there was weight in the post-workshop survey experiment. So knowing that the evidence came from an RCT now did increase recommendations.

The Pakistan experiment is one I really love, because they used the book that I use in one of my undergraduate classes. They gave the ministers a choice between a high or a low probability of being sent either Mastering 'Metrics by Angrist and Pischke, which is the one I use in my undergraduate classes, or a book called Mindsight, a self-help book focused on positive psychology. The ministers chose a high or low probability of receiving these options, and then which book they got was randomized within that. They got the book, they had to watch a series of videos by either Angrist or the author of Mindsight, and they engaged in a series of writing exercises. So there was roughly a six-week instruction program in essentially either causal inference and basic econometrics, or positive psychology. Then they ran a bunch of evaluations of the impact of the treatment, and they found really large effects. This is the same group of ministers as in the deworming experiment I just talked about; the control group was the unresponsive one. In other words, the group that had received the positive psychology book and training was unresponsive to being presented with RCT evidence about the long-run beneficial impacts of deworming. The treatment group, the group that had read Angrist and Pischke and watched the Angrist videos, saw large increases in their willingness to choose the deworming program and in their updating of beliefs about the efficacy of deworming. Let's see, I think I took some notes here: they updated from the baseline of 40% saying they would choose the deworming program to 80% after seeing the RCT evidence signal. They were much more willing to rate quantitative evidence as important in policymaking. When asked what actions they should take before rolling out a new program, they were much more likely to say, well, I should probably conduct a randomized controlled trial; a whole series of outcomes like this dramatically improved. So both the World Bank and the Pakistan experiments seem to suggest that there are things we can do to engage with policymakers to increase the likelihood that rigorous science will be adopted into policy, and some of that is just training and education, including what we do in our own schools.

And the last thing is that in-person delivery appears to help too. There was a recent field experiment in Brazil, where the team worked with the Brazilian conference of mayors, and they were allowed to go to the annual conference.
They sent a randomized invitation to mayors to come to an information session the research team was going to hold about a strategy to increase tax compliance. The strategy was one with a bunch of RCT evidence behind it; it's essentially sending reminder letters, a simple nudge-style intervention to increase compliance. The mayors that showed up came to the information session, and there was a 45-minute presentation of the evidence, walking through it and explaining it. Then they actually watched what the mayors did over an 18-month follow-up period, and they saw about a 30% increase, among the mayors invited to the session, in the adoption of the tax compliance letters. So there it seems to be the mode of delivery that mattered, rather than just giving somebody a little one-paragraph summary that says there was RCT evidence. This was in person, a 45-minute presentation, probably delivered by somebody skilled at delivery. So maybe the takeaway is that we have some shoe-leather work to do in conveying our findings to policymakers.

I'll just close with my own personal takeaway from this. My own work over the last four or five years has been in criminal justice, and there has been a variety of findings suggesting that prosecutions for quality-of-life offenses and victimless, nonviolent misdemeanor offenses are not doing any good, and are probably doing harm. So how do you get people to listen to that finding? We contacted the Major Cities Chiefs Association, which represents the 70 largest law enforcement agencies in the country, and we said, can we come to your conference and do a panel on our research findings? And they said, sure. We were supposed to be in New Orleans, but Hurricane Ida bumped it. But I think maybe we need to hit the road a little bit as researchers, get out there, and do a little bit more to deliver our findings with charisma and entertainment value, and not just rely on our published work. So that's what I have. Thanks.

Excellent. Thank you very much. Okay, so we've had a couple of questions come into the chat, which is great. Rob answered one; I flagged one that I'll bring up during the Q&A. If you have others, drop them in the Q&A. Thank you. And now for our final presentation, Sean Grant.

Great. Thanks a lot. Can you see the slides? Okay. Yep. Perfect. Well, thanks for moderating, and thanks so much to my fellow panelists. I'm really excited to be here and be part of this panel. My name is Sean Grant. I'm an assistant professor of social and behavioral sciences at the Fairbanks School of Public Health at IU. For my portion of the panel, I wanted to provide an overview of a program of work I'm doing with several colleagues: Evan Mayo-Wilson at Indiana University, Lauren Supplee at the William T. Grant Foundation, and Pamela Buckley at the University of Colorado Boulder. We're looking at the ability of metascience to improve trust in research that is used instrumentally to inform social policy and practice in the US context.
These are mechanisms that are actually creating infrastructure where there is a pipeline from research to policy and law. This is under the banner of the evidence-based policy movement, which has grown significantly in the United States over the last decade. Evidence-based policy, for those unfamiliar, involves the use of rigorous research to build a credible evidence base that's then used to focus policies and resources on effective social programs addressing the needs of the public, in this case in the US context. The stages involve reviewing evidence on the effectiveness of social programs, incorporating that evidence into budgeting and policy decisions, ensuring that the programs you select are being delivered effectively, and then determining whether those programs are achieving desired results, which feeds back into the next cycle of funding and social programs in a given context for a given social problem. In the United States, evidence clearinghouses are the primary resource for the first steps of this policymaking process: reviewing evidence on the effectiveness of social programs and incorporating that evidence into budgeting and policy decisions.

For those unfamiliar, you might be asking: what is an evidence clearinghouse? Clearinghouses are repositories of evidence that follow published standards to identify empirical studies that test the effects of programs, assess the validity and rigor of those studies, and then disseminate information about programs they deem evidence-based to the public, to researchers, to policymakers, to decision makers. Pictured here is the process for rating programs for Blueprints for Healthy Youth Development; that's a clearinghouse run by my colleague, Pamela Buckley, and they focus on identifying evidence-based programs for youth and families. As I mentioned, these clearinghouses evaluate research according to explicit procedures and standards of evidence, and they do that to try to ensure that the programs they designate as evidence-based are truly beneficial. Historically, they've done this by assessing prescribed causal inference methods, things like random assignment to minimize risks of selection bias. And to their credit, over the last 10 to 20 years they have been a large part of making this kind of rigorous science, at least in the intervention research literature, more normative among researchers, journals, and funders. So they support evidence-based policy by distilling findings from the most trustworthy research to assist decision makers in selecting programs for which there is rigorous causal evidence. I think a takeaway for this panel is that clearinghouses fit uniquely at this intersection of science and policy that we're interested in, by supporting work to implement evidence-based programs, programs that have rigorous science behind them, at scale. And in the United States, evidence clearinghouses that are specifically supported by the federal government are particularly influential, because they increasingly affect literally billions of dollars of federal funding for social programs across social policy sectors. Recently, these are used in what are called tiered evidence grant-making or financing models, in which federal agencies award smaller amounts of funding to programs with little to no evidence and larger amounts of funding to scale up programs with strong evidence of success.
As an example, in 2018 the Family First Prevention Services Act amended Title IV-E of the Social Security Act, aiming to significantly reform federal child welfare policy in the United States. As part of this act, the actual statutory language in the law itself, not administrative guidance, but the law written by Congress and signed by the president, called for the establishment of a prevention services clearinghouse that's now run by the Administration for Children and Families in the U.S. Department of Health and Human Services. The charge for this clearinghouse was to conduct objective and transparent reviews of research on programs intended to provide support to children and families and prevent foster care placements. And to give you a sense of how instrumental science is in this policy: under this act, moving forward, at least 50% of each state's expenditures are required to be on programs that this clearinghouse has given its strongest evidence rating, that is, well-supported programs based on randomized trials or quasi-experiments in the literature.

So why is this relevant to this overall conference? Metascience research has raised important concerns about the assumptions that underlie this model of evidence-based social policy. The first is concern about difficulties in replicating results from the social, behavioral, and health sciences. The promise of evidence-based policy is that we invest in programs with strong evidence of effectiveness, and this will increase the likelihood of positive outcomes for a policymaker's constituents. The time and money it takes to scale programs in a community is great, and it's equally challenging to take away programs that communities are already implementing. So reproducibility and replicability are essential to social program research, particularly for scaling programs in the evidence-based policy model. The second concern is the prevalence of detrimental research practices that threaten the credibility and utility of evaluations of program effectiveness in the scientific literature. These could be underspecified studies, where methods, interventions, populations, settings, and other important details are not shared in reports, which makes it difficult to appraise the quality of a study and the settings and populations to which results are applicable. Another is reporting bias, referring to when scientists or journals decide not to publish analyses, outcomes, or perhaps entire studies, often because results do not meet some kind of threshold like statistical significance. Specification searching, also known in some contexts as data dredging or p-hacking, refers to repeatedly searching a dataset or trying alternative analyses until a result is found, often a statistically significant result, and then often failing to control for or report all of the tests that were undertaken. And lastly, there is really understandable human error, like technical errors in computational analyses that undermine computational reproducibility, or copy-and-paste errors from Excel or your statistical software into Word.
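A small simulation can show why specification searching worries metascientists. This sketch is illustrative only: it assumes a simple two-group comparison in which the null is true by construction, not any particular program evaluation.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(seed=0)

def two_sided_p(z):
    """Two-sided p-value from a standard-normal test statistic."""
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

def smallest_p_over_specs(n_obs=100, n_specs=20):
    """Simulate one 'study': try 20 arbitrary specifications (here, outcomes
    that truly have no relationship to treatment) and keep the best p-value."""
    treat = rng.integers(0, 2, size=n_obs)
    best_p = 1.0
    for _ in range(n_specs):
        outcome = rng.normal(size=n_obs)                 # the null is true
        diff = outcome[treat == 1].mean() - outcome[treat == 0].mean()
        se = sqrt(outcome[treat == 1].var(ddof=1) / (treat == 1).sum()
                  + outcome[treat == 0].var(ddof=1) / (treat == 0).sum())
        best_p = min(best_p, two_sided_p(diff / se))
    return best_p

n_sims = 2000
false_positives = sum(smallest_p_over_specs() < 0.05 for _ in range(n_sims))
print(f"Share of null 'studies' that can report p < .05: {false_positives / n_sims:.0%}")
# Picking the best of 20 specifications turns a nominal 5% false-positive rate
# into something closer to 60-65% -- the core problem with specification searching.
```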
And then the third concern, which I imagine has been highlighted quite a bit over the last two weeks, is the perverse structural incentives in the ecosystem supplying and identifying evidence-based programs. Just to highlight a few actors in our evidence ecosystem: sponsors make decisions on which research is conducted and how that research is conducted, which directly feeds into our available evidence base; this includes whether projects include representative samples and whether funding for replications, for example, is available. Journals are more likely, on average, to publish papers with exciting and innovative findings over replications or null or negative findings; scholars have reported anticipating a lack of interest in publishing papers with null findings, leading to more reporting bias and the file drawer problem. And research institutions foster this publish-or-perish, funding-or-famine ecosystem through tenure and promotion policies and broader discipline-specific incentive and reward structures. That in turn supports and feeds back into the behaviors of sponsors and publishers, an ecosystem that creates these perverse incentives, leading to detrimental research practices.

But then, to be positive, metascience, and in particular for us, open science, serves as a motivation, as it provides opportunities to align scientific practice with scientific ideals, accelerating discovery and broadening access to scientific knowledge. For example, transparency and openness could provide mechanisms for third parties like clearinghouses to check for research practices that threaten the validity and reproducibility of research. And open science focuses on making research products and outputs more usable and freely available to everyone, so they can be read or downloaded freely by stakeholders not affiliated with research institutions that have journal subscriptions, like policymakers and evidence clearinghouses. Through this focus on free availability of, and trust in, research findings and products, open science could accelerate the flow of scientific evidence into policy and law.

The motivating question for our program of research is how to align metascience with the evidence-based policy movement. Our program of work has focused to date on evidence clearinghouses in the U.S. and the journals that have published the program evaluations those clearinghouses have reviewed. Our initiative, which we've called TRUST, for Transparency of Research Underpinning Social Intervention Tiers, follows a structured process-outcome conceptual framework for evaluating the extent to which institutions in the scientific ecosystem promote transparent and open research. We use a four P's mnemonic for our framework: Principles of open science, which for us are the standards from the TOP Guidelines and the literature on applicable clinical trials for FDA approval; Policies of organizations, like clearinghouse handbooks that codify their standards of evidence or journals' instructions-to-authors pages; Procedures of organizations, like the methods and tools clearinghouses use to evaluate studies or the manuscript submission systems journals use as part of their publishing processes; and Practices of organizations, like the information clearinghouses report on their websites or the article templates journals use to provide details, for example, on the use of open science practices.

Our first study applied this conceptual framework to 10 clearinghouses sponsored by the federal U.S. Departments of Education, Health and Human Services, Justice, and Labor.
As I mentioned before, these ratings are highly consequential because they are used to inform policy decisions through the kinds of tiered evidence grant-making and financing models I reviewed earlier. To evaluate the degree to which these clearinghouses support open science practices, we downloaded their handbooks and other documents, we explored structured fields of intervention entries on their websites, and we did this work in collaboration with clearinghouse staff, who shared any relevant information we weren't able to identify in our review. Overall, we did find that 70%, that is, seven of these 10 clearinghouses, consider at least one open science practice. But of the practices they consider, replication is the only one that actually influences whether an intervention is rated as evidence-based. Aside from replication, clearinghouses do address other practices, like public availability of results, study registration, and protocol sharing, but none of these is actually required for an intervention to be designated as evidence-based; it's just information that the clearinghouse reports out. And on top of that, clearinghouses do not formally synthesize the cumulative body of evidence on programs using statistical meta-analysis, but rather do vote counting: they count the number of studies with a p-value less than 0.05 for a given outcome, and if that happens twice, great, you've hit their highest tier of evidence. So one thing we recommend in the paper, which the QR code at the bottom links to, is for clearinghouses to consider ways in which their current standards of evidence could actually encourage detrimental research practices, like multiple hypothesis testing and selective non-reporting of studies and results, in order to get on these lists that are tied to funding and prestige.
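To illustrate the difference between the vote-counting rule Sean describes and a formal synthesis, here is a minimal sketch with hypothetical numbers. The fixed-effect inverse-variance pooling shown is just one standard approach, and nothing here reflects any actual clearinghouse review.

```python
import numpy as np
from math import erf, sqrt

# Hypothetical standardized effect estimates and standard errors from five
# evaluations of the same program -- not data from any real clearinghouse review.
effects = np.array([0.40, 0.35, 0.05, 0.02, -0.05])
ses     = np.array([0.15, 0.16, 0.05, 0.04,  0.05])

def two_sided_p(z):
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

# Vote counting: count studies individually significant at p < .05.
significant = [two_sided_p(b / se) < 0.05 for b, se in zip(effects, ses)]
print(f"vote count: {sum(significant)} of {len(effects)} studies significant")
# Under the rule described, two significant studies would clear the top tier,
# even though the significant ones are the small, imprecise studies.

# Fixed-effect inverse-variance meta-analysis of the same five estimates.
weights = 1 / ses**2
pooled = np.sum(weights * effects) / np.sum(weights)
pooled_se = sqrt(1 / np.sum(weights))
print(f"pooled effect: {pooled:.3f} (SE {pooled_se:.3f}), "
      f"p = {two_sided_p(pooled / pooled_se):.3f}")
# The precision-weighted synthesis suggests a small, non-significant pooled
# effect -- a very different picture from "two studies hit p < .05."
```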
So we see these findings, taken together, as a meta-epidemiological baseline of where clearinghouses, journals, and researchers stood as of the end of 2020. We've started providing our findings as feedback to clearinghouses and other stakeholders in the federal context, who really do, again, deserve a great amount of credit for pushing the methodological rigor of intervention research forward over the last 10 to 20 years. So we think there's a chance the next 10 years or so could be about pushing things like open science forward in the same way. In addition to our findings, we've shared draft TOP guidelines for clearinghouses, which at least one clearinghouse, the one on home visiting for youth and families, has used to articulate standards for program evaluators on open science practices. But an important thing we've heard back from clearinghouses, which speaks to Rob's point about the trade-off between rigor and responsiveness, is that they need these practices to become more common in the literature they review before they can make them requirements for evidence-based programs. So we've started designing and providing outreach to journals and program evaluators about the importance of open science practices for the research they publish and conduct that informs policy and practice decision making. For the discussion, I would love to hear your thoughts on how we can align the open science movement and evidence-based policy. And if you're interested in this intersection, please do get in touch. I really appreciate your time and I look forward to the discussion. Thanks so much.

Wow, that's great. Okay. Thank you, Sean. Very good. We have 20 minutes left, so let me give the panelists a couple of minutes to ask each other any questions that have come up for them before we turn to the public chat. Anybody? Rob? Oh, and then Kathy. Go ahead.

Yeah, I have a question for Kathy. You made an interesting point about different causal identification strategies, which is a very common problem in the literature, and I'd never really thought about it before: how do you pool data when different strategies are used? I'm wondering whether in principle it would be possible to pool the data before running it through the inferential strategy. So you've got a sort of meta-analytic covariance matrix that hasn't been run through any particular model, and then you could apply different models, though not every model could be applied to the full data set. I haven't thought this through, but would that be at all feasible, or is it just not possible?

Yeah, I think it would; it's a really interesting point. The only thing that's required is being able to get the data from the researchers, which is very difficult, as you were mentioning during your talk. So that's the main limitation, I think: we're less willing to share our data with each other. But if somehow we could establish a norm about sharing, I think it would be possible. You could build the largest data set that's available and then analyze it using best practices.
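To make Rob's pooling idea concrete, here is a small sketch, using simulated data in place of the shared datasets he is describing, that contrasts the two-stage route (estimate each study separately, then inverse-variance pool the estimates) with the one-stage route (stack the raw data and fit a single model). It is a toy with one simple OLS contrast, not the mix of identification strategies the panel is discussing; every number and helper name here is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate two hypothetical studies of the same treatment; in practice these
# would be the individual-level datasets that researchers agreed to share.
def make_study(n, effect, noise):
    x = rng.binomial(1, 0.5, n)                 # treatment indicator
    y = 1.0 + effect * x + rng.normal(0, noise, n)
    return x, y

studies = [make_study(200, 0.5, 1.0), make_study(300, 0.3, 2.0)]

def ols_effect(x, y):
    """Treatment coefficient and its standard error from a simple OLS of y on x."""
    X = np.column_stack([np.ones_like(x, dtype=float), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - 2)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1], np.sqrt(cov[1, 1])

# Two-stage route: estimate each study, then inverse-variance pool the estimates.
ests, ses = zip(*(ols_effect(x, y) for x, y in studies))
w = 1 / np.array(ses) ** 2
print("two-stage pooled effect:", np.sum(w * np.array(ests)) / np.sum(w))

# One-stage route: stack the raw data, add study-specific intercepts, fit one model.
x_all = np.concatenate([x for x, _ in studies])
y_all = np.concatenate([y for _, y in studies])
study_id = np.concatenate([np.full(len(x), i) for i, (x, _) in enumerate(studies)])
X = np.column_stack([x_all] + [(study_id == i).astype(float) for i in range(len(studies))])
beta, *_ = np.linalg.lstsq(X, y_all, rcond=None)
print("one-stage pooled effect:", beta[0])
```

With shared individual-level data, the one-stage route is what would let different identification strategies be tried on the same pooled dataset, which is the feasibility question Rob is raising.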
Can I quickly hop on that? Just to say, Kathy, I don't know if you saw that Rachel Meager posted a paper a few days ago with a co-author about the problem of meta-analysis when studies have very different designs, and about using RCTs and observational studies to de-bias each other. The problem with the RCTs might be that they have unreasonably large effect sizes because they were chosen to be conducted in places where the researchers thought they would find large effects, while the observational studies have their own biases, and they kind of simultaneously try to adjust for the differences across the two types. Kathy, go ahead.

Thank you. Yeah, that's really interesting; I'll look up that study. I just wanted to respond to Anna's really interesting point that we need to go on the road more often, and I think that's true. One point is that George Mason has these training programs attended by federal judges, and I've participated in those, and the judges really like coming to them. I think we should do more of them. I spoke to a couple of judges after the last one I did, and they asked me to please send them potential clerks who were trained in empirical methods. So it seems like they have a desire to get better at this, but they're not so sure whether they can do it themselves, so they're looking for help. I think we have a lot of work to do there. One other thing is that the American Law and Economics Review is adding a new editor position to attract and publish translational pieces. So that's another thing those of us trained both in law and in some other field, economics or psychology or whatever, can do to help translate the research for policymakers. I think there's a lot of room for that work as well.

Thank you, Kathy. That's actually interesting and relates to a question posed by Patrick Portia, which I'll ask and also elaborate on. The translational role is an interesting one, especially for those of us interested in open science. One of the benefits of open science practices, I think, is that they don't shy away from the uncertainties in science. And of course in courts especially, but also in various policy advocacy work, uncertainty is not a virtue. So you have advocates, the lawyers Rob describes, but also policy advocates and so on, for whom uncertainty is a problem. And if scientists have done things like preregister studies that didn't work out, post drafts of papers that are still available, or open their work to public comment, where experts in the field may have made negative comments about their studies, then, if you've ever been an expert witness, and I have not, I'm led to believe that anything like that can suddenly come up in the courtroom. And I wonder if there's a role, a scientist's role, for the translational person who is slightly more certain than the practicing scientist who embraces uncertainty. Is that wrong for a scientist to do? In other words, if I've done a meta-analysis or analyzed a policy, can I come into a courtroom and say, I'm sure of this, even though in my own research, where I'm engaged in discovering new knowledge, I am not so sure?
So I guess that might be a question for Anna, or Rob, whoever wants to take it, or anybody else. Anna, do you want to take the first shot?

Well, I don't know. I think there's some low-hanging fruit. And I feel like sometimes we're almost, I don't want to say too careful, because that's not what I mean, but we're kind of up here thinking about our rules of inference, while down here are all the states adopting policy based on, like Kathy's observational studies, where we're not even talking about file-drawer problems or anything. We're just talking about really, really crappy studies, from which nobody should be drawing any inferences. So I feel like there's a way in there. And, you know, my knowledge of the expert witness gig is that, particularly in corporate litigation, the firms supplying expert witness testimony are all pretty high-quality experts who are producing evidence and going up against each other. That, to me, is less of a problem than the fact that this whole world of policymaking is being driven by not any kind of science at all. And to Kathy's point about the George Mason program, there is actually an evaluation of that; I don't know if you've seen it, Kathy. It's another Daniel Chen paper. He FOIA'd the names of the judges who attended over the years, and I think he has the invitations as well as the people who actually came, so he has a way to identify the effect, and it had a large effect on their decisions. So I think just introducing some basic science principles into policymaking and judicial decision making would be a positive thing.

Yeah, I'll add a couple of points. I've actually published three papers and, I don't know, six or seven experiments where we vary advisors', experts', or witnesses' confidence levels. And while it's true that there's a huge market for confidence, and people want experts to be confident, it can really backfire. In our studies, when the overconfident witness then makes a mistake, even a small mistake, their credibility is hurt much more by that mistake than the less confident witness's. What we argue is that what we should look for in our experts, and what we should fight for, is calibrated experts. By calibration, I mean that you can trust their confidence level. That actually liberates policymakers, because if I tell you, boy, we really don't know this yet, I'm only 25% or 30% confident, I'm basically telling a policymaker there's no scientific ground for you to make one decision; you're going to have to decide based on other criteria. And policymakers love that; they would love to be liberated from the science. Then when we say, in this case I'm 90% confident, we're basically saying now is when you should listen to me, because I actually have something to offer. The only way we can make that work is to find ways to punish experts for not being calibrated, to weed out experts whose confidence statements don't match their knowledge.
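Calibration in Rob's sense can be checked directly against an expert's track record. A minimal sketch, with an invented set of confidence statements and outcomes (this is not drawn from Rob's experiments): for each stated confidence level, compare it with the fraction of claims that held up, and summarize the overall gap with a Brier score.

```python
# Hypothetical track record for one expert: (stated confidence, claim turned out true?)
record = [
    (0.9, True), (0.9, True), (0.9, False), (0.9, True), (0.9, True),
    (0.6, True), (0.6, False), (0.6, True), (0.6, False),
    (0.3, False), (0.3, False), (0.3, True), (0.3, False),
]

# Calibration check: within each stated confidence level, how often was the expert right?
levels = sorted({conf for conf, _ in record})
for level in levels:
    outcomes = [correct for conf, correct in record if conf == level]
    hit_rate = sum(outcomes) / len(outcomes)
    print(f"stated {level:.0%} confidence -> correct {hit_rate:.0%} of the time ({len(outcomes)} claims)")

# Brier score: mean squared gap between stated confidence and the outcome (lower is better).
brier = sum((conf - correct) ** 2 for conf, correct in record) / len(record)
print(f"Brier score: {brier:.3f}")
```

A well-calibrated expert's 90% claims are right about 90% of the time; the Brier score penalizes both over- and underconfidence.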
That's great. Sean has a reply, and then I have a follow-up, I think, which might be for Sean, but go ahead.

No, I think this is really interesting. Maybe two things to throw out for folks working specifically in this judicial context. One is the language of the severity of a test, which you could use to calibrate certainty, drawing on Deborah Mayo's statistical testing and severity framework. The other: in the medical sciences, for developing clinical practice guidelines, there's an approach called GRADE, where they rate the evidence underpinning the effectiveness of an intervention across the quality of the studies, how consistent their findings are, how direct the evidence is to the context of inference, publication bias, and precision. They then use that to say they have high, moderate, low, or very low confidence. And it's kind of similar to beyond a reasonable doubt, preponderance of the evidence, and then the middle one, apologies, I forget what's in the middle there, someone can correct me. Clear and convincing, thank you. So if there are no concerns across those domains, it's like beyond a reasonable doubt: the studies are good, they're consistent, et cetera. But you might go down to moderate, like clear and convincing, if you're not sure, if there's some risk of bias. So perhaps there's something to learn from that system in the legal context. Another group I work with is the United Nations Intergovernmental Panel on Climate Change, and I've done work with them on the methods they use to elicit experts' views on the thresholds for the burning embers diagrams. They similarly have a process where they ask folks to rate what those cutoffs are and their confidence in them, based on a replicable protocol for how to calibrate those percentages and likelihoods. So it might be interesting for anyone working in this space, on expert witnesses in the legal context, to see whether those concrete examples can be adapted and translated.

Good question from Emeryn Siana: is there any research or reporting, for evidence-based work, on what happens when the research is overturned? I use the Zotero bibliography tool, and I get a pop-up when a study in my database is retracted, which is great. But, and I don't know how to do a meta-analysis myself, if you've done something like assemble a database of evidence and issued ratings and so on, what happens? Is there any policy, any practice of debriefing? Maybe Sean can speak to the practices around this kind of question.

That's a great question, and I think an inspiration for work going forward, so I'll get in touch. To my knowledge, I can't think of anything explicitly in place. Your best bet is probably Cochrane, the world leader in systematic reviews in the medical and health literature, reviews that are quite influential at the international and national levels. Methodologists there are working really heavily on systems for automatically searching the literature for eligible studies and then trying to automate at least title screening. As part of that, I think there are some conversations, if not pilot tests, about incorporating things like Zotero to signal retractions and then take those studies out of living systematic reviews or living evidence reviews, which aren't static PDFs but are continuously updated when something triggers an update. But I think it's also a really interesting question for the higher-stakes lists.
The clearinghouses are almost like formularies from the FDA, in that they're giving you approved practices. So let's link the studies that justify something being designated as an evidence-based program, and if Zotero flags a retraction, let's take it off the list so that, prospectively, it isn't being used to justify the use of public resources. So nothing to date, but I love the idea and will explore it. I'll get in touch and give credit where credit is due.

It's interesting. Retraction is an extreme case, but there's really a continuum of evolving knowledge. If the science evolves on something, that's part of the uncertainty question: we used to be pretty sure about this, but lately some new findings have come along. Kathy, it looked like you were possibly getting ready to say something about that.

I was just going to say, I saw a recent talk by Simine Vazire, I think her name is. She's trying to get post-publication review going, really talking about how to get systematic post-publication review where experts are somehow qualified and have their own reputations, I don't know, like Amazon reviews but better, so that we're actually doing continual post-publication review and commenting and that sort of thing. I think it would be fantastic if we could figure out how to do it. And it made me think of Wikipedia, which is not perfect, but they have this community that tries to do systematic review of the information that's posted, pull out what's not good, and keep in what's good. So there are some models out there for that.

Jason Chen speaks up in the Q&A: I agree that in civil cases, experts might do a pretty good job of uncovering flaws in each other's work and the research behind it. The problem is in criminal cases, where there's really only one expert, and it's the prosecution's witness. I'm not sure how to deal with that. Maybe pre-registration or blinding the data would work, as Rob suggested. Rob, someone else also asked about the blinded situation. Do you want to say something about that?

Yeah. First, the point about criminal versus civil is a good one. Harkening back to my table on why the adversarial system might not be a good analogy to science: the adversarial system works best when both sides are equally represented. Truth is supposed to will out; it's supposed to emerge from the combat. If it's unequal combat, that's not going to happen. On the thing we were just talking about, a good case study would be 18 months of Anthony Fauci's statements, because you could really watch him, in real time, trying to adapt to changing evidence, and you can also see how the public responded to him changing his mind. A lot of people don't like it when experts change their mind, and so it's a challenge.

Okay, great. We have a few minutes left. I want to toss it back to each of you for any concluding remarks. We just had Rob, so I'll come back around to him: we'll start with Kathy, then Anna, then Sean, then Rob, for one to two minutes each. If you have anything, Kathy.

Yeah, sure. So I remain optimistic, especially after this panel. I think we have a lot of work to do, but hopefully, through conversations like this, we can coalesce around a handful of ways forward and make progress on how to translate the work we're doing, supply good evidence, and get policymakers to actually use it. Very good. Thank you. Anna.

Yeah, this has been super interesting.
And one of the things I immediately wanted to know, listening to my fellow panelists, is, I mean, Sean is pretty optimistic about the role the clearinghouses are playing, and I'd be really curious to know whether we have any good evidence about their impact on outcomes. Maybe there's some kind of setting we can design to see whether there's an impact on choices. I am probably a little less optimistic about their content and their presence, but it's an open empirical question, and I'd love to know the answer.

Good timing. Sean.

Yeah, thank you. I think it is an open empirical question. And a shout-out to my colleague Louisa Plea, a senior program officer at the William T. Grant Foundation, where they have an entire portfolio on the use of research evidence. I have a pilot grant now looking at a model at the state level in Indiana, working with folks at the county level who have used the state's list of approved programs to propose programs for funding, and seeing whether that use fits the assumptions of the model, thinking about Carol Weiss's framework for the use of research evidence. Is it symbolic use, where they've already decided and are now using the list to justify a decision? Or is it truly instrumental and in line with the model? And what are the various facilitators that could lead to guidance or other supports for grantees and applicants trying to meet the model? Then there's the open question of whether, even if they do that, we see changes over time in population rates of the outcomes these initiatives are trying to target, pre and post launch, using something like a comparative interrupted time series. So I think there are places out there increasingly interested in funding work like that. One other colleague to highlight is Catherine Oliver at the London School of Hygiene and Tropical Medicine. She's really a leading thinker on the use of research evidence and is trying to build a community of researchers across substantive disciplines interested in this overall topic. I highly recommend checking out her work and the community she's trying to build, which I think is very aligned with the meta-science community that attends these conferences.

So you want evidence that evidence is affecting policy, and evidence that policy is affecting the world. That seems like a pretty steep hill to climb. Thank you. Okay, final words, Rob.

Okay, I'll be quick. One thing I'd really like to emphasize, something I've increasingly come to believe, is that improving the quality of our science improves our accuracy, our validity, and our calibration. But there is a division of labor, and it's probably not our role to say whether policy should be made on the basis of the existing evidence base. Policymakers have to make decisions, and they take into account a lot besides the empirical research: political feasibility, cost, and things like that. I do think we overreach when we start telling them that we should adjudicate when it's finally time to make policy. I don't think that's our role.

Okay, great. Well, this has been fabulous. It has also been recorded, so I think we get to watch it later for anything you missed. I'm sorry we didn't quite get to everything, but we got in a lot of good questions, and I really appreciate the cooperation with the time, the collegiality on the panel, and the participation of the audience. So thank you very much for being here.
Thanks to the organizers for bringing us together and for inviting me to do it. So I guess that's it. Thank you very much. Nice meeting you. Thank you so much for this wonderful discussion, and thank you everyone for joining us today. We appreciate it. We will be closing out the webinar now and relaunching it in 30 minutes, just to make everyone aware, but the discussion will continue in Remo and on Slack, so please join us there and be back in 30 minutes. I should thank Jason by name for putting this together. Thank you. Anyway, thank you. Thank you all. Thank you.