For our next speaker, we have Qinyun Lin, on communicating the robustness of COVID-19 studies, and I will let her take it away.

Hi, everyone. Today I will talk about work that has been an effort of a large group, as listed here. We come from different universities and different disciplines, including social science, public policy, epidemiology, and public health. Briefly, we are working on an approach to help better communicate the uncertainty, strength of evidence, or robustness of COVID-19 studies. The COVID-19 pandemic is forcing researchers and policymakers to accelerate the evaluation of treatments and vaccines. Given the urgency of the epidemic and the range of stakeholders involved, a shared interpretation of the robustness of scientific findings is more important than ever. Our goal is to provide such an approach, so that we can have an intuitive assessment of research findings in terms that facilitate public health policy decisions.

Why is this difficult? One challenge is that, unlike with non-pharmaceutical interventions, we don't actually know what is going to work here. There is a lot of research going on, using many different research designs, and much of it consists of small studies, and interpreting the robustness of small studies is not straightforward. At the same time, as the dashboards collecting information and results from different studies show, it is not enough for these studies to come out; we also need to synthesize them and make sense of them in almost real time. So these are the two big challenges we face right now.

Our approach starts from the view that inferences from all sorts of studies are always imperfect and uncertain, but we can quantify that imperfection and uncertainty. To do this, we ask how large the bias would need to be to change an inference, or, what would it take to change your inference? Dr. Ken Frank has been working on this for a long time in many different contexts. Together with the group, approaches have been developed for continuous outcomes and have been extended to other models, including logistic regression and mediation. We have also built a Shiny app and a set of R functions, as I will show you later, as well as Stata commands. All of this has already been a fairly large effort, but it was not originally tied to COVID or medical trials. When COVID emerged, we engaged with epidemiologists and medical doctors, who are also co-authors of this paper, and we thought we could usefully apply the approach to the emerging COVID medical trials.

I will start with an example: the first randomized trial of hydroxychloroquine (HCQ). This was an early study conducted at Renmin Hospital in Wuhan, China. 31 of 62 patients were randomly assigned to receive HCQ in addition to the standard treatment, and the inference the authors drew from this study is that HCQ is efficacious. The basis of their inference can be shown as 2x2 tables. They looked at several outcomes; one outcome is a reduction in pneumonia, and this is the 2x2 table for that outcome. In the control group, 17 out of 31 patients improved, while in the treatment group, 25 out of 31 patients improved: 81% versus 55%.
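To make those numbers concrete, here is a minimal sketch, my own illustration rather than the trial's or the authors' code, that reproduces the 2x2 table and the proportions just quoted; the particular test used here is an assumption.

```r
# Minimal sketch reproducing the 2x2 table for the pneumonia outcome.
hcq <- matrix(c(25, 6,    # treatment (HCQ): improved, not improved
                17, 14),  # control: improved, not improved
              nrow = 2, byrow = TRUE,
              dimnames = list(group   = c("HCQ", "Control"),
                              outcome = c("Improved", "Not improved")))

prop.table(hcq, margin = 1)       # 0.81 vs. 0.55, as quoted above
chisq.test(hcq, correct = FALSE)  # significant at the 0.05 level
```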
And this shows that the treatment has a significant positive effect. But we want to say more about the strength of the evidence here. We know the result is statistically significant, but this is not a double-blinded trial, and it is a very small study. Can we better understand the robustness of the finding?

To do that, we start with a general approach our team has been working on, which we call the case replacement framework. Say you have a study where the estimated effect is six and your threshold is four. Then we can calculate that one-third, the black part here, of the estimated effect of six exceeds the threshold of four. That means one-third of the estimated effect would have to be due to bias to change your inference. Importantly, we can also think of this one-third as follows: one would need to replace one-third of the observed cases with zero-effect cases to reduce the estimated effect of six below the threshold of four, so that the inference would change. The key lies in the replacement of observed cases with zero-effect cases, which is why we call this a case replacement framework. In this framework, the more evidence you have, or the less uncertainty the study has, the more observed cases need to be replaced with zero-effect cases. For example, if your estimated effect is not six but eight, you would need to replace 50% of your cases to change your inference, and comparing 50% to one-third tells us that the second study is more robust than the first.

Notice that the threshold in this framework is quite flexible. It can be based on substantive importance, such as a minimal level of clinical importance, or on statistical significance; in many scenarios, we use the effect size at which the study is no longer significant at the 0.05 level.

Now we apply this framework to dichotomous outcomes, replacing cases from the treatment success category. In a 2x2 table we have four cells, and we start by replacing cases from one of them, the treatment success cell. We define the number of treatment success cases that would have to be replaced with zero-effect cases to invalidate the inference as the RIR, the Robustness of Inference to Replacement, and we use the RIR to quantify the uncertainty of the inference.

Let's go back to the HCQ example and use this approach to quantify the robustness of the inference that HCQ is efficacious. In our thought experiment, we replace three cases from the improved HCQ group with cases for whom HCQ has no effect, in other words, zero-effect cases. But how do we know which cells the three replacement cases would go to? To answer this, we look at the entire sample. Under the null hypothesis, HCQ has no effect, and 20 out of 62 cases, around 32%, were exacerbated or unchanged rather than improved. Applying this rate, 32% of the three replacement cases, approximately one case, would go to exacerbated or unchanged, and the other two would stay in the improved group. As a result, we would see one fewer case in the improved HCQ group and one more case in the exacerbated or unchanged group. From another perspective, we can interpret this as one case switching from the improved group to the exacerbated or unchanged group.
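The arithmetic behind these quantities is simple enough to sketch directly; the following is my own illustration of the calculations just described, not code from the package.

```r
# Fraction of the estimated effect that would have to be due to bias
# (equivalently, the fraction of cases to replace) to fall below a threshold.
bias_to_invalidate <- function(estimate, threshold) 1 - threshold / estimate

bias_to_invalidate(6, 4)  # 1/3: replace a third of the cases
bias_to_invalidate(8, 4)  # 1/2: the larger estimate is more robust

# Allocating the three replacement cases in the HCQ example: under the
# null, 20 of 62 patients (~32%) were exacerbated or unchanged, so about
# one of the three replaced cases lands there and two stay "improved".
failure_rate <- 20 / 62
round(failure_rate * 3)   # ~1 case switches out of "improved"
```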
This switching approach is known as fragility in the clinical epidemiology literature. After the replacement, or the switching, the inference changes: HCQ is no longer efficacious at the 5% level. Now the treatment group has 24 out of 31 patients improved, the control group still has 17 out of 31 patients improved, and the p-value increases to 0.06. In other words, to invalidate the inference, you need to replace three treatment success cases with null-hypothesis cases, meaning the RIR equals three. This is equivalent to transferring, or switching, one case from treatment success to treatment failure.

As an extra note, for those familiar with fragility, Walter et al., in a recent paper, illustrated that one switch can have very different meanings depending on how rare an outcome is. The RIR framework complements fragility by accounting for how rare the outcome is, that is, what percentage of cases would experience failure or success. In this example, I used the entire sample to estimate that percentage; one can also use the control group.

In this figure, we extend the HCQ example by plotting the RIR against the corresponding estimated effect sizes along a continuum, to represent a broader potential set of thresholds. Each data point represents the RIR needed to reduce the estimated effect in the HCQ example below a particular effect size. Consistent with our previous discussion, one would have to replace three of the observed treatment improved cases with cases for which the failure rate equals 32 percent to reduce the estimated effect, a 0.26 difference in probability, below the threshold of significance at the 0.05 level. The figure also shows an RIR of about 16 to reduce the initial probability difference of 0.26 to 0.10: we would need to replace 16 cases, and these 16 replacements would generate five switches, meaning a fragility of five. More generally, this figure represents the RIR with respect to any effect size, including effect sizes that define a minimal important difference.

As a second example, as evidence from multiple RCTs accumulates, adding the RIR to a meta-analysis of RCTs can help assess and visualize the robustness of inferences beyond reporting or examining p-values. To illustrate, we recreated the study-by-study accumulation of 16 estimated effects presented in a meta-analysis of randomized trials examining the impact of antihypertensive treatments. In this figure, we present a series of robustness updates as each study was added to the antihypertensive meta-analysis, where each subsequent point shows an updated estimated effect and the corresponding RIR, reported in each box. Critically, the combined estimated treatment effect fluctuated by several percentage points until a study conducted in 1979. As studies accumulated, the estimated treatment effect stabilized and the number of replacements it would take to invalidate the inference increased substantially. Continuous updates to an analogous figure using COVID-19 studies would present decision makers with an up-to-date and intuitive characterization of the combined estimates as well as the robustness of the inference drawn from the scientific evidence.
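The mapping between the RIR curve and fragility follows from the same failure rate; here is a small sketch of that conversion, my own illustration under the talk's premise that switches are approximately the failure rate times the number of replacements.

```r
# Converting RIR (replacements) to fragility (switches), assuming
# switches ~= failure_rate * RIR as described above.
failure_rate <- 20 / 62   # ~32% under the null
round(failure_rate * 3)   # RIR = 3  -> 1 switch  (fragility = 1)
round(failure_rate * 16)  # RIR = 16 -> 5 switches (fragility = 5)
```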
To make this approach more accessible, we have written several functions in R. Because this is still work in progress, users need to install the development version, as in the code here. Once the user installs and loads the package, they can call the function named tkonfound, whose four most important arguments are the four cells of the 2x2 table; 14, 17, 6, and 25 are the four numbers we saw before in the HCQ 2x2 table. The function returns the user-entered values, that is, the observed 2x2 table, together with the 2x2 table after the replacement that changes the conclusion. In the function, users can also specify the p-value they want to use for the threshold; whether they want to do the replacement, or switch, in the treatment row or the control row; which test they want to use, either the chi-square test or Fisher's exact test; and whether they want to use the entire sample or the control row to estimate how rare the outcome is.

We also provide another function called tkonfound_fig, and the user can use this function to reproduce the figure we saw before for the HCQ example. In that figure, the RIR is plotted against corresponding estimated effect sizes along a continuum, to represent a broader potential set of thresholds, and each data point represents the RIR needed to reduce the estimated effect below a particular effect size. The arguments for this function are exactly the same as those for tkonfound.

In addition to tkonfound, in both our R Shiny app and our R package we provide other functions that apply a similar framework to conduct sensitivity analysis for a broader set of models, including models for continuous outcomes and logistic regressions for dichotomous outcomes. I will list three main functions here; a combined usage sketch follows after the summary. The first is pkonfound, which users can apply to published studies. Say you read a paper and you know the estimated effect, its standard error, the number of observations, and the number of covariates, but you don't have access to the original data set; then you can use pkonfound to conduct a sensitivity analysis. Second, you can use the konfound function if you fit models in R yourself, so you have the original data. Finally, you can use mkonfound to conduct sensitivity analysis for meta-analyses.

To summarize, we introduced a case replacement framework for sensitivity analysis of clinical trials. The framework supports statements such as: the inference would change if a certain number of the treatment patients who experienced a benefit were replaced by patients for whom the treatment had no effect. The framework complements fragility by accounting for the rarity of negative outcomes, and it can be used with any threshold, including a minimally important difference and statistical significance. It also applies to a broader set of models and research designs. Finally, coming back to our introduction, we hope that expressing uncertainty in terms of patient experiences makes the robustness of an inference clear even without deep knowledge of probability and statistics. We hope it is especially helpful for small studies, since interpreting the robustness of small studies can be tricky. And we hope that the RIR can facilitate a common understanding among researchers, policymakers, journalists, clinicians, and the public about the strength of evidence for potential new interventions.
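For reference, here is a sketch pulling together the calls described in the talk. The installation path and the exact argument names and order are my assumptions based on the konfound package's documentation, so treat this as an illustration rather than the authors' verbatim code, and check the package itself.

```r
# Development version; the repository path is an assumption.
# devtools::install_github("konfound-project/konfound")
library(konfound)

# 2x2 sensitivity analysis for the HCQ example. Cell order assumed to be:
# control failure, control success, treatment failure, treatment success.
tkonfound(14, 17, 6, 25)

# RIR plotted against a continuum of effect-size thresholds (same arguments).
tkonfound_fig(14, 17, 6, 25)

# Published study: estimated effect, standard error, n, and covariate count.
pkonfound(est_eff = 2, std_err = 0.4, n_obs = 100, n_covariates = 3)

# A model you fit yourself, so you have the original data.
m <- lm(mpg ~ wt + hp, data = mtcars)
konfound(m, wt)

# mkonfound() applies the same framework to meta-analyses; see the docs.
```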
This is ongoing work, and we really look forward to your comments, suggestions, and questions. Feel free to send emails to any of us. Thanks.

Thank you, Dr. Lin. I definitely want to join the chat group in their comments that this was an excellent talk. Thank you very much for your work and for sharing it today; I really enjoyed that. Let's see, there aren't any questions posted. All right, so with that, we will go ahead and close this session and start the next. Once again, thank you very much for joining our conference and sharing.