Hello, everyone. My name is Michael Rotondi and I'm an Associate Professor of Biostatistics at York University in Toronto, Canada. First, I'd like to thank Neil and the organizing committee for giving me a few minutes today to share a little bit about my R package CRTSize and the evidence-based algorithm. I'm going to shut off my video here so that the presentation is a little easier to see. This package is a little different from some of the other tools you'll see throughout the conference. I'll start with some background, where I'll discuss cluster randomized trials; then I'll discuss the evidence-based algorithm; and finally I'll wrap up with a demonstration of the evidence-based algorithm for cluster randomized trials in my package CRTSize. First, let's introduce cluster randomized trials. In contrast to an individually randomized trial, a cluster randomized trial randomly allocates an entire group, or cluster, of individuals to either the treatment or control arm. Examples of cluster randomized trials occur in studies that randomize entire families, classrooms, or geographic regions. Now why would we want to randomize clusters? There are two common reasons for the cluster randomized design. The first is experimental necessity. One common example is the context of teaching interventions: it's simply not possible for a teacher to teach different students in the same classroom using different teaching methods, so randomizing by classroom is a logical choice. The second common reason for the cluster randomized design is to avoid treatment group contamination.
In some cases where the intervention is simple and can be communicated easily between treatment and control group subjects, randomizing by clinic or geographic area minimizes the social interaction between treatment and control participants, reducing the risk that control group members learn about the intervention and bias the results. Unfortunately, the cluster randomized design has several key complications. The first is that responses of individuals in the same cluster, that is, the same family or geographic area, tend to be positively correlated. If we do not appropriately account for this correlation, the variance of our treatment effect, such as a relative risk, odds ratio, or mean difference, will be underestimated, which puts us at an increased risk of a Type I error. The primary parameter in a cluster randomized trial is the intracluster correlation coefficient, or ICC, which is denoted by rho. The ICC measures the degree of similarity between responses within the same cluster. One of the challenges of working with cluster randomized trials is that the estimation of rho is often subject to a lot of uncertainty, which is particularly important when we're planning a new cluster randomized trial and estimating its sample size. Once we have an estimate of the ICC rho and the cluster size m, we can calculate what's termed the variance inflation factor. The VIF is simply a function of the cluster size and the ICC, and is easily calculated as VIF = 1 + (m - 1) * rho. One of the unique features of a cluster randomized trial is that even relatively small values of the ICC rho can still have a very large impact on the variance if the cluster size is relatively large.
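As a quick numeric illustration of the formula just given (a minimal Python sketch, not part of the CRTSize package):

```python
def variance_inflation_factor(m, rho):
    """Design effect for a cluster randomized trial:
    VIF = 1 + (m - 1) * rho, where m is the common cluster size
    and rho is the intracluster correlation coefficient (ICC)."""
    return 1 + (m - 1) * rho

# Even a very small ICC inflates the variance substantially once clusters get large:
for m in (10, 100, 500):
    print(m, variance_inflation_factor(m, rho=0.01))
# With rho = 0.01, m = 100 nearly doubles the variance (VIF = 1.99),
# and m = 500 inflates it almost six-fold (VIF = 5.99).
```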
Now I'm ready to highlight CRTSize, my sample size estimation package for the design of cluster randomized trials. The package includes the traditional power-based standard approaches, as well as what we refer to as the evidence-based algorithm, which is the focus for today. Full details about the evidence-based algorithm and its application to cluster randomized trials are in Rotondi and Donner (2012); all references are included at the end of this presentation. The evidence-based algorithm, or evidence-based approach, was originally developed by Sutton et al. in 2007. The aim of this process is to power your planned study not only on its own, but based on an updated meta-analysis of the current literature plus the proposed study. In this way, we can learn how the study is going to influence the current literature and potentially current practice. This approach is becoming more common in the literature, given the recent emphasis on open science, reproducibility of study results, and ensuring an adequate return on invested resources. One of the unique advantages of this approach is that we can perform almost a type of utility analysis, where we can evaluate whether the planned study is going to be sufficiently large in magnitude to sway clinical practice, and in this way help determine whether it's truly worthwhile to perform the planned study. Here is a brief overview of the evidence-based algorithm for cluster randomized trials in the context of an odds ratio and a fixed-effects model. In step one, we select the number of clusters available per group for the planned study, which we denote by k. In step two, we perform a meta-analysis of the currently available information.
We perform the standard steps of a meta-analysis, including appropriate literature searching and any appropriate adjustments for clustering in any cluster randomized trials. Once we have this, we have an estimate of the fixed-effect log odds ratio, theta hat F, and its estimated variance. In step three, we sample a new effect size from a normal distribution centered at theta hat F, with the variance calculated in step two. In step four, we obtain appropriate values for the control rate pC, the cluster size m, and rho in the planned study. These parameters are often based on a literature search of similar cluster randomized trials that have taken place, or potentially a small pilot study. In step five, we generate individual-level data according to this new effect size and the anticipated parameters. In step six, we calculate the log odds ratio and its estimated variance for this new hypothetical study, including any appropriate adjustments for clustering. In step seven, we combine this new study with the existing meta-analysis and re-meta-analyze the results. Repeating steps two through seven a large number of times, say 1000, we obtain an appropriate estimate of the power of the updated meta-analysis, that is, how likely it is to produce a statistically significant result. By repeating this entire process with revised values of k, so other potential numbers of clusters that could be randomized, we can then evaluate the impact of various sample sizes on the updated meta-analysis, including the previously completed trials and the planned trial. Okay. So now let's have a look at a hypothetical example in R.
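The steps above can be sketched as a small simulation. This is a simplified Python illustration of the idea, not the CRTSize implementation: the helper names are mine, the within-cluster correlation is induced with a beta-binomial model (whose intracluster correlation is exactly rho), and the clustering adjustment reuses the variance inflation factor.

```python
import math
import random

def fixed_effect_meta(thetas, variances):
    """Inverse-variance fixed-effect pooling of log odds ratios (step 2)."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * t for w, t in zip(weights, thetas)) / sum(weights)
    return pooled, 1.0 / sum(weights)

def simulate_arm(k, m, p, rho):
    """Simulate k clusters of size m with event rate p and ICC rho.
    Cluster rates are Beta(a, b) with a + b = (1 - rho) / rho, which
    gives within-cluster correlation exactly rho."""
    a = p * (1 - rho) / rho
    b = (1 - p) * (1 - rho) / rho
    events = 0
    for _ in range(k):
        p_j = random.betavariate(a, b)
        events += sum(random.random() < p_j for _ in range(m))
    return events, k * m

def evidence_based_power(thetas, variances, k, m, p_c, rho,
                         iters=200, alpha_z=1.96):
    """Estimate the power of the UPDATED meta-analysis once a new
    k-clusters-per-arm trial is added (steps 2 through 8)."""
    theta_f, var_f = fixed_effect_meta(thetas, variances)          # step 2
    hits = 0
    for _ in range(iters):
        theta_new = random.gauss(theta_f, math.sqrt(var_f))        # step 3
        # Convert the sampled log OR into a treatment-arm event rate.
        odds_t = (p_c / (1 - p_c)) * math.exp(theta_new)
        p_t = odds_t / (1 + odds_t)
        e_c, n_c = simulate_arm(k, m, p_c, rho)                    # step 5
        e_t, n_t = simulate_arm(k, m, p_t, rho)
        if min(e_c, n_c - e_c, e_t, n_t - e_t) == 0:
            continue  # skip degenerate 2x2 tables
        log_or = math.log(e_t / (n_t - e_t)) - math.log(e_c / (n_c - e_c))
        var_or = 1/e_t + 1/(n_t - e_t) + 1/e_c + 1/(n_c - e_c)     # step 6
        var_or *= 1 + (m - 1) * rho    # clustering adjustment via the VIF
        pooled, pooled_var = fixed_effect_meta(thetas + [log_or],
                                               variances + [var_or])  # step 7
        if abs(pooled) / math.sqrt(pooled_var) > alpha_z:
            hits += 1
    return hits / iters                                            # step 8

random.seed(2024)
# Two hypothetical prior studies on the log OR scale, with large variances.
power = evidence_based_power([math.log(0.6), math.log(0.5)], [0.12, 0.20],
                             k=20, m=100, p_c=0.20, rho=0.01, iters=100)
print(round(power, 2))
```

Sweeping `k` over a grid of candidate cluster counts and plotting the resulting power estimates gives the empirical power curves shown later in the demonstration.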
Let's suppose we have two studies examining the effect of vitamin A supplementation on neonatal morbidity, where the neonatal morbidity outcomes are something like ear infections or the presence of high fevers. In global health research, cluster randomized designs are actually quite common, because it's simply not practical to randomize health interventions at the individual level. So let's copy this code here. Sorry, I have to get out of here and get my cursor. So let's go in here. I'm going to put in my previous studies from the hypothetical, for the initial meta-analysis phase. The way this function works is that we have our relative risk here, followed by the lower and upper bounds of the 95% confidence interval. Looking at the first study, we have an overall relative risk of approximately 0.65, which corresponds to an approximately 35% reduction in the risk of morbidity due to the vitamin A supplementation intervention. We know that this is not statistically significant in the first study. Similarly, in the second study, we have an approximately 66% reduction in risk, but once again this study has very wide confidence intervals and does not show a statistically significant effect of the vitamin A supplementation intervention. I'm going to start the run first, before I walk us through the arguments, because it does take a couple of minutes. The function is n4propsMeta. The data argument is the matrix of previous studies that we provide, which again corresponds to the relative risk and the lower and upper 95% confidence limits. model = "fixed" corresponds to a fixed-effects meta-analysis. The measure here is the relative risk. And here we provide a vector of different numbers of clusters that we're able to randomize; for reference, we're going to consider 5, 10, 15, 20, 30, going all the way up to 60 clusters per intervention group.
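The slide doesn't spell out the confidence limits, so the numbers below are purely hypothetical, but they illustrate the standard back-calculation a meta-analysis routine performs on inputs of this form: recovering the log relative risk and its standard error from the point estimate and the 95% confidence limits.

```python
import math

def log_rr_and_se(rr, lower, upper, z=1.959963984540054):
    """Recover the log relative risk and its standard error from the
    point estimate and 95% confidence limits (standard back-calculation:
    the CI spans 2z standard errors on the log scale)."""
    return math.log(rr), (math.log(upper) - math.log(lower)) / (2 * z)

# Hypothetical first study: RR = 0.65 with a wide CI crossing 1.
log_rr, se = log_rr_and_se(0.65, 0.35, 1.20)
z_stat = log_rr / se
print(round(z_stat, 2))  # |z| < 1.96, so not statistically significant
```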
The ICC value that I'm specifying is 0.01, with a cluster size of m equal to 100. I mentioned earlier that one of the interesting challenges of working with cluster randomized designs is that there are a lot of sources of variability, so this function tries its best to help account for some of these. We have a parameter SDm, which corresponds to the standard deviation of the cluster sizes; in practice this allows varying cluster sizes. This is the control rate, so in this case the rate of morbidity in the control group, and we can once again include an estimate of variability for that parameter as well. iter = 200 I've just used for computational purposes; typically we want at least 1000 iterations to obtain appropriately smooth power curves. alpha = 0.05 corresponds to our 5% two-sided significance level, and the ICC distribution is set to fixed, so in this demonstration we're using a fixed value of the ICC of 0.01. But the n4propsMeta function actually allows a lot of different options: we have the fixed one; we have a uniform approach, where we can specify the lower and upper bounds of a uniform interval and sample our ICC values from that distribution; and we also have a truncated normal approach as well. Now, let's have a look at our results. As an initial step, the function returns a very basic estimate of the meta-analysis relative risk. We can see an approximately 40% reduction in the risk of morbidity, corresponding to a relative risk of 0.6. The 95% confidence limits are from 0.3 to approximately 1.2, so once again this result is not statistically significant. We note that for each of our numbers of clusters randomized per group, we can begin to see approximately what these power curves are going to look like. One specific item to note is that there is a lot of variability here, because this is only based on 200 iterations.
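The three ICC options mentioned in this walkthrough (fixed, uniform, and truncated normal) amount to different ways of drawing rho at each simulation iteration. A minimal Python sketch, with illustrative argument names rather than the package's actual ones:

```python
import random

def sample_icc(distn="fixed", rho=0.01, lower=0.005, upper=0.02, sd=0.005):
    """Draw one ICC value per simulation iteration.
    'fixed' returns rho as-is; 'uniform' draws from [lower, upper];
    'truncated_normal' draws from Normal(rho, sd) restricted to
    [lower, upper] via simple rejection sampling."""
    if distn == "fixed":
        return rho
    if distn == "uniform":
        return random.uniform(lower, upper)
    if distn == "truncated_normal":
        while True:
            x = random.gauss(rho, sd)
            if lower <= x <= upper:
                return x
    raise ValueError(f"unknown ICC distribution: {distn}")

random.seed(7)
print(sample_icc("fixed"))             # always 0.01
print(sample_icc("uniform"))           # somewhere in [0.005, 0.02]
print(sample_icc("truncated_normal"))  # Normal(0.01, 0.005) truncated to the range
```

Sampling rho rather than fixing it propagates the uncertainty in the ICC estimate into the power curves, which is one of the main motivations for the simulation-based approach.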
As the number of iterations gets larger, these will produce much smoother curves. Just from this example, we see that 10 and 15 clusters actually produced the exact same estimate of the approximate power; that's simply due to sampling variation. But based on these results, we can see that randomizing approximately 25 clusters per intervention group would likely provide roughly 80% power. Once again, the distributional ICC and cluster size assumptions are included for your reference. Where this gets particularly useful is that we can produce quick, simple plots to visualize what our empirical power is going to look like. Here we can see again that, from this example, with roughly 25 clusters per intervention group we would have approximately 80% power. I did previously run a quick estimate with approximately 1000 iterations, and once we have 1000 iterations, these curves do present much more smoothly in practice, and it is likely on the order of about 22 or 23 clusters per intervention group to produce the 80% power. Let me just go back to full screen here and put my video back on for the last couple of slides. In summary, the evidence-based approach to sample size estimation examines the role of the new study in the literature. In this way, we're able to power not just the study on its own, but an updated meta-analysis, to detect a statistically significant and clinically meaningful treatment effect. This approach complements traditional sample size approaches, but it does not replace them. I still strongly encourage researchers to proceed with a standard sensitivity analysis, so they can examine the impact of different ICC values, treatment effects, control rates, and so on, on their planned sample size requirements. Finally, CRTSize is available from CRAN.
So if you're interested in using the package, it's freely available to download. The key reference for this work is Rotondi and Donner (2012). In that manuscript, there are additional details on the use of different distributional assumptions for the ICC; we discuss the truncated normal version, the uniform approach, and so on, and it also includes a couple of other worked examples. Finally, I want to say thank you for watching today and for your interest in my R package. Thank you to Neil and the organizing committee as well for inviting me and giving me the opportunity to share a little bit about the evidence-based algorithm for sample size estimation. And finally, if you have any questions or comments, feel free to reach out. My email address is mrotondi at yorku.ca, and I'm always happy to hear from researchers who are using my R packages and to learn more about your studies. So thank you again, best wishes, and enjoy the rest of the conference.