Hello, and thank you all for being here today for my talk, Designing Early-Phase Clinical Trials with the ppseq Package. I'm Emily Zabor, and I'm a cancer biostatistician working at the Cleveland Clinic in Cleveland, Ohio, in the Department of Quantitative Health Sciences and the Taussig Cancer Institute. Today I will be telling you about a new R package I've recently developed along with some collaborators. I'll start by going over some background and motivation for the development of this package.

In the context of cytotoxic treatments, such as chemotherapies, phase 1 trials in oncology traditionally have the primary aim of identifying the maximum tolerated dose, or MTD. The MTD is defined as the highest dose with a toxicity rate at or below a certain pre-specified level, often set to 30 percent, and is identified through dose escalation. Designs for dose escalation trials include the rule-based 3+3 design and the model-based continual reassessment method, among others. But with increasing focus on non-cytotoxic treatments, such as immunotherapies, the MTD either may not exist or may not be relevant, and toxicities may either develop much later or even be chronic, making these treatments difficult to study with traditional dose escalation designs.

As a result, it is becoming increasingly common to include dose expansion cohorts, in which additional patients are enrolled in phase 1 after the dose escalation phase is complete. In this setup, the dose escalation phase is considered phase 1a and is used to assess the initial safety of multiple doses. The dose expansion phase is considered phase 1b and can have a variety of aims: to further refine the safety of one or more doses, to assess preliminary efficacy, to explore the treatment in various disease-specific subtypes that all share a common biomarker the treatment is targeting, or to further characterize the pharmacokinetics and/or pharmacodynamics. The use of dose expansion cohorts increased from 12% of trials in 2006 to 38% in 2012. But dose expansion cohorts are not always planned in advance, so they can be subject to on-the-fly decision making that can lead to poor statistical properties and very large sample sizes. For example, the KEYNOTE-001 trial of pembrolizumab was initially designed as a 3+3 dose escalation trial, but went on to include multiple protocol amendments and ultimately enrolled a total of 655 patients across five melanoma expansion cohorts and 550 patients across four non-small cell lung cancer expansion cohorts. And you can see in this timeline a number of the cohorts that were added over time during the study.

The solution proposed by the ppseq package is to plan dose expansion cohorts in advance using Bayesian sequential predictive probability monitoring. The sequential nature of the monitoring means that you are doing interim analyses periodically throughout the trial to allow for early stopping for futility. This makes trials more ethical by reducing the number of patients treated with inefficacious doses or inefficacious drugs. The Bayesian approach allows for flexibility in both the number and timing of these interim analyses. The package includes options for calibration and optimization that ensure the planned trial is ethical for participants and makes efficient use of resources, while also operating within well-defined statistical criteria for control of type I error and power. Next, I will go over the trial design and the calculation of predictive probability.
Consider the setting of a binary endpoint, such as tumor response as measured by the RECIST criteria in a study of patients with solid tumors. Each patient either has a response, so that X_i equals 1, or does not have a response, so that X_i equals 0. Then X represents the total number of responses observed among all of the enrolled patients. There are n patients at an interim analysis and a maximum of N total enrolled patients at the end of the trial. p represents the probability of response, where p0 represents the null response rate under no treatment or standard-of-care treatment, and p1 represents the alternative response rate under the experimental treatment. A common hypothesis for a dose expansion study with an efficacy aim tests the null hypothesis that the response rate is at most the null response rate, H0: p <= p0, versus the alternative hypothesis that the response rate is at least the alternative response rate, H1: p >= p1.

I'm going to show just a couple more slides with a little math, but I'll focus on describing the concepts in words as well. The foundation of Bayesian statistics is Bayes' rule, a mathematical theorem that specifies how to combine the prior distributions that define prior beliefs about parameters with the observed data. In this case, we put a beta prior distribution on our response rate p, and our data X follow a binomial distribution, so we can combine these two pieces of information to obtain the posterior distribution, which in this case will also be a beta distribution. The posterior distribution can be used to calculate a posterior probability, which represents the probability that the treatment is efficacious based only on the data accrued so far, and we would declare a treatment efficacious if the posterior probability exceeded some predefined threshold theta. The posterior predictive distribution then concerns the number of future responses X* among the remaining N* future patients to enroll, and it follows a beta-binomial distribution. The posterior predictive probability represents the probability that the treatment will be declared efficacious at the end of the trial, when full enrollment is reached. Based on this, we would stop the trial early for futility if the posterior predictive probability dropped below a pre-specified threshold theta* at any interim analysis. Predictive probability thresholds closer to 0 lead to less frequent stopping, whereas thresholds closer to 1 lead to frequent stopping unless there is an almost certain probability of success. Predictive probability provides an intuitive framework for interim monitoring in clinical trials, as it tells the investigator the chances of declaring the treatment efficacious at the end of the trial if we continue enrolling to the maximum planned sample size, given the data observed so far. A minimal sketch of this calculation appears below.

So let's see how the package works. You can install the development version from GitHub or the released version from CRAN in the usual ways; the commands are also shown below. At the time of this recording, the package had been submitted to CRAN but had not yet been reviewed, so hopefully by the time you're seeing this talk it will be available there. I will introduce the functionality in ppseq using a case study based on the phase 1 dose expansion study of atezolizumab in metastatic urothelial carcinoma.
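To make these definitions concrete, here is a minimal base-R sketch of the posterior and predictive probability calculations just described. This illustrates the math only, not the ppseq internals; the Beta(0.5, 0.5) prior and all input values are assumptions for the example.

```r
# Posterior probability P(p > p0 | x responses in n patients),
# under a Beta(a, b) prior (Beta(0.5, 0.5) assumed here)
post_prob <- function(x, n, p0, a = 0.5, b = 0.5) {
  pbeta(p0, a + x, b + n - x, lower.tail = FALSE)
}

# Posterior predictive probability of declaring efficacy at the end of
# the trial, i.e. the chance the final posterior probability exceeds theta
pred_prob <- function(x, n, N, p0, theta, a = 0.5, b = 0.5) {
  xstar <- 0:(N - n)  # possible future responses among N - n patients
  # beta-binomial pmf of X* given the current posterior
  pmf <- choose(N - n, xstar) *
    beta(a + x + xstar, b + N - x - xstar) / beta(a + x, b + n - x)
  # would the posterior probability at full enrollment clear theta?
  success <- post_prob(x + xstar, N, p0, a, b) > theta
  sum(pmf * success)
}

# Example (values assumed): 1 response in the first 10 of 95 patients
pred_prob(x = 1, n = 10, N = 95, p0 = 0.1, theta = 0.92)
```

And the installation commands just mentioned look like this; the GitHub repository path is my best recollection, so check the package website if it has moved.

```r
# Released version, once accepted on CRAN:
install.packages("ppseq")

# Development version from GitHub (repository path assumed):
# install.packages("remotes")
remotes::install_github("zabore/ppseq")
```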
This trial is of particular interest right now because atezolizumab received accelerated approval from the FDA in 2016 for use in metastatic urothelial carcinoma, based on the results of phase 1 and phase 2 studies and pending the completion of a phase 3 study. The results of the phase 3 study ultimately showed no benefit of atezolizumab over standard-of-care therapy with respect to response rate or overall survival, and the approval was voluntarily withdrawn by the sponsor in March of this year. Situations like this are one motivation for the development and wider use of more rigorous methods for early-phase clinical trials, especially in the context of accelerated approvals, which are often granted based on early-phase trial results.

The original dose expansion study of atezolizumab in metastatic urothelial carcinoma aimed to further evaluate safety and to examine the pharmacodynamics and pharmacokinetics, so there was no explicit efficacy aim stated. This dose expansion cohort was not originally planned but was rather added later through a protocol amendment, and the study ultimately enrolled 95 patients. Because this specific cohort was not described in the original protocol, it is not possible to know the intended design or sample size. The original protocol did, however, describe dose expansion designs for other patient cohorts, which planned to enroll a total of 40 patients with a single look for futility that would stop the trial if there were no responses in the first 14 patients. This rule gave a 4.4% chance of stopping for futility if the true response rate was 20% or higher.

We will use ppseq to redesign this dose expansion study. We'll set a null response rate of 0.1 and an alternative response rate of 0.2. The maximum total sample size is set to 95, and we will do an interim futility analysis after every 5 patients are enrolled. Here I've listed the posterior and predictive thresholds that we'll consider, and I'll explain those further in the next couple of slides. I'll also compare the redesign results to results based on a single interim look after the first 14 patients, as originally outlined in the trial protocol for dose expansion cohorts in other disease subtypes.

One consideration is that for a design based on predictive probability monitoring to be acceptable both to investigators and to regulatory groups, we need to conform to traditional control of type I error and levels of power. Recall that type I error is the false positive rate and power is the true positive rate. The function calibrate_thresholds() will jointly evaluate a grid of posterior and predictive thresholds and will calculate the type I error and power associated with each combination. We want to consider a variety of thresholds so that we can identify a design with acceptable operating characteristics. The code we use to conduct this redesign with calibrate_thresholds() is reconstructed below. The null and alternative response rates are specified, as well as the vector of sample sizes at the interim analyses and the total final sample size. The next two arguments are the vectors of posterior and predictive thresholds that will be evaluated in a grid, and the last five arguments are set to the function defaults, so you can see the details of those in the help file.
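Since the slide code isn't captured in this transcript, here is a hedged reconstruction of the call, with argument names as given in the ppseq documentation; the exact threshold grids are assumptions, chosen to include the values selected later in the talk. Because the calibration simulates many trials, an assumed future::plan() setup is also shown, a point I'll return to at the end of the talk.

```r
library(ppseq)
library(future)

# Calibration is computationally intensive; run it in parallel.
# The multisession strategy and worker count are assumptions --
# see ?future::plan for what suits your system.
plan(multisession, workers = 4)

set.seed(123)  # assumed seed, for reproducibility

cal <- calibrate_thresholds(
  p_null = 0.1,                             # null response rate
  p_alt = 0.2,                              # alternative response rate
  n = seq(5, 95, 5),                        # interim look after every 5 patients
  N = 95,                                   # maximum total sample size
  pp_threshold = c(0.9, 0.92, 0.95, 0.98),  # posterior thresholds (assumed grid)
  ppp_threshold = seq(0.05, 0.2, 0.05)      # predictive thresholds (assumed grid)
)
# direction, delta, prior, S, and nsim are left at their defaults
```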
There's a print option for calibrate_thresholds() results that allows you to limit the displayed results to a range of type I error and a minimum value of power; here I've set the range of type I error to be between 0.05 and 0.1 and the minimum power to 0.7. In the results table, we see the thresholds being considered in the first two columns, then the average sample size under the null, the proportion of trials that were positive under the null, and the proportion of trials that were stopped early under the null. Note that the proportion of trials that were positive under the null represents the type I error. We also see the average sample size under the alternative, the proportion of trials that were positive under the alternative, and the proportion of trials that were stopped early under the alternative. The proportion of trials that were positive under the alternative represents the power. These results are based on 1000 simulated trials, which is the default setting of the nsim option in calibrate_thresholds(). And even with the filtering, you can see there are a variety of results still available to choose from.

So as I just mentioned, even with some filtering applied, there were still quite a few design options to consider. We therefore proposed two optimization criteria to assist in the selection of a single design. The first criterion, called the optimal accuracy design, identifies the point on a plot of type I error by power that has the lowest Euclidean distance to the top left corner. The second criterion, called the optimal efficiency design, identifies the point on a plot of the average sample size under the null by the average sample size under the alternative that has the lowest Euclidean distance to the top left corner. Using the optimize_design() function, we can extract these two designs. Note that it's important to set the filtering by type I error and power to what would truly be acceptable, as these constraints are especially important for selecting an optimal efficiency design that still has acceptable type I error and power. And you can see here the two resulting optimal designs. We can also view the variety of designs in an interactive plotly plot: the diamond shape represents the optimal design on each plot, and we can hover over a point to see additional details of each design, including the average sample size under the null, the average under the alternative, the distance to the optimal point, the design thresholds, and the type I error and power. This is a nice way to compare designs.

Based on these results, we may choose to select the optimal efficiency design. This design has several desirable features, including a low average sample size of just 39 patients under the null, a type I error of 0.06, and a power quite close to 80%. Contrast this with the results based on the original protocol design, which led to an average of 76 patients being treated under the null, an overly stringent type I error of 0.005, and a much lower power of 52.8%.

So far we've seen how to calculate the operating characteristics for designs based on different combinations of posterior and predictive thresholds, how to plot the results, and two options for optimal designs. To make the selected design easy to implement, I'll next introduce the function calc_decision_rules(), which returns a decision table for each interim analysis based on the selected design. These calls are sketched below.
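Reconstructing these calls from the description (argument names follow the ppseq documentation as I recall it; treat the exact signatures as assumptions and check the help files):

```r
# Filter the calibration results to acceptable operating characteristics
print(cal, type1_range = c(0.05, 0.1), minimum_power = 0.7)

# Extract the optimal accuracy and optimal efficiency designs,
# subject to the same type I error and power constraints
optimize_design(cal, type1_range = c(0.05, 0.1), minimum_power = 0.7)

# Interactive plotly plots; the diamond marks the optimal design on each
plot(cal, type1_range = c(0.05, 0.1), minimum_power = 0.7, plotly = TRUE)

# Decision table for the selected optimal efficiency design
dec <- calc_decision_rules(
  n = seq(5, 95, 5),  # interim look after every 5 patients
  N = 95,             # maximum total sample size
  theta = 0.92,       # selected posterior threshold
  ppp = 0.1,          # selected predictive threshold
  p0 = 0.1            # null response rate
)
dec        # one row per look, with columns n, r, and ppp
plot(dec)  # stop/continue grid by look and response count
```

The plot call is shown with defaults; in the talk the decision plot is interactive, so a plotly option may apply there as well.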
So in this case, let's say we selected the optimal efficiency design, with a posterior threshold of 0.92 and a predictive threshold of 0.1. We can enter the schedule of interim analyses, the total final sample size, the two selected thresholds, and the null response rate to obtain a decision table; the last four arguments are just the default arguments to the function. And this is what we get. There is one row for each interim look, and the sample size at that look is in the column labeled n. The column r shows the number of observed responses at each look that would lead to a decision to stop or continue: at each look, if the number of responses is less than or equal to r, we would stop the trial for futility, and if the number of observed responses is greater than r, we would continue enrolling. The column ppp shows the predictive probability associated with this decision. If there's an NA for any result, like in the top row of this table, it means there is no setting in which we would stop the trial at that look, so we would always continue beyond the first 5 patients in this case. Then at the end of the trial, the treatment is considered promising if the number of observed responses is greater than r. So in this case, at the end of the trial, we would declare the treatment promising if more than 13 responses were observed out of the total of 95 enrolled patients.

There is also a plot option for the decision table results. The plotting is particularly useful in the two-sample case, when decisions are made for a combination of the number of responses in the control and experimental arms at each look. Here in the one-sample case, the plot simply shows the sample size at each interim analysis on the x-axis and the number of responses on the y-axis; each box is colored according to whether we would proceed or stop, and we can see details by hovering.

Just a couple of final points. I alluded to this already, but while I have demonstrated all of the functionality for the one-sample case, where we have a single group being treated and compared to a hypothetical null response rate, all of the demonstrated functions also work for two-sample trials, for example if you are randomizing patients in an early-phase trial to control versus experimental treatment. I also want to point out that calibrating the thresholds is quite computationally intensive, especially for a large grid of thresholds or trials with many looks. The function is parallelized using the future and furrr packages, but the user will need to set up an appropriate call to the plan() function from the future package for their system, as in the sketch shown earlier.

Lastly, I want to acknowledge the other contributing authors of the ppseq package, Brian Hobbs from the University of Texas at Austin and Mike Kane from Yale University. And I will leave you with a link to the package website, which contains a detailed vignette demonstrating how to use all of the functions, as well as a link to the slides from my talk today and some contact information. Thank you so much for listening.