Okay, so our next speaker is going to be Will Landau, and he's going to be speaking about reproducible computation at scale with drake. Will has nicely prerecorded this for us, so he will be available on chat to answer questions for you during the talk, and there will probably be a few minutes left after the talk to finish up any questions. So let's start.

Thank you all for coming, and thank you to R/Medicine for the opportunity to speak today. In the life sciences, we develop ambitious computational workflows for statistics and data science. There's a lot of Bayesian analysis, machine learning, simulation and prediction, and we need to think about both efficiency and reproducibility right from the start. Many of these projects require long runtimes. Methods like Markov chain Monte Carlo and deep neural nets are computationally expensive, and it can take hours or even days just to fit a single model. That's fine if you're only going to run the project once, or at regularly scheduled, predictable times. But if the code is still under development and you're making a constant stream of changes, several a minute, in real time, it's easy to get trapped in a vicious Sisyphean cycle.

A large workflow usually has a large number of moving parts. We have datasets we want to preprocess or simulate, analyses of those datasets, and summaries of those analyses. And if you change any one of these parts, whether it's a bug fix or a tweak to the model or some new data, then everything that depends on it is no longer valid, and you need to rerun the computation to bring the results back up to date. This is seriously frustrating when you're in development and changes are coming in fast. You're making, like I said, several updates a minute to the code or data, new artifacts, all of this in real time. And if every one of those changes means you need to rerun the project, there's no way the results can keep up. Unless you use a pipeline tool.

There are pipeline tools for production, which resemble Apache Airflow, and there are pipeline tools for development, which resemble GNU Make. Today I'm going to talk about Make-like tools, because those are the ones I think are designed for this part of the process. It's an action-packed space and there are a lot of great options, but unfortunately there's not a whole lot for R. And that's where drake comes in. drake is a Make-like pipeline tool that is fundamentally designed for R. You can call it from an R session, it supports a clean, idiomatic, function-oriented style of programming, and it helps you store and retrieve your results. Most importantly, it gets you out of the Sisyphean loop of long computation. It enhances reproducibility, and it takes a lot of the frustration out of data science.

Let's go to an example. I'm part of the capabilities team at Lilly, and much of our work revolves around the design and simulation of clinical trials. In the first several months of 2020, we helped design several trials for potential new treatments of COVID-19. We used simulation to assess the operating characteristics of these trials and help determine features like sample size, primary endpoint, and even when the trial should stop. This was a cross-functional, multidisciplinary effort with strong statistics leadership in the mix. Now, this slide has a mock example of a clinical trial simulation study. It's not the actual simulation study for any one real-life trial in particular.
It's oversimplified for pedagogical purposes, but it does represent how my team and I set up the computation for this general kind of problem. We use drake a lot, and we use drake a lot in this way. So this is a mock phase 2 trial, and the goal of the simulation is just to understand the trial's operating characteristics. When is this trial going to claim the therapy works? When is it going to claim the therapy doesn't work? Under what circumstances is it going to make each decision most of the time? We want to design a trial that makes the correct decision without an unnecessarily large sample size. So one of the things we might pay attention to is whether a group of 200 patients is good enough, whether that's a large enough sample size.

Suppose we want to enroll newly hospitalized COVID-19 patients and measure the number of days until they're cleared to leave. In the simulation we randomize half the patients to treatment and half to placebo, and we measure the drug's ability to shorten the hospital stay. At the end of the trial there are multiple prespecified criteria to determine whether the therapy moves on to phase 3 studies, including patient safety, efficacy, cost-effectiveness and more. But suppose we meet the efficacy criterion if the posterior probability that the hazard ratio of hospital discharge exceeds 1.5 is greater than 60%. We assess the design of the trial with the simulation at the bottom of the slide. First we draw time-to-event data for each simulated trial from an assumed distribution. Then we analyze the simulated data and evaluate the efficacy rule using a Bayesian proportional hazards model. We repeat this for many simulations and aggregate the results to figure out what the efficacy decision of the trial is likely to be under different effect-size scenarios.

So that's the background. Now, how do we implement this? Let's have a look at the file system of this project (a rough sketch of this kind of layout appears below). We have R scripts to load our packages, our custom functions, and something called a drake plan, which I'll get to later. We also have a _drake.R script to configure and set up the workflow at the top level, and some other top-level run scripts just for convenience. We also have an SGE template file. SGE stands for Sun Grid Engine, and this helps us distribute the workload across multiple nodes of a grid engine cluster.

Most of the code we write is going to be in the form of custom functions. Now, this may be unfamiliar to a lot of folks who are used to writing imperative code, numbered scripts, or just putting everything in a bunch of R Markdown reports. Functions scale much better for big stuff. A function is just a reusable set of instructions with multiple inputs and a single return value. Usually those inputs are explicitly defined and easy to create, and usually the function has an informative name. Functions are a fundamental built-in feature of almost every programming language, and they're particularly well suited to R, which is designed with formal functional programming principles in mind. The most obvious use for functions is as a way to avoid repeated code scattered throughout the project. So instead of copying and pasting the same block of code everywhere, you just call the function. But functions are not just for code you want to reuse. They're also for code you just want to understand. Functions are custom shorthand. They make your work easier to read, understand, and break down into manageable pieces to document, test and validate.
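As a concrete picture of that file system, a project laid out this way might look roughly like the sketch below; the exact file names are assumptions based on the description above, not a listing of the real repository.

```r
# Illustrative layout of a drake project like this one (names are assumptions):
#
#   run.sh          # submits the workflow to the cluster
#   run.R           # convenience script that calls drake::r_make()
#   sge.tmpl        # Sun Grid Engine template for distributing targets
#   _drake.R        # top-level setup script read by r_make()
#   R/packages.R    # library() calls
#   R/functions.R   # custom functions to simulate, model, and summarize
#   R/plan.R        # the drake plan
```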
And it really helps bolster the reproducibility and reliability of clinical research. Most of our functions revolve around three kinds of tasks: preparing datasets, analyzing datasets, and summarizing those analyses. This is one of the top-level functions for the data piece. It accepts an easy-to-generate set of design parameters as arguments, and it returns a tidy data frame of simulated patient-level data. Inside the body, it calls another custom function called simulate_arm, which we define elsewhere in the functions file. Another custom function, model_hazard, actually fits the model, and it uses the custom functions run_chain and summarize_samples to generate a one-row tidy data frame of results for a single simulated trial. (A sketch of functions like these appears below.)

At this point, you already have something to take away and apply, even if you decide not to use drake. This function-oriented style still has a lot of value on its own. However, if you're thinking about using drake, then converting to functions is almost all the work involved. Once you've done that, you're already almost there. All you need now is to outline the specific steps of the computation, how those functions fit together, in an object called a drake plan.

And this is how you define that plan. There's this drake_plan() function, and inside a call to it you list out steps called targets. Each target is an individual step of the workflow. It has an informative name, like sim or patients, it has an R command to invoke the custom functions, and it returns a value at the end. drake has shorthand to define entire groups of targets. Jumping right into this patients step: because of dynamic = map(sim), we define a patient-level dataset for every simulation repetition. Later on in the plan, we have targets to analyze each dataset, summarize each effect-size scenario, and at the end combine the results into a single readable data frame. (A sketch of a plan like this appears below.) The drake_plan() function doesn't actually run any of this work just yet. It simply returns a tidy data frame of the work we have planned. We've broken the workflow down into targets because we want drake to be able to skip targets that are already up to date and just run the ones that need to refresh. And this is going to save us loads of runtime.

It's always good practice to visualize the dependency graph of the plan before you start. drake has functions to do this for you, and it really demystifies how drake works. Here you see the flow of the project from left to right. We decide how many simulations we're going to run, we run those to generate the patients, run the models, and then summarize them. But how does drake know that the models depend on the patients? The order of the targets you write in the plan doesn't actually matter. drake resolves this dependency graph because it notices that the symbol patients is mentioned in the command for the models target in the plan. And that's why, in fact, we get one model target for each patient-level dataset, because of the dynamic branching in this plan. So drake scans your commands and functions without actually running them, not only to look for changes but also to understand which parts of the code and which targets depend on one another. This is called static code analysis.

To put it all together, we use a script called _drake.R (also sketched below). We load our packages, functions and plan, we set options to farm out to the cluster, and we end with a call to drake_config() to put all of this together.
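Here is a minimal, illustrative version of what these custom functions could look like. The names simulate_arm(), model_hazard(), run_chain(), and summarize_samples() come from the talk; simulate_patients(), the argument names, the simple exponential data-generating model, and the placeholder "posterior" are assumptions for the sake of a runnable sketch, not the real trial code.

```r
library(dplyr)
library(tibble)

# Simulate one arm of the trial: exponential time to hospital discharge
# (a deliberately simple stand-in for the real data-generating model).
simulate_arm <- function(n, median_days, arm) {
  tibble(arm = arm, days = rexp(n, rate = log(2) / median_days))
}

# Top-level data function: easy-to-create design parameters in,
# tidy data frame of simulated patient-level data out.
simulate_patients <- function(sim, n = 200, median_control = 20, hazard_ratio = 1) {
  patients <- bind_rows(
    simulate_arm(n / 2, median_control, "placebo"),
    simulate_arm(n / 2, median_control / hazard_ratio, "treatment")
  )
  patients$sim <- sim
  patients
}

# Placeholder for the MCMC step: ignores the data and just draws hazard
# ratios so the sketch runs end to end. The real function fits a Bayesian
# proportional hazards model.
run_chain <- function(patients) {
  rnorm(4000, mean = 1, sd = 0.5)
}

# Turn posterior draws into a one-row tidy data frame with the efficacy call.
summarize_samples <- function(samples, sim) {
  tibble(
    sim      = sim,
    prob     = mean(samples > 1.5), # posterior probability that HR > 1.5
    efficacy = prob > 0.6           # the efficacy rule from the slide
  )
}

# Analysis function for a single simulated trial.
model_hazard <- function(patients) {
  samples <- run_chain(patients)
  summarize_samples(samples, sim = patients$sim[1])
}

# One simulated trial, end to end:
model_hazard(simulate_patients(sim = 1, hazard_ratio = 2))
```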
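The plan itself could then look something like the sketch below, using the functions above. The target names sim, patients, and models and the dynamic = map(sim) branching follow the talk; the repetition count, the single hazard-ratio argument, and the results target are simplified stand-ins for the real plan, which also branches over several effect-size scenarios.

```r
library(drake)

plan <- drake_plan(
  # How many simulated trials to run.
  sim = seq_len(1000),
  # One patient-level dataset per simulation repetition (dynamic branching).
  patients = target(
    simulate_patients(sim, n = 200, hazard_ratio = 2),
    dynamic = map(sim)
  ),
  # One Bayesian model summary per simulated dataset.
  models = target(
    model_hazard(patients),
    dynamic = map(patients)
  ),
  # Aggregate the one-row summaries into a single readable data frame.
  results = dplyr::bind_rows(models)
)

plan # just a tidy data frame of targets and commands; nothing has run yet
```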
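And a sketch of the _drake.R script itself. The source() paths match the layout sketched earlier, and the clustermq/SGE settings are illustrative; they depend on how your own cluster is set up.

```r
# _drake.R: run by r_make() in a fresh, clean R session.
source("R/packages.R")  # library(drake), library(dplyr), modeling packages, ...
source("R/functions.R") # simulate_patients(), model_hazard(), ...
source("R/plan.R")      # defines the `plan` object

# Farm targets out to the grid engine cluster via clustermq (illustrative values).
options(
  clustermq.scheduler = "sge",
  clustermq.template  = "sge.tmpl"
)

# End with drake_config(); r_make() uses this to run the workflow.
drake_config(
  plan,
  parallelism = "clustermq",
  jobs = 64
)
```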
To actually run the workflow, we use a function called r_make(). This creates a new, clean, reproducible R process, runs the _drake.R file to populate the new session, resolves the dependency graph, runs the correct targets in the correct order from that dependency graph, and writes the return values to storage. Throughout this whole process, drake distributes targets across the Univa Grid Engine cluster we configured it with. You can also configure it with SLURM or TORQUE, or just the cores on your local laptop. drake automatically knows from the graph which targets it can run in parallel and which need to wait for their dependencies. So you don't need to think about how to parallelize your code; you can just focus on the content of the methodology.

Afterwards, all the targets are in storage. There's a special key-value store in the .drake folder at the project root, and drake has functions to retrieve the data. drake abstracts these artifacts as ordinary R objects, so you don't need to worry about how to store files. drake takes care of the file management minutiae for you.

So here we have the first round of operating characteristics. We have a strong scenario at the top that assumes the drug cuts hospitalization time in half, and we have a null scenario that assumes no efficacy at all. We meet the efficacy criterion in the former but not the latter, which aligns with our prior expectations. It's a sign that the code is working, but it's not useful yet because it only states the obvious. So we need to add more scenarios to understand the behavior of this trial. In practice, we reach out cross-functionally and comb the literature on the disease state to come up with meaningful scenarios. Maybe we set an additional effect-size scenario right at the effect of interest, at the efficacy rule, or at standard of care, depending on the situation. In any case, we add a new scenario, in this case by going to the plan and proposing a new effect size. And right away drake understands that we've added more targets and that the previous ones are still up to date. That's what this graph shows you. So when we run r_make() again, only the new scenarios actually get computed. drake skips the rest and saves us a whole lot of runtime. That's 2,000 Bayesian models we don't need to fit.

This behavior, skipping steps that are already up to date, really helps my team and me, especially during the high-pressure, exciting, fast-paced COVID-19 work we've been doing. Sometimes we need updates to our simulation studies with turnaround times of an hour or two. We just don't have time to rerun all those previous analyses, but we need a reproducible end product, because this is serious clinical research and it's going to affect the lives of patients. So we need to move fast and we need to make sure we're doing things correctly, and drake allows us to do both. Now our final results are automatically updated with the results we have so far, and that new scenario is now in the middle.

I didn't show this, but the dependency graph also takes into account the functions mentioned in the commands. So if I were to change a function, that would automatically invalidate the results downstream and those targets would rerun. But in any case, at the end of the day, drake can tell you if all your targets are up to date, and this is tangible evidence that your output matches the code and data it's supposed to come from. It's evidence that somebody else running the same code would get the same results.
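A minimal sketch of this run-and-retrieve workflow from the project root, assuming the _drake.R setup sketched above; r_make(), readd(), and loadd() are real drake functions, while the target name results comes from the illustrative plan sketch.

```r
library(drake)

r_make() # fresh R session, sources _drake.R, builds only the outdated targets

# Targets live in the hidden .drake/ cache at the project root.
readd(results)  # return the aggregated results data frame
loadd(results)  # or load it into the current session by name
head(results)
```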
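And after adding a scenario to the plan or editing a function, you can ask drake what it would rerun before actually running anything; a small sketch, again assuming the _drake.R setup above.

```r
library(drake)

r_outdated()        # names of targets invalidated by changes to code, data, or upstream targets
r_vis_drake_graph() # the graph highlights outdated targets; up-to-date ones will be skipped
r_make()            # reruns only the outdated targets, e.g. just the new scenario's models
```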
And that is a huge piece of reproducibility. You can learn more about drake in the online manual, the reference website, the public examples, and the online workshop. I've taught the online workshop at other conferences, but it's also something you can run on your own in the cloud: just sign up for RStudio Cloud, go into a web browser, and you have everything you need to get started. I owe many thanks to the R community, especially rOpenSci, for the vibrant discussion and widespread adoption that made drake the package it is today. So many people have contributed insightful ideas and informed me about problems I didn't know existed, and this active participation was incredible fuel for development over the past four years. drake is a peer-reviewed rOpenSci package, and if you would like to share your use case, consider reaching out at ropensci.org/usecases.

So we have a few minutes for questions. Will has been answering a few of them in the chat, but there are a few others that have been elevated. So Will, are there plans to make migration to drake easier? Migrating from a large R script to drake's format, one expression per target, can be a source of friction.

That's a great question, and I get asked that quite a bit actually. Ellis Hughes helped me out with a feature a while back, I believe the function is code_to_function() in drake, and it helps convert a single script into a function in a way that is compatible with drake, in the sense that you can insert a script into a drake plan as a target. There is a chapter in the online manual about that. If everything is in one script and it doesn't make sense to include it as an entire target on its own, it may take some manual disentangling, and there's actually a package that was recently reviewed and onboarded to rOpenSci called Rclean that helps inspect and disentangle the interconnected parts of that script, to either break it apart into different scripts or into different functions.

And so we have time for one more question, and then we're going to move to the next session. How does drake get along with Shiny?

It depends on what you want to do and what purpose drake has in that interaction. Most commonly, I would say, is the situation where you have a target at the end of a drake plan that deploys some pre-computed work as a Shiny app. So maybe you have a drake plan to do this long computation to build a final dataset, and you ship that dataset along with the Shiny app to shinyapps.io or RStudio Connect. My team and I do this with slides quite a bit, and there are some use cases with Shiny. I think that's the most common one. There are other kinds of interaction, but it really depends on what you want to do with Shiny in this case.

Great, thank you Will, and we're going to move on to our next speaker. Very exciting.