All workflow systems have trade-offs. R Markdown is easy to use, but it struggles to handle a lot of code and a lot of runtime. On the other hand, the targets package handles a ton of work, but on its own, it's a bit pedantic for routine data analysis. This talk is the debut of a new system called Target Markdown, which has all the convenience of R Markdown and all the power of targets. I will also discuss stantargets, an extension to targets for Bayesian statistics. Working together, these tools tackle daunting tasks that come up a lot in the life sciences: machine learning, Markov chain Monte Carlo, simulation, prediction, genomics, PK/PD, and database queries are just some examples.

Part of what makes these tasks so daunting is that, number one, the code is usually slow, and number two, you're never really done running it. There are always bugs to fix, follow-up questions to answer, new data, and all sorts of other reasons to rerun all that slow code. It's all too easy to get stuck in a Sisyphean loop where you spend a lot of time waiting for jobs to finish and struggle to get hold of a complete, current set of results. Even a single minute of delay between updates is long enough to feel, because it's compounded by every update you make.

To get out of the loop, we have to break the work down and think of a data analysis workflow as a pipeline. A pipeline is a collection of interconnected steps, or targets, with clearly identified inputs and outputs. If you change a target in that pipeline, that change invalidates everything downstream that depends on it. So if the model changes, so do the post-processing and the visualizations. Those downstream steps need to rerun to get the latest results. On the other hand, there's no reason to waste time recomputing the data or the pre-processing that comes before the model, because that didn't change. So in real life, which targets can we skip to save time, and which targets really need to rerun? Unfortunately, no human can reliably answer that question, especially on a project you haven't touched in several months. To ensure the results are correct while saving as much time as possible, we need automated tools to make objective decisions about what to run and what to skip.

This is exactly the job of a Make-like pipeline tool. By identifying the inputs and outputs of each target, a Make-like pipeline tool arranges the targets in a directed acyclic graph like the one shown here, and it runs the correct targets in the correct order. By analyzing the graph, it even detects opportunities to use parallel or distributed computing to run multiple targets simultaneously. And of course, it automatically skips any targets whose code or upstream dependencies have not changed. Not only does this let you adapt to changes quickly, it also gives you tangible evidence of the status of the results. If the pipeline tool tells you everything is up to date, that means someone else could run your code from scratch and get the same results as you. That is a definition of reproducibility.

There are hundreds of pipeline tools for other languages, but historically not a whole lot for R. The targets package, which builds on its predecessors drake and remake, is designed to work seamlessly within R itself. It encourages good programming practices, it abstracts files as variables, and it natively integrates with R Markdown, as you'll see soon.
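To give a flavor of what a pipeline looks like in code, here is a minimal sketch of a _targets.R file. The file name, data, and model formula are illustrative, not from the talk:

```r
# _targets.R: a minimal sketch of a pipeline (illustrative names and data).
library(targets)
list(
  tar_target(raw_file, "data.csv", format = "file"), # a file abstracted as a variable
  tar_target(data, read.csv(raw_file)),              # reruns only if the file changes
  tar_target(model, lm(y ~ x, data = data)),         # reruns if the data or this code changes
  tar_target(results, summary(model))                # downstream of the model
)
```

Calling tar_make() runs the targets in dependency order, and a second tar_make() call skips everything that is already up to date.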
So the targets package lets you work more naturally in R than a language-agnostic pipeline tool would. But there are drawbacks. If you work with targets on its own, constructing a pipeline requires you to have a clear idea of the exact inputs and the exact outputs of each target. You don't have to explicitly declare them like you would with other tools, because targets automatically analyzes your code to detect them, but you still have to be disciplined about the way you structure your R code. And that means writing pure functions to produce datasets, run models on those datasets, and summarize those models. That's a lot of software engineering to ask of a statistician or data analyst. But today I will share two breakthroughs that address this usability problem and democratize pipelines, and I will demonstrate their usefulness with an example in Bayesian statistics for clinical trials.

The first breakthrough is the R Targetopia, an emerging collection of packages like stantargets that produce ready-made pipelines for specialized situations. These packages already have functions and targets built in, so the user does not have to write nearly as much code or think as hard about engineering the pipeline at a low level. The mechanism behind these packages is a domain-specific pattern called the target factory. A target factory is a function, usually in a package, that produces one or more target objects. A target object is just the definition of a target, represented as a special instance of an S3 class. For the user, target factories really simplify pipeline construction. In this example, you feel like you're only writing a single target, but you actually get three. The details of constructing those targets and connecting them together are all abstracted away.

The stantargets package leverages this idea for Bayesian data analysis with Stan. It has target factories for everything from single-run workflows to large-scale simulation studies. If you use R and you use Stan, stantargets is worth trying out. A bit of background: Stan is a probabilistic programming language for all kinds of statistical modeling. It's most famous for Hamiltonian Monte Carlo and the No-U-Turn Sampler for fitting Bayesian models, but it also supports variational inference and optimization. It can be really fast, but as with anything Bayesian, it inevitably comes with a non-trivial computational burden, which is exactly where targets and stantargets can help.

I'm going to show you how this works using a Bayesian longitudinal linear model. What you see here is like an MMRM, but without random effects. I don't claim it's the best model, but it is extremely popular in clinical trial data analysis. Most of the time, you see it with an inverse-Wishart prior when the covariance is unstructured, and that prior induces problematic associations between the variances and the correlations, which motivates the model you see here. The solution is to model the variances and the correlations separately and to use an LKJ prior or something similar on the correlation matrix. It was straightforward to write this particular model in Stan because Stan is so flexible and because its algorithms are completely indifferent to conditional conjugacy.

Before using this model in the top-line analysis of a clinical trial, though, we need to validate it. That includes checking for all sorts of issues, one of which is the correctness of the implementation. In other words, does the Stan code accurately follow the model specification? One way to find out is with calibration.
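Before we get to calibration, here is a hedged sketch of the factory idea in practice, using tar_stan_mcmc(), stantargets' factory for a single-model workflow. The Stan file name and the dataset are illustrative:

```r
# _targets.R sketch: one factory call expands into several connected targets
# (the tracked Stan file, the dataset, the MCMC run, draws, and summaries).
library(targets)
library(stantargets)
list(
  tar_stan_mcmc(
    fit,                                 # base name for the generated targets
    "model.stan",                        # illustrative Stan model file
    data = list(n = 10L, y = rnorm(10))  # illustrative dataset code
  )
)
```

You write what feels like a single target, and the factory constructs the individual targets and wires them together in the correct order.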
The idea behind calibration is to simulate thousands of datasets from the prior predictive distribution, analyze each dataset with the model, and find out how well the posterior parameter samples agree with the parameter samples from the prior. So if we compute a 50% posterior interval for each simulation, then 50% of those intervals should contain the corresponding prior draws of that parameter. Likewise for 95% intervals, or whatever credible level you choose. If coverage is nominal, that's evidence that the analysis model and the data-generating model agree. In practice, I have found this exercise to uncover a lot of bugs, and I will walk through it in a moment.

When I do, I will use Target Markdown, a brand-new system that combines the best of targets with the best of R Markdown. We want the power of stantargets to run a huge simulation pipeline, and we also want everything to live inside R Markdown documents, because it's convenient and because we can explain the details of the methodology right next to the actual code that runs it. There are two ways to use Target Markdown: an interactive mode for testing and prototyping, and a non-interactive mode for pipeline construction. What this looks like is that you use R Markdown pretty much like you would in other situations, but with a specialized targets language engine that creates a pipeline behind the scenes, one code chunk at a time. This works whether you have a single R Markdown report or multiple reports, as in a bookdown project. If you have the latest version of targets, you can get an example Target Markdown document either through the RStudio template system or through the use_targets() function.

So we're inside an R Markdown report. To study the calibration of our Bayesian model for clinical trial data, we want to define a function that simulates data from the prior predictive distribution. Thanks to stantargets, this is the only function the user needs to write by hand for this pipeline. To make this function, along with other global objects, options, and settings, available to the pipeline, we use the targets language engine instead of the R language engine, and we set the tar_globals chunk option to TRUE. If you're in the notebook interface and working interactively, you can click the green play button here on the right, and Target Markdown will just run the code and assign it to the environment of the pipeline, like it says here at the bottom. You can do this if you want to test and prototype the function locally, and if you have multiple objects and functions, you can divide them up however you want among multiple globals code chunks like this one.

Interactive mode for a target or list of targets is similar. Target Markdown will resolve the directed acyclic graph, run the correct targets in the correct order out of the ones in the chunk, test that the targets can be stored and retrieved properly, and assign them to memory as variables in your R session. So interactive mode runs code in your environment and saves nothing to persistent storage. It's for testing and prototyping only, and the results go away when you restart your R session. But non-interactive mode, which runs when you knit the entire document, is the opposite. Non-interactive mode does not actually execute the code in the chunk; instead, it saves that code to a script file to define part of the pipeline. That goes for functions like the one on the previous slide, as well as targets and target factories like the one you see here.
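As a hedged reconstruction, modeled on the stantargets simulation vignette rather than copied from the talk's slides, the globals chunk and the factory chunk might look something like the following. The simulation function is a deliberately simplified stand-in for the talk's prior predictive simulation of the longitudinal model:

```{targets globals-sketch, tar_globals = TRUE}
# Settings and the one hand-written function, made available to the pipeline.
options(clustermq.scheduler = "multicore")
tar_option_set(packages = "stantargets")
simulate_data <- function(n = 10L) {
  beta <- rnorm(n = 1, mean = 0, sd = 1)      # draw the parameter from its prior
  x <- seq(from = -1, to = 1, length.out = n)
  y <- rnorm(n, x * beta, 1)                  # prior predictive outcome
  list(
    n = n, x = x, y = y,
    .join_data = list(beta = beta)  # stantargets joins these prior draws to the posterior summaries
  )
}
```

```{targets model-sketch}
# One factory call defines the bulk of the simulation study. "model.stan"
# and the batch sizes are illustrative.
tar_stan_mcmc_rep_summary(
  model,
  "model.stan",
  simulate_data(),  # a fresh prior predictive dataset for each replication
  batches = 25,     # dynamic branches...
  reps = 40,        # ...times reps per branch = 1000 simulations
  variables = "beta",
  summaries = list(
    ~ posterior::quantile2(.x, probs = c(0.25, 0.75, 0.025, 0.975))
  )
)
```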
The idea is to incrementally define a pipeline, one Target Markdown code chunk at a time, and then do a serious run of the pipeline outside Target Markdown later on. In this particular chunk, we invoke a target factory called tar_stan_mcmc_rep_summary() to define the bulk of the work in the simulation study: draw prior predictive data, run the model, and compute summary statistics and convergence diagnostics. You don't need to worry about which R script file to put this in, or how the targets and functions are organized within the scripts. That's all software engineering. Thanks to Target Markdown, all you need to focus on is the actual chunks inside the report.

For even more smoothness, you can turn arbitrary code chunks into targets with the tar_simple chunk option. Then the chunk's code becomes the target's command, and the chunk label becomes the target's name. This works as long as the code chunk acts like a pure function, meaning it returns a single value and does not cause any side effects. So here we have one target to summarize convergence diagnostics, and another target to calculate the coverage metrics for calibration.

At this point, our entire pipeline is defined. When we knit this report end to end, the script files get written, and the report renders quickly. We then have the option of running the pipeline with the tar_make() function or similar, either inside or outside this report. In this example, we run the pipeline in the same report that defines it. We don't have to, but it's often convenient. We write an ordinary R code chunk right after all the targets code chunks, so we're not using the targets language engine anymore, and we call the tar_make_clustermq() function to distribute our simulations across 100 workers on a computing cluster. It's also nice to have R code chunks that read results from the targets data store and display them after the pipeline finishes.

The first time I ran this pipeline, it took almost 7 hours to finish, despite the heavy-duty distributed computing. That's how large Bayesian computation can get sometimes. But the second time around, all the results were already up to date. Thanks to targets, it only took a few seconds for the whole report to re-render. So Target Markdown is kind of like the caching system in knitr, but taken to the next level.

And finally, in the rendered report, we look at the results. Convergence diagnostics look pretty good: only one simulation had any potential scale reduction factors above 1.01. And coverage looks nominal: on average, 50% of the 50% posterior intervals covered the true parameters, and likewise around 95% of the 95% posterior intervals covered the truth, which is evidence of pretty good calibration. So we have a complete, well-documented story wrapped up in an R Markdown document backed by a powerful targets pipeline.

These slides are publicly available, and so is the fully rendered version of the Target Markdown report I showed today. At these links, you can also find the source code of those materials and the various packages I mentioned. I would like to thank everyone who helped out with targets and its ecosystem, especially the folks listed here who helped make stantargets and Target Markdown possible. Thanks also to the R/Medicine program committee for allowing me to speak, and thank you to everyone for listening.
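For reference, here are hedged sketches of the remaining chunks described above. The chunk label and the summarize_coverage() helper are hypothetical, not from the talk's actual report:

```{targets coverage, tar_simple = TRUE}
# With tar_simple = TRUE, the chunk label "coverage" becomes the target name
# and this code becomes its command. summarize_coverage() is a hypothetical
# pure function that computes interval coverage from the model summaries.
summarize_coverage(model)
```

```{r}
# An ordinary R chunk, not the targets engine: run the pipeline on a cluster,
# then read a result back from the targets data store by name.
library(targets)
tar_make_clustermq(workers = 100)
tar_read(coverage)
```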