We are now back live in the first session of R/Medicine 2020. I'm Peter Higgins, one of your co-moderators for this session. I want to introduce Michael Kane, who is an assistant professor of biostatistics at Yale University and a past winner of the ASA Chambers Award. His research focuses on computing, latent space approaches, statistical learning problems, and reproducibility. He applies these methods to understanding human mobility and patient heterogeneity in clinical trials. I have to mention that Michael is one of the co-founders of R/Medicine and chaired the organizing committee for the first two years. Today he will be giving his first R/Medicine talk, believe it or not: reporting clinical trial data and analyses with the listdown package. Thank you.

Thanks, Peter. So I'm going to be talking about a package that I've been using for a little while now that lets you programmatically generate R Markdown documents. I'm going to talk about why you'd actually want to do that and in which cases it's an appropriate decision. And then I'll also talk about the use case where I've been using the package a lot, which is mostly reporting for clinical trials. The assumptions that I'm going to make for this talk are that you know R Markdown: you understand the structure of an R Markdown document, and you've built one and shown it to your friends. And then you're also familiar with clinical trial reporting. That means you know what a clinical trial is, so you're at least aware of the idea that people construct these clinical trials to measure the efficacy of some kind of treatment or therapy, usually for humans.
And you know that reporting analyses for one of these clinical trials is how you assess the efficacy of the treatment, and that's part of the regulatory process. I'm also going to assume that you're interested in producing either trial reports or your own reports, and, specifically for this talk and for the listdown package, that you create a lot of tables or visualizations. You probably work collaboratively; I tend to work with clinicians and trialists in a couple of different disease areas. And you've used R Markdown to do this, but you may be running into scalability issues with writing your own R Markdown files. By scalability, I mostly mean that you're creating a lot of different tables and other visualizations, and you're running into time constraints with how many of these tables you want to generate versus how many you have time to generate. So the first thing I'm going to do is start with an example of a section of an R Markdown document. In this case, I'm using dplyr, the survival package, which does survival analysis, and also survminer, which provides visualization of Kaplan-Meier curves for survival analysis. I'm also going to use the lung data set, which is inside the survival package. It's a pretty classic data set for conveying simpler examples of survival analysis. So what I'm going to do is load the lung data set and then make a small change to it. One of the variables in the lung data set is sex, and it's numerically encoded. Since we are using R and we have proper factors, I'm going to encode those as male and female rather than one and two. After that, I'm going to create a survival analysis: we're going to model survival, defined by time and status, as a function of sex. And then I may want to visualize that and see what it looks like.
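The example just described can be sketched roughly as follows. This is a reconstruction from the spoken description, not the code from the slides: the variable names lung_df and fit are assumptions, and the survminer plot is guarded so the sketch still runs with base graphics when survminer is not installed.

```r
library(survival)

# Work on a copy of the lung data set that ships with the survival package.
lung_df <- survival::lung

# sex is coded 1 = male, 2 = female; recode it as a labelled factor.
lung_df$sex <- factor(lung_df$sex, levels = c(1, 2),
                      labels = c("Male", "Female"))

# Kaplan-Meier fit of survival (time, status) stratified by sex.
fit <- survfit(Surv(time, status) ~ sex, data = lung_df)

# Visualize with survminer when available, otherwise fall back to base plot().
if (requireNamespace("survminer", quietly = TRUE)) {
  print(survminer::ggsurvplot(fit, data = lung_df))
} else {
  plot(fit, col = c("red", "blue"),
       xlab = "Days", ylab = "Survival probability")
}
```

In an R Markdown document this code would sit inside a single R chunk, with the descriptive text written around it.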
So the idea is not to show you how good I am at making survival plots. I want to use this as a canonical example that you can understand pretty quickly, and you can see how it relates to the R Markdown documents that you're probably making. If you are making one of these, you have the R chunk, so the R code, and then you usually have some text around it. So I may have text where I want to describe this survival plot, and I might want to say the figure above indicates that women in the study tended to live longer in the first 750 days of the trial when compared to men. And again, I'm not as interested in this as an actual analysis. I'm mostly interested in thinking about this document and its parts. So if I wanted to break this down, I'm going to dichotomize this document into two parts. The first is the computational component. These are the pieces of the document that are derived from computation. So the R chunk is a computational component. It shows a series of things that you want to do in R, and the idea is to create something that will be presented. In this case, it's a survival plot. It can be a plot, it can be a table, it could be an interactive graphic or something like that. The other part is the narrative component, and the narrative component is the prose in the document. A lot of times it's giving context to the computational component and also providing a theme and conveying something conceptually. So the idea is that the computational components provide the objects that get presented. My R code created a visualization. And again, when we're thinking about those, we're usually thinking about plots or tables. The narrative components are, for a lot of documents, super important because they contextualize the presentation that's associated with the computational components.
They provide the background, they define the goals, they establish themes, and they can convey the results. So the narrative components are how we understand the presentations derived from the computational components, along with the extra information that's conveyed. The integration of these two defines literate programming. The idea with literate programming is that we want to provide interpretability for the things that are being conveyed, and we also want to facilitate reproducibility, which R Markdown documents are really good at doing. The listdown package really starts with the observation that since computational components are by definition computationally derived objects, and R Markdown is a well-defined standard, it's possible to programmatically create R Markdown documents with computational components. So the idea is that I can write an R program that's going to create an R Markdown document. I'm not going to be able to fill in the narrative components, because that's not easily done programmatically. But I can put in these computational components, I can arrange the presentation, and I can build these documents. And because I'm automating this process, I can think about doing this for a lot of different visualizations and a lot of different presentation types. All right. So how do we actually do this? And why is this called listdown? The idea is that a list in R can do two things. It can hold the objects that we want to present, and a list can also provide a hierarchy. We can map that hierarchy onto the sections, subsections, and sub-subsections in an R Markdown document. In this example, first I'm going to load the gtsummary package, because in this case I'm going to be building a table. And I'm assuming that I've already loaded the survival package so I can visualize the survival analyses.
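The mapping from a list's hierarchy to document sections described above can be illustrated in base R. This is only a conceptual sketch, not how listdown is implemented: list_to_headings is a hypothetical helper and toy is made-up data.

```r
# Hypothetical helper (not part of listdown): walk a nested named list and
# emit one Markdown heading per name, with nesting depth setting the level.
list_to_headings <- function(x, depth = 1) {
  out <- character(0)
  for (name in names(x)) {
    out <- c(out, paste(strrep("#", depth), name))
    element <- x[[name]]
    if (is.list(element) && !is.object(element)) {
      # A plain list adds another level of sections.
      out <- c(out, list_to_headings(element, depth + 1))
    } else {
      # Anything else is a leaf: an object that a chunk would present.
      out <- c(out, sprintf("<%s object>", class(element)[1]))
    }
  }
  out
}

toy <- list(
  Summary = list(
    `Table 1` = summary(mtcars$mpg),
    Plots = list(Overall = "plot A", `By Group` = "plot B"),
    Data = head(mtcars)
  )
)

writeLines(list_to_headings(toy))
```

The names supply the section titles, the nesting supplies the heading levels, and the leaves are the objects to present.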
So what I can do is create a list called comp_comp, for computational components. At the highest level, the list element is named Lung Summary, and I'm going to add this {.tabset} to the name; I'll show why in a sec. That element holds another named list, where the first element is called Table 1, and it holds a table summary. The second element is called KM Plots, which holds another named list whose elements are called Overall, which holds the Kaplan-Meier plot for the overall population in the lung data set, and By Sex, which shows another Kaplan-Meier plot, this time by sex, again in the lung data set. And then after that, I might want to show the entire lung data set. So the idea is that the list elements are holding the things that I may want to convey, and they're also holding structure. From this, it should seem pretty reasonable that I should be able to make an R Markdown document of chunks, not necessarily text. So again, the list names can be thought of as titles of sections or subsections. List elements contain the objects we want to present, and the hierarchy defines the sections and subsections. If I want to see what the hierarchy looks like, there's a function in listdown called ld_cc_dendro, which creates a dendrogram from the computational components. If I call ld_cc_dendro, I can see that I'm looking at the computational components object, and then I can see the hierarchy of the names, which again we can think of as corresponding to sections, and also the objects that are being held. It's not going to display the objects or anything like that. It's just going to tell you what they are. So you can see Table 1 is a tbl_summary, and it's also a gtsummary. Overall is a ggsurvplot.
It's a ggsurv, and then it's also a list. And Data, which is my original data set, is just a data frame. If I want to create a document from this, I need to do a couple of things. First, I need to tell the document where the object is: where are the computational components that I'm going to create the document from? To do this, I'm going to save the computational components as an RDS file. After that, I'm going to create a listdown object. The listdown object is going to tell R how to actually create the document. The first argument I pass to listdown is load_cc_expr, the load-computational-components expression. I saved the computational components as an RDS file, so the expression that's going to load them is just readRDS("cc.rds"). The next argument is package: these are the packages that are going to be needed to create the derived document from R Markdown. In this case, it's going to be the survival package and survminer. After that, the rest of these arguments are the R chunk options. I'm saying don't echo the R chunks, don't tell me if there are warnings, and don't tell me if there are messages. Essentially, I want to show just the output, the presentation of the computational components, without showing the underlying R code that creates it. And then I'm going to call a function called ld_make_chunks. ld_make_chunks takes the listdown object, which knows both where the data are and how to create the document, and it outputs a character vector where each element corresponds to a line in the output document. If I look at what that actually looks like, and again this is only the first 15 lines, I can see I have an R Markdown document. The first thing I have is a chunk.
The options that I specified in the listdown function are there in each one of the chunks. All the first chunk does is load the libraries that I want to use and then read in the data set. By convention, the data being read in, that's again cc.rds, is going to be held in a variable called cc_list. After that, I'm going to be creating sections and then R chunks, and all each R chunk is going to do is call the appropriate element of cc_list. So for example, my original comp_comp object started with a named list where the top-level element was called Lung Summary, along with this tabset thing that I'll explain in a sec. The first element of that was called Table 1. And then after that comes a computational component, something that I want to create a presentation from, and all the R chunk is doing is calling the appropriate element of cc_list. So this shows me that I can go from the listdown object and the computational components to R chunks. If I want to generate an R Markdown document programmatically, my document needs to start with a header, which I convert to text with as.character. There are a couple of functions that facilitate R Markdown header creation. In this case, I'm just going to make a vanilla HTML header, but there are functions to create workflowr headers or a couple of other headers. If you want to create your own header, this just uses the yaml package, so you can create your own header using YAML and then output it to the document string. But I'm just going to create this vanilla HTML header, and then after that I'm going to add the chunks, and that's going to be my document. After that, I can write the document to an R Markdown file. In this case, it's called lungsummary.rmd. And after that, I can render it to actually create it. So here's the actual document. It is embedded.
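Put together, the workflow described so far might look roughly like the sketch below. It is reconstructed from the talk rather than copied from the slides, and it assumes that listdown, survival, survminer, gtsummary, and rmarkdown are installed. The section names and the title follow the talk; the object names, the file name casing (lungsummary.Rmd), and adding gtsummary to the package list so the table prints are assumptions.

```r
library(listdown)
library(survival)
library(survminer)
library(gtsummary)

lung_df <- survival::lung
lung_df$sex <- factor(lung_df$sex, levels = c(1, 2),
                      labels = c("Male", "Female"))

# Names become section titles; nesting becomes subsections.
comp_comp <- list(
  `Lung Summary {.tabset}` = list(
    `Table 1` = tbl_summary(lung_df),
    `KM Plots` = list(
      Overall = ggsurvplot(
        survfit(Surv(time, status) ~ 1, data = lung_df), data = lung_df),
      `By Sex` = ggsurvplot(
        survfit(Surv(time, status) ~ sex, data = lung_df), data = lung_df)
    ),
    Data = lung_df
  )
)

# Show the hierarchy and the class of each leaf without rendering anything.
print(ld_cc_dendro(comp_comp))

# The generated document loads the components from this file.
saveRDS(comp_comp, "cc.rds")

ld <- listdown(
  load_cc_expr = readRDS("cc.rds"),
  # gtsummary is added here as an assumption, so the table has its print method.
  package = c("survival", "survminer", "gtsummary"),
  # Chunk options applied to every chunk in the document.
  echo = FALSE, warning = FALSE, message = FALSE
)

# One character element per line of the R Markdown body.
chunks <- ld_make_chunks(ld)

# Header first, then the generated chunks.
doc <- c(
  as.character(ld_rmarkdown_header("Output Document",
                                   output = "html_document")),
  chunks
)
writeLines(doc, "lungsummary.Rmd")

# Rendering needs pandoc, so it is skipped when pandoc is unavailable.
if (rmarkdown::pandoc_available()) {
  rmarkdown::render("lungsummary.Rmd", quiet = TRUE)
}
```

Because the document is ordinary R Markdown once written, it can be opened and edited by hand before rendering.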
And because I had specified the tabset, I have a set of tabs. The document is called Output Document. After that, there's Lung Summary. Here are Table 1, KM Plots, and Data. These were the names of the sublists I had. I used gtsummary to create Table 1, and you can see the summary it created: in this case, the institution, then the age, and then the sex. I can go to the Kaplan-Meier plots. Again, here's Overall, which was the Kaplan-Meier plot for the entire population. And then here it is by sex. After that, I have the data that I've put into a data table object, and I can go through the entire data set and look at the entries one by one. There are a couple of other features that are included. You can add code at the beginning of the document: listdown has an initial expression argument, init_expr. If you need to customize the functions that you're using for presentation, those functions are called decorators, and you can specify those in listdown. You can use the initial expression to do whatever initialization you need for those functions. If you're leveraging a lot of code that you've written in R, it's not suggested that you put it all in the initial expression. You probably want to put that into an R file and call source in the initial expression. These are things that are executed immediately after loading the libraries. After that, you can control the chunk options in a couple of different ways. For the entire document, I already showed how you can use the ... argument. You can control chunk options for object types, using the decorator_chunk_opts argument. This makes sure that a given type of R object is presented in a uniform way across the entire document. And then, if you really need to, you can specify chunk options for an individual computational component by using the ld_chunk_opts function.
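The three levels of chunk-option control, together with a decorator and an initial expression, could be combined along these lines. This is a hedged sketch assuming listdown and knitr are installed: the component list cc, the file name cc-opts.rds, the use of kable as the data frame decorator, and the specific option values are illustrative assumptions, not taken from the talk.

```r
library(listdown)

# A small, made-up set of computational components.
cc <- list(
  Data = head(mtcars),
  Summary = summary(mtcars$mpg)
)

# Per-component options: stored as attributes on this one element.
cc$Summary <- ld_chunk_opts(cc$Summary, echo = TRUE)

saveRDS(cc, "cc-opts.rds")

ld <- listdown(
  load_cc_expr = readRDS("cc-opts.rds"),
  package = "knitr",
  # Decorator: every data.frame is presented through kable().
  decorator = list(data.frame = kable),
  # Per-type options: applied to the chunk of every data.frame.
  decorator_chunk_opts = list(data.frame = list(results = "asis")),
  # Initial expression: placed right after the library() calls.
  init_expr = {
    options(digits = 3)
  },
  # Document-wide options passed through `...`.
  echo = FALSE, message = FALSE
)

chunks <- ld_make_chunks(ld)
writeLines(chunks)
```

Printing the generated chunks is a quick way to check which options ended up on which chunk before writing the file.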
And this essentially adds attributes to the computational components. The last one is a little bit discouraged, though, because it's the least general. In terms of priority, the lowest priority is given to chunk options for the entire document. Object type is a higher priority, and the attributes get the highest priority. So this shows how to actually create one of these documents. I guess one thing to think about is: if you can programmatically create an R Markdown document, is it a good idea? And the answer is that most of the time it's probably not. If you're going to think about doing it, you need a context that's very well-defined. When I'm doing this, I usually have a lot of visualizations and tables that I need to create. I'm usually working with someone who understands the data at least as well as I do, and that tends to be clinicians. The idea for us is to generate a lot of different documents that we can restructure very quickly. At the same time, our job is mostly to build a narrative and make sure that what we're seeing in the data corresponds to the clinician's understanding of what's going on in those data. So if you're interested in using listdown, it has been used in practice. It's pretty mature at this point, and I think it has a stable interface. It's available on GitHub and on CRAN right now. Thanks very much.

Okay. Well, that generated a number of questions. Michael, one came up and has been upvoted. Is it possible to specify different parameters for each chunk, for example, cache equals TRUE for some chunks and FALSE for other chunks?

Yes. You can do that with ld_chunk_opts. Essentially, what ld_chunk_opts is doing is attaching attributes to the computational component list elements.
So what listdown does is go through and look at, for a given type, what it thinks it should be doing and how the object should be presented, and then whether there are attributes associated with it that tell it to change the chunk options. So, yes, that option is available.

Another question, from Eva Redamont, got two votes, asking if you can auto render to PDF?

Yep. You can either put the PDF information in the header or specify it using the render function. In either of those cases, you can create a PDF, or you can retarget to HTML or even Word.

Okay. And would TinyTeX be a reasonable dependency for going to PDF?

That sounds right. I think that's right. And that's a dependency of R Markdown, not of listdown directly. All listdown is really doing is writing the R Markdown files, and then the rmarkdown package can render however it does that.

Jeremy Selvask: can you add text to explain the table or plot using listdown?

Yeah, that's a really good question. You can always just generate the R Markdown file if it has a lot of visualizations, and then, it's just an R Markdown file, right? So you could add text to that output and then render. If you really want to, you could add the text as a computational component by making it a string held by the list. Then for those R chunks, you can use results = "asis", and it would put that text inline. We'd started talking about what to do if you want to iterate between adding narrative components, keeping track of them, and going back and forth. If there's a lot of interest in that, we might pursue it. For the purposes that I've been using it for so far, I haven't really needed to do that.

Okay. Well, we're going to wrap up. Thank you very much, Michael. And I think Beth is going to introduce the next speaker.