And now I'm very, very happy to introduce Daniel Sjoberg and Karissa Whiting, who are going to be talking to us about their awesome gtsummary package. Take it away, guys. Oh, never mind. This is going to be a pre-recorded talk, so they're going to be in the background listening to your questions and I'm going to be playing the talk.
Hello everyone. My name is Daniel Sjoberg and I'm here with my friend and colleague Karissa Whiting, and we're going to be talking about the gtsummary package and how it can help you improve the reproducibility of your reports. So we just want to talk a little bit about the reproducibility problem in medical research. It's been discussed quite a bit over the last few decades, and there have been some incredible tools introduced to improve the reproducibility of our work. For me, one of the big game-changing developments has been the R Markdown document. Being able to integrate my code along with the report text has been incredibly valuable. But there's always been this gap that I have never been able to bridge: after I create my R Markdown reports, many of the tables that I've made still need a bunch of tweaking by hand. What we wanted to do is make it possible to have a 100% reproducible workflow using R Markdown that required no tweaking, which led us to create this package, gtsummary. The goal of gtsummary was, one, to make the syntax incredibly simple; two, to make the output look like it's ready for publication, no tweaking necessary; and three, to make it very easy to customize your output.
There are several types of summary tables available within gtsummary. The first we're going to talk about are Table 1 types. In medical research, when you look at the journals, the most frequent Table 1 is a description of your cohort. You may also want to summarize regression models, create cross-tabulations, and summarize survival data.
One important thing we wanted to do was make these resulting tables very easy to work with. It's not infrequent that I need to present two regression models side by side, so it's easy to merge any two gtsummary tables. One other gap that has been bridged by the gtsummary package is this: once you've created your beautiful tables and you're writing your report, if you're talking about the odds ratios that you estimated in your model and have now beautifully summarized with gtsummary, you need to be able to easily grab those odds ratios with the confidence interval and the p-value and slip them straight into your report. With the inline_text function, that is possible. Now, talking about flexibility, we also introduced themes. Themes are something that you can run once at the top of your script, and they will change the defaults for all of your gtsummary tables. You can set a theme that says, oh, I'm going to be publishing this in JAMA, let me use the JAMA theme, and now all of your results will follow the JAMA reporting guidelines.
tbl_summary is our primary function that summarizes data sets. This is the data set we're going to be using as our example. It's called trial, but I'm making a small version of it, so that's sm_trial. It only has four columns, and each of these columns has been labeled using the labelled package. Let's just get started with the most basic tbl_summary. We are taking our data set sm_trial, passing it to the tbl_summary function, and using the argument by = trt. What that's going to do is create a table for you that's split by treatment, trt. It takes every column in your data set and adds it to the summary table. You can see there are three types of variables here. We have a continuous variable, that's age; categorical, that's grade; and dichotomous, that's tumor response.
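The basic call described above can be sketched as follows. This is a minimal reconstruction rather than the code from the slides: it assumes the gtsummary and dplyr packages are installed, and that sm_trial is simply the built-in trial data reduced to the four columns named in the talk.

```r
library(gtsummary)
library(dplyr)

# Assumed construction of the small example data set:
# keep only the four columns mentioned in the talk.
sm_trial <- trial %>% select(trt, age, grade, response)

# Summarize every remaining column, split by treatment arm.
tbl <- sm_trial %>% tbl_summary(by = trt)

# Printing the object renders the table.
tbl
```

Because the columns of trial carry label attributes, the row headers in the printed table use those labels instead of the raw column names.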
When you pass a data frame to tbl_summary, it's going to do a lot of inspection of your data to find out what the best defaults should be. It looks at that column age and says, there are a lot of levels, there's a lot of spread here; that's definitely a continuous variable. It's going to default to presenting the median and the IQR for you. Next it's going to look at grade and say, okay, three unique levels and it's character; that's definitely a categorical variable, let me present it that way. Lastly, dichotomous variables. It's going to look for variables that are coded as 0/1, TRUE/FALSE, or yes/no, and automatically put that dichotomous variable on a single line. We're only going to present the yes, TRUE, or 1 row. Here tumor response is coded as 0/1, so you're seeing the proportion of tumors that responded on a single line. You'll also see here that for age, we are showing the results to the nearest integer. When tbl_summary sees a continuous variable, it also inspects the spread of the data to say, oh, I think this should be rounded to the nearest integer, or to one decimal place, or to two decimal places. It makes that determination based on the spread of your data.
Let's change it up in this example. Rather than reporting the median and IQR for age, I'm going to report the mean with the standard deviation in parentheses. What I'm doing here is using the statistic argument, selecting all continuous variables, and telling it that I want the mean and the standard deviation. The string that I'm passing here has mean in curly brackets, then a set of parentheses, and inside them, in curly brackets again, the standard deviation. Anything in the curly brackets is going to be evaluated, just like it would be in glue syntax. Now, in this case, it's doing two things.
It's looking for the mean function and saying, I'm going to take that vector of age, or that vector of any continuous variable, and execute the mean function on it, and then the sd function as well. And I'm going to put the results in the format indicated in this string. So this is a very concise way to report these things.
So here's a quick schematic that shows which parts of the table each of those arguments modifies. Here in orange, the statistic argument changed the age statistic to the mean and standard deviation. The digits argument changed age from the default zero decimal places to two decimal places. Below that in green, the statistic argument added the denominator for grade. The label argument can change the variable labels, and the missing_text argument can change the text that's shown for missing values.
Throughout the gtsummary package, we use a special formula notation for selecting variables and specifying how those variables should behave in the resulting tables. So here's a quick example of how you can select your variables. You can use the bare name age, you can put the names in a vector, you can use tidyselect helpers such as starts_with, and there are a few functions within gtsummary called all_continuous, all_categorical, and all_dichotomous. So let's just start at the top. This is the simplest one. You put age on the left-hand side of the formula, then a tilde, and on the right-hand side you say, okay, for the column age, I want the label to be Patient Age. Moving on to type, you can select multiple columns by putting them in a vector: age and marker, I want those displayed as continuous variables, which is the default in this case, but it's here just for illustration. For the digits, I'm saying anything that starts with age; any tidyselect function can be used here, which is fantastic.
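The selector styles described so far can be sketched in a single call. This is an illustration under assumptions, not the slide code: the data frame below keeps the marker column (which the four-column sm_trial would not have) so that the type example has two columns to select.

```r
library(gtsummary)
library(dplyr)

# Assumption: a subset of `trial` that also keeps `marker`,
# so the two-column `type` selector has something to match.
trial_subset <- trial %>% select(trt, age, marker, grade, response)

tbl <- trial_subset %>%
  tbl_summary(
    by = trt,
    # bare column name on the left-hand side of the formula
    label = age ~ "Patient Age",
    # several columns selected with a vector
    type = c(age, marker) ~ "continuous",
    # a tidyselect helper
    digits = starts_with("age") ~ 0,
    # a gtsummary selector; the braces are evaluated glue-style
    statistic = all_continuous() ~ "{mean} ({sd})"
  )

tbl
```

Each argument takes a formula whose left-hand side picks the columns and whose right-hand side gives the setting for them.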
In this case, there's only one variable that starts with age, and it is the variable age, and I'm saying to display it to the nearest integer, or zero digits. And for the statistic, I can change the statistic presented for all continuous variables at the same time with a single line: all_continuous, tilde, and on the right side, the mean and standard deviation. Now for the statistic argument, it's finding that function mean like I mentioned earlier, but I also want to say that any function from any package can be used as well. So it's quite flexible, and you can really create summaries for anything you may need. Now if you need to pass two or more sets of instructions, you just pop each of those formulas into a list.
In addition to the arguments of tbl_summary, there are a bunch of helper functions that you can use as well. How you use these is you take the resulting tbl_summary object and pipe it right into one of these functions. So there's this add family, which adds an additional column of statistics or information to your table. The most common one I use is called add_p, and it adds p-values to your table to compare two or more treatment groups. Now let's review an example using add_p and a few of these other functions. Here's a schematic showing how some of those functions can be used. It does not include all of them, but you can see here in blue at the top we've used the modify_spanning_header function to add a header over the treatments, A and B. It just says Treatment Received over Drug A and Drug B. It's quite informative. Over here in green on the right, add_p has been used to add a p-value column. At the bottom in teal, modify_footnote has been used to update the standard footnote that is shown. In orange, just above that, we've added an additional column of overall statistics with add_overall. To the left of that in yellow, add_n has been used to add the number of non-missing observations for each one of those variables.
Just to the left of that, bold_labels in purple is bolding the labels for age and grade. And just above that, modify_header in turquoise has changed the header to Variable from the default, Characteristic. And there's a code example of this appended to the slides if you're interested in taking a look. Now I'm going to pass it to Karissa, who's going to talk to us about summarizing models with the tbl_regression function.
Hello, I'm Karissa Whiting and I'm going to talk a little bit more about tbl_regression, which is a function in the gtsummary package that makes it very easy to display nicely formatted regression model results with very few lines of code. To show what tbl_regression can do, I'm going to use this logistic regression model example using the trial data set that Dan introduced earlier in the talk. But before that, I just want to mention that under the hood tbl_regression is using the broom package to do some initial tidying of the model outputs, and then it allows some powerful customizations on top of that. So if a model method already has a broom tidier available, which many of them do, it's most likely going to be compatible with the tbl_regression function. For example, lm or glm from the stats package, coxph from the survival package, and lmer from the lme4 package are all going to be compatible, as well as many more.
In this example, we're going to use tumor response as our outcome, which is a binary outcome, and we're interested in using age and tumor stage as covariates. If we're building and presenting this model, we're probably interested in displaying the odds ratio estimates and their confidence intervals. We're probably interested in showing the p-values for the covariates, and we want to make clear what the reference levels are for the categorical variables. So here on the right is just the very raw output of the model object.
And then here's the basic tbl_regression code and the very basic table output on the right. Here we're just passing the model object to the tbl_regression function, and we see already that reference rows are created for the categorical variables, and it's very clear which one is the reference level. Variable labels are displayed, so just as with the tbl_summary function, if your data set is labeled, those labels will be carried through and displayed in the table. Additionally, because we specified the exponentiate = TRUE argument, our coefficients are exponentiated, and the function is able to detect that these are odds ratios because it's a logistic model, so it gives the correct OR heading as well as the odds ratio footnote.
Next we might be interested in customizing these regression tables a little bit, and one nice thing about this package is that the framework is pretty unified, so a lot of the helper functions and arguments that Dan already talked about for tbl_summary are also relevant here for tbl_regression. For example, all of the formatting functions like bold_labels, bold_p, or italicize_levels can be used here as well. And here we're also specifying that we want p-values formatted to two digits, which is available to both tbl_summary and tbl_regression, as well as some other model objects we'll talk about later. But one new one I'll introduce you to here is add_global_p. By default the function calculates level-specific p-values for categorical variables, but when we pass our table to add_global_p, it uses the Anova function from the car package to calculate a global p-value. And for full transparency, it prints a note to your console specifying that that's the function it used to calculate it. So here's a schematic just to kind of review what we've discussed.
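The regression workflow described so far might look like the sketch below. It is a reconstruction under assumptions, not the code from the slides: the object names m1 and tbl_reg are hypothetical, and add_global_p needs the car package installed in addition to gtsummary.

```r
library(gtsummary)
library(dplyr)

# Logistic regression: tumor response on age and tumor stage
m1 <- glm(response ~ age + stage, data = trial, family = binomial)

tbl_reg <- m1 %>%
  tbl_regression(
    exponentiate = TRUE,  # report odds ratios instead of log-odds
    # format p-values to two digits
    pvalue_fun = function(x) style_pvalue(x, digits = 2)
  ) %>%
  add_global_p() %>%      # global p-value per variable (uses car::Anova)
  bold_labels() %>%
  italicize_levels()

tbl_reg
```

The model object is the only required input; everything after tbl_regression is optional formatting layered on with the pipe.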
So we used bold_labels and italicize_levels just to format our table and make it look a little nicer. We used the pvalue_fun argument to specify that we wanted two digits. We added the global p-value, we exponentiated to get those odds ratios, and then I used the bold_p function to specify that I wanted any p-value under the threshold of 0.1 to be bold. By default it's 0.05, but here I just changed it to 0.1.
In addition to the tbl_regression function there's also tbl_uvregression, which allows you to very easily run univariate regressions for a set of selected variables. Here in this example I'm taking the trial data set and again selecting age, stage, and response, and then I'm passing those to the tbl_uvregression function, specifying which of those is our outcome, which is y = response. I'm specifying what method to use for the univariate regressions, which is glm, and then I pass any method.args that I might need. So here, for example, we specify that the family is binomial. And again I can use a lot of the same arguments and helper functions, like exponentiate, the bold functions, and the add_global_p function. Those can be used here just as in tbl_regression.
As a bit of an aside from the regression functions, I just want to talk quickly about this really helpful function, inline_text. This function can be used with any sort of gtsummary object, like a tbl_summary, a tbl_regression, or a tbl_uvregression object, and with it you can pluck out any element or statistic from your table and report it inline in your R Markdown report.
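A minimal sketch of the univariate call just described, followed by an inline_text call on the result. The object names tbl_uv and age_or are hypothetical, and this is an illustration rather than the slide code.

```r
library(gtsummary)
library(dplyr)

# One logistic regression per covariate, each with response as the outcome
tbl_uv <- trial %>%
  select(response, age, stage) %>%
  tbl_uvregression(
    method = glm,
    y = response,
    method.args = list(family = binomial),
    exponentiate = TRUE
  )

# Pull the formatted estimate for age out of the table as a string
age_or <- inline_text(tbl_uv, variable = age)
age_or
```

In an R Markdown document the same inline_text call would go inside an inline R chunk in the body text, so the reported numbers are regenerated every time the report is knitted.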
So for example, here if I want to report the odds ratio for age, I can just specify it in this inline R code chunk, and when I knit it, it'll come out like this and give me that really nicely formatted odds ratio. This is really useful for making your reports extremely reproducible, because you can guarantee that if any tables change or if your underlying data changes, the text in your R Markdown report will change as well.
I'm just very briefly going to talk about these two functions, tbl_merge and tbl_stack, which allow you to combine results from any of the other functions we've discussed. For example, we built a regression table and a univariate regression table. If I use the tbl_merge function, I can put them in one nice, concise, unified table to present the results all at once. Similarly, tbl_stack just allows you to stack tables on top of each other. Whether they're summary tables or regression tables, you can combine them in these ways.
So we've shown how to customize individual elements of tables using things like function arguments or helper functions, but now I want to introduce you to themes, which are a way to package a lot of these customizations together into bundles that you can easily set and reset across your reports or even across your projects. A theme is basically a defined set of customization preferences, and themes control a lot of the default settings for existing functions. For example, Dan talked about the statistic argument of tbl_summary. If you always want to present the mean instead of the median for continuous variables, you can make that the default and set it within your theme. Additionally, themes can control some more fine-grained customization options that may not be available via existing arguments or helper functions.
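As a small illustration of a theme changing a default, the sketch below uses theme_gtsummary_mean_sd, one of the theme functions that ships with gtsummary, so that continuous variables are summarized with the mean and standard deviation without passing a statistic argument. The choice of this theme and the variables shown are assumptions made for the example.

```r
library(gtsummary)

# Set once, near the top of a script; it applies to every later table.
theme_gtsummary_mean_sd()

# No `statistic` argument here: the default now comes from the theme.
tbl <- tbl_summary(trial, by = trt, include = c(age, grade))
tbl

# Return to the package defaults when finished.
reset_gtsummary_theme()
```

Other built-in themes, such as theme_gtsummary_journal("jama") or theme_gtsummary_compact(), are set the same way.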
So they really give you a lot of additional options in that way. We already have a couple of package themes available and ready to use, but it's also very easy to create your own if you have a very specific set of preferences. Just to go over a couple of the currently available themes: we have a couple of journal themes, and these are great because they already have a lot of settings built in to give you the specific formatting you need to submit to journals. Right now we have a JAMA theme and a Lancet theme available, and these do things like round p-values to the correct number of digits and also format your statistics. For example, I know JAMA requires a dash between your interquartile range statistics, so this theme will do that for you. Additionally, we have several language themes available; I think we have about 12 languages currently supported, and these will change the statistics and the footnotes to the language of your choice. For both these and the journal themes, we're definitely going to expand to add more journals and more languages, so if you're interested in helping out with that, please let us know. We'd love to collaborate to make our library of themes bigger. And then we have some formatting themes. The compact theme just reduces the padding and font size of your tables, and I use this one a lot just to make sure reports are really neat and really tight, so it's a useful, aesthetically pleasing formatting theme. And as I mentioned before, it's really easy to create your own. A theme is essentially just a named list of theme elements, and there's a glossary of available theme elements in the documentation. So here I just made a kind of crazy theme where I made my labels rainbow. I put hearts instead of values.
I subbed the percentage signs in my statistics to be smiley faces. This is just to show there are a lot of things you can do with these themes and a lot of options. So now I'm going to pass it back to Dan, who's going to talk a little more in depth about how gtsummary tables work with R Markdown and some of our available print engines.
We're going to finish by talking a little bit about R Markdown output formats and the print engines that are available within gtsummary. Now obviously the package is called gtsummary, and we're optimized to use the gt package as the primary printer. With HTML output, gt is a fantastic engine to select. But while PDF and RTF are under construction, you may find yourself needing something else. For example, I often use flextable when I need Word output. To use any of these print engines, you simply take any gtsummary object and pipe it straight into as_gt or as_flex_table, for example. What will happen is that a series of formatting functions from each of those packages will be applied to your gtsummary object. The result is that you will no longer have a gtsummary object; you will have in the end a gt object that has been formatted, or a flextable object that has been formatted. This allows you a lot of flexibility in the types of output that you use with your R Markdown. If you already know a family of formatting functions and know it well, it's easy to just stick with what you already know.
Lastly, thank you all for taking the time to listen to our talk about our package, gtsummary, and thank you to all the authors and contributors and the many people who have tested the package over the last year. It's been a pleasure to be here with you all. Thank you.
Great. Thank you all very much for the presentation. That was fantastic. There are quite a few questions. The first one, which has the most responses, is not a question: I just want to say thank you for this awesome package.
The next runner-up is about changing the format of the output presentation: for example, putting the mean, median, and IQR in different rows for a continuous variable.
To put them on separate actual rows in the table, at the moment that's in an open pull request. There is a bit of a workaround if you're using the flextable output: you can use the escape \n to do a line break within a cell to get them on separate rows as well.
Great. I think I can sneak in another question. In terms of adding statistics with gtsummary, is it possible to do post hoc analysis?
There is a very general function called add_stat with which you can add literally any statistic that you want to a tbl_summary object. In that sense, yes, you can add almost anything you want, and we try to create functions to do the most common stuff.
Looks like we're out of time and should probably move to the next session. Thank you guys very much for your talk. I think that was fantastic. Thank you.