All right, well, I think that's enough time for everyone to join. My name is Daniel Sjoberg. Welcome to the Clinical Reporting with gtsummary workshop at R/Medicine. First I want to say thanks to R/Medicine and all of the sponsors for letting us be here together. I think this is one of the most amazing conferences, being 100% virtual with really low registration costs and free workshops. It makes the materials accessible to people all over the world, and I think that is just fantastic. All of this work is licensed under Creative Commons, meaning you should feel free to use it. If you want to know the details, click the link and you'll see how to use it. So like I said, I'm Daniel Sjoberg. I am a statistician at Memorial Sloan Kettering Cancer Center in New York City, and I have been there for over 10 years now. My primary role is designing clinical trials and analyzing our hospital databases to see what treatments are working best for people, and in that role I have done a ton, a ton, a ton of reporting. I also love programming, so I kind of brought those things together to bring us gtsummary. Our TAs for today are Shannon Pileggi and Karissa Whiting. Karissa has been a colleague of mine for quite a few years as well, and she's an author on gtsummary. Shannon is a newer colleague, not affiliated with the hospital or directly related to what we do, but it's been super fun getting to know Shannon the last year or so. Here's their information; connect with them on Twitter or GitHub, they always have exciting stuff going on. Just a quick checklist before we get started: I hope that you had a chance to install a recent version of R as well as RStudio, and to install the necessary packages that we're going to be using in our exercises today. I hope that you are also able to knit our R Markdown files. Pretty much all of this should work out of the box.
But I know that some of us work in pretty constrained environments at our workplaces, so hopefully everything's working well for you. If it's not, I would recommend spinning up an RStudio Cloud session; those have usually worked pretty well and pretty seamlessly for me in the past. As we progress through today's workshop, please add any questions to the public Zoom chat. Karissa and Shannon are going to be keeping an eye on that chat, and they will interject as needed to relay your questions. With that said, at the breaks and during the exercises, please feel free to unmute yourself and ask questions. All right, I think we can actually get started now. I'm sure, in science, you've heard about the reproducibility crisis. The arc of the reproducibility crisis is very long; there are a lot of places where things can go wrong. But let's focus on the bit that is our responsibility as statisticians, data analysts, data scientists, whatever: the quality of medical research is often low, sadly, and a part of that is poor code quality. I spent many years as a statistical editor at European Urology, and during my tenure there we started asking authors to submit the code they used to write their manuscripts. We asked them: are you willing to share your code, yes or no? If no, is it because you had no code, or no code was necessary? If yes, the authors had to upload it. And we found some striking patterns in these responses. Number one, there were many, many manuscripts claiming that no code was used, despite the fact that the authors clearly had to use copious amounts of code to do the analyses in their manuscripts, which was wild. And also, the code that was submitted was often very low quality. There was a lot of repeated copy-and-pasting, which can lead to errors very easily, especially as you continue to update your data.
So low-quality code in medical research is just a part of the problem, and it's likely to contain errors. We want everyone to elevate their work and do it in a really reproducible, high-quality way. But I understand it can be quite cumbersome and time-consuming to do that, and that's really where gtsummary came from. We wanted to be able to write super high-quality, reproducible code, but make it easy and also very flexible. So let's talk a little bit about an overview of what gtsummary can do. Generally, gtsummary builds on the gt package from RStudio, which creates gorgeous tables and outputs to HTML, PDF, and RTF; at the moment, Word output is in development, so that is right around the corner for us. We're going to be creating "Table 1" types: that's the table you see in medical journals describing your cohort. We're going to have cross tabulations, summaries of regression models, summaries of survival data (or time-to-event data), and survey data, and we're going to show you a bit of how to make custom tables. We want to show you how to report results from your analyses inline in R Markdown documents. We're going to be stacking and merging and cobbling these tables together to make really complex tables from very simple individual parts. And we're going to talk about themes and also print engines near the end. But to get started, let's talk about the data set that we're going to be using primarily through the examples of this workshop. The data set ships with the gtsummary package, so if you load gtsummary, you have the trial data available to you. This is a data set with one line per patient who participated in the study, and all patients received either Drug A or Drug B.
And we have various other pieces of information about them: their age, some marker levels, their cancer stage, their cancer grade, whether or not they responded to treatment, whether they subsequently died, and the time to death if they did die, or the time to censoring if they did not. It's important to note here that all of these columns (trt, age, marker, stage) have been assigned variable labels using the labelled package. So you can see here on the right that trt has the label "Chemotherapy Treatment". Age has the label "Age", but with a capital A, because it's nicer to look at; same for grade, and response has "Tumor Response". So this is what we're working with. We're going to be working with a smaller version called sm_trial, which really just consists of treatment, age, grade, and response, to keep things a little easier and keep our resulting tables a little smaller for this workshop. So that brings us to our very first exercise. I have allotted eight minutes for this. You're going to head to this link, which is also available on the workshop website: if you go to the Materials section and scroll to the bottom, there's a link to download a zip file right here. You're going to download that file, extract the contents into their own folder, click the RStudio project file, and open up that project in RStudio. Then you're going to see some code; it's going to say Exercise 1 at the top, and it's going to have some code that sets up the data we're going to be working with. And we're going to label the data. After you have applied some variable labels to this data, we will move on. Let's start the countdown for eight minutes, and I will share my screen and do it in real time with you.
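As a quick sketch of what that labelling looks like with the labelled package (using the trial data that ships with gtsummary; the subset of columns here follows the sm_trial description above):

```r
library(gtsummary)  # ships the trial data set
library(labelled)

sm_trial <- trial |>
  dplyr::select(trt, age, grade, response)

# Labels travel with the data; gtsummary displays them
# instead of the raw column names
var_label(sm_trial$trt)
#> [1] "Chemotherapy Treatment"

# Assigning (or re-assigning) labels works like this
var_label(sm_trial) <- list(
  age      = "Age",
  response = "Tumor Response"
)
```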
And I will also walk you through a little bit about the data set that we're using. The data set we're using here is NHEFS, from the National Health and Nutrition Examination Survey follow-up study. If you know this data set well, I apologize in advance, because I'm butchering it a little just to keep our examples simple. What we're going to focus on is whether or not a participant in this survey quit smoking, and whether or not they experienced a subsequent death, I think within 20 years of that survey, by whether they quit or not. We also have some other variables in the data set: age, sex, blood pressure, diabetes, and the amount of exercise. So let's take a look at this code. We're loading gtsummary, we're loading the tidyverse. We'll talk more about this line, line 23, this compacting, later. You can see here I'm just getting this data set, nhefs, from the causaldata package. I'm selecting a few of the columns out, and I'm getting rid of missing observations just to keep our life easy for now; obviously, in the real world you'll have missing observations. And then we're going to recode some of this data so that it turns out more beautiful in our table. So instead of having a 0/1 variable for whether you quit smoking or not, we're going to have a factor with the labels "did not quit" and "quit"; similarly, for sex we're going to have a character variable that is male or female, and the same for the amount you exercise. And then here at line 50, this is where we're going to add our variable labels, all right? So, how is that coming along? Everyone able to download the zip file and get started? I'll take that as a resounding yes. I'll give you all some time to work on this independently, and in a few minutes I'll start working along with you. If you have been poking around the website already, you may have noticed that the solutions are linked there as well.
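A minimal sketch of the setup being described; the exact columns kept, the df_nhefs name, and the 0/1 codings are assumptions based on the narration, not the workshop's exact code:

```r
library(causaldata)  # provides the nhefs data
library(tidyverse)

df_nhefs <- nhefs |>
  select(qsmk, death, age, sex, sbp, diabetes, exercise) |>
  drop_na() |>  # complete cases only, to keep life easy for now
  mutate(
    # recode 0/1 to readable factor levels for prettier tables
    qsmk = factor(qsmk, labels = c("Did not quit", "Quit")),
    # coding assumed here: 1 = female, 0 = male
    sex  = if_else(sex == 1, "Female", "Male")
  )
```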
So if you do get stuck, please feel free to ask questions or review the solutions. If you just think you need a little hint to keep moving, that's perfectly lovely. Do you have your exercise timer set for this one? I believe I do. Let's see. Oops, I do not. I have another timer on another screen that is going, so thank you very much, Adam. So, I built these slides with Quarto. It was my first time using Quarto, and they're beautiful and I love them, and I think Quarto is pretty awesome. But just so you all know, if you start the timer in presenter mode, it does not start the timer on the rest of the screens. Just a little tidbit we can all learn together. All right, so the reason we're applying these variable labels is because gtsummary makes heavy use of them in the default reporting. For example, when we make summary tables, rather than displaying a raw variable name, it's going to check whether you have applied a variable label and display that instead. I recommend that you label all of your data that are ready for analysis, because utilizing these underlying variable labels really eases the production of tables, and even figures, and you can get things looking good and ready for publication, or for sharing with whomever, very quickly. So if you've never used the labelled package before (the package, excuse me, not the function), it's pretty great. You can easily assign a few labels. I'm going to start over here, labeling the death variable: we're saying the column label for death is whether the person passed away. For quit smoking, it's whether they quit smoking. Let's quickly grab the other ones while we're at it. See, here we've now taken the raw data set that we grabbed from the causaldata package, we've made some modifications, and we've applied these column labels to the data set, and it's going to be gorgeous. So let's run this code and make sure everything works for us.
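The labelling step just described can also be written pipe-style with labelled's set_variable_labels(); the column names and label text here are assumptions following the narration:

```r
library(labelled)

# Attach column labels in one pipe step; gtsummary will pick
# these up automatically in its default reporting
df_nhefs <- df_nhefs |>
  set_variable_labels(
    death = "Death",
    qsmk  = "Quit smoking"
  )
```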
And if you haven't heard of the skimr package, skimr is a wonderful, wonderful way to get a very quick summary of your entire data set. So I'm just running skimr on df_nhefs here, and it's telling me lots of information. I have 1,548 observations and seven columns: one character column, two factors, and four numeric. My data set's not grouped; that's a good thing, we don't want grouping right now. And then it gives you a summary of each variable. Here it's giving the summary of sex, which is our only character variable. There are zero missing values for all of the variables; you saw that we dropped all the NAs in the beginning. And it gives that lovely quick summary; on to the factor variables, a quick summary there. And the numeric ones are really cute, because you get the quartiles there, and also this lovely little histogram at the bottom as well. I was pretty late to start the timer. Can you do a quick raise-hand when you are done with this part of the exercise, please? Scrolling through, it looks like about 10% have their hand raised; I would give them a little bit more time. Okay, yeah, let's take a little more time, that's no problem at all. I know we're all virtual, so maybe not everyone is actively doing the exercises with us, so I don't expect 100% hands raised at the moment. The original timer is about to go off, so I'll use that one as our guide. All right, the first of our five exercises is now complete. That one was probably the least fun, because it's essentially just unzipping files; what's the big lift there? So let's continue. All right, let's talk about tbl_summary(). tbl_summary() is my favorite function in gtsummary. It's fantastic. It makes it so, so simple to get a table that is ready for publication out the door: share it with your collaborators, send it to a journal, and a few months later it's accepted and you are a published researcher.
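The skimr call being described is a one-liner (df_nhefs is the exercise's data frame name, an assumption here):

```r
library(skimr)

# One call summarizes every column: counts by type, missingness,
# quantiles, and an inline histogram for numeric variables
skim(df_nhefs)
```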
So let's just look at a quick example here. Just a side point: I'm using the base R pipe throughout this presentation. So this vertical line and greater-than sign, |>, that is the base R pipe. If you're not familiar with it, it's very similar to the magrittr pipe from the tidyverse, and essentially for everything we're going to see here, you can interpret it exactly the same way. Honestly, when I was making these slides, some of them were recycled from a previous presentation, and I just did a find-and-replace of the magrittr pipe for the base R pipe and everything worked out really well. So we're taking our sm_trial data set, we're piping that into a select statement, which removes the treatment variable, and we're piping that whole object into tbl_summary(), with no arguments. This is one hundred percent default behavior. And you can see here that you get a very reasonable summary of your data; not dissimilar to skimr, except that this one could go into a manuscript, whereas skimr gives a fantastic summary that's really great for us as analysts and statisticians. In a tbl_summary() table, there are four types of data: continuous, continuous2, categorical, and dichotomous. In this table, we only see continuous, categorical, and dichotomous. Age is recognized as a continuous variable because it has many, many unique levels in our data set. tbl_summary() also looks at the spread and makes its best guess of how many decimal places to round to; in this case, it looked at the spread of age and decided it's reasonable to show precision to the nearest integer. We're reporting the median and the interquartile range for this continuous variable, and the line below it shows the number of missing values. For categorical data, such as grade, we're presenting n and percent by default.
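The fully default call being described looks like this (building the sm_trial subset inline so the sketch is self-contained):

```r
library(gtsummary)

# 100% default behavior: summary types, statistics, and
# rounding are all guessed from the data
trial |>
  dplyr::select(age, grade, response) |>
  tbl_summary()
```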
And for tumor response, you can see that it's a little bit different. Tumor response is coded as 0/1, so it's recognized as a dichotomous variable and displayed on a single row. Of course, you can change these defaults, but that's typically how you'll see these dichotomous, binary variables summarized in tables like this. For tumor response, we're showing that 61 patients responded, or 32% of the overall cohort, and there are seven missing observations. We have a footnote saying we're looking at median (IQR) and n (%), and the header shows us that we have 200 observations overall. So I think this is a pretty fantastic table, and it was very, very simple. You can also see, like I mentioned before (let's try to zoom in on this a little bit), that rather than showing a raw variable name like age, grade, or response, we're getting the lovely column labels that we assigned earlier. That's why we assign variable labels: they're going to be utilized all over the package, and it's really fantastic. So let's try to customize this table a bit. I think one of the most common ways you're going to see a table like this customized from the absolute default is by specifying a by variable. This is going to summarize our data by a variable; in this case, let's summarize our data by the treatment the patients received. You can see here on the right side that rather than one single column, we now have two columns, one for Drug A and one for Drug B. We still have N in the header, and apart from that, it looks pretty much like our default table. Easy-peasy stuff so far. Our next modification is to specify the type argument. We saw on the previous slide that there were four types of data, one of them being continuous2.
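A sketch of those two modifications, the by variable and the type argument (continuous2 splits a continuous summary across multiple rows):

```r
library(gtsummary)

trial |>
  dplyr::select(trt, age, grade, response) |>
  tbl_summary(
    by   = trt,                 # one column per treatment group
    type = age ~ "continuous2"  # age summarized on 2+ rows
  )
```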
So previously, age was shown on a single line, but we actually had quite a few requests; actually, this happened after Karissa and I first presented gtsummary at the R/Medicine conference. Karissa, was that maybe two years ago, three years ago, something like that? Yeah, I think two years ago. And this was like the number one request: can I get continuous data summarized on multiple rows? So in this case, we're showing median and IQR on one row, but you can have one row for median, one row for IQR, one row for mean, one row for standard deviation, all the statistics you want, shown on two or more rows, which is why it's called continuous2: two or more rows. All right, so that's the modification we make here using the type argument: just changing the summary type from the default continuous to continuous2. The next thing we're going to do is change the statistics that are displayed. Here, for age, rather than median and IQR, we want to show the mean and standard deviation, and then on the next line we want the min and the max. For tumor response, rather than just the number of patients who responded and the proportion, we're going to show the number who responded, the denominator, and the proportion in parentheses. That's what you're seeing on the right side: you have one row for mean and standard deviation for age, one row for the range, and you're still maintaining that row for the unknowns. And you can see for tumor response that we're now showing that 28 out of 95 patients responded in the Drug A group, which was 29%. So let's talk a little bit about the syntax we're seeing here. If you have seen or worked with the glue package, or the stringr package's function str_glue(), we are heavily using that syntax here.
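The glue syntax works like this in plain stringr, a standalone sketch; tbl_summary() layers its own function lookup on top of this idea:

```r
library(stringr)

x <- c(40, 50, 60)

# Expressions inside curly brackets are evaluated and
# interpolated into the string
str_glue("{mean(x)} ({sd(x)})")
#> 50 (10)
```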
So what happens in the background is that when tbl_summary() sees this "mean" in curly brackets, it does two things. It parses out what's in between the curly brackets, finds that it's a string called "mean", goes and finds the function called mean(), and then takes the age vector from your data set and calculates the mean using the mean() function. So it's doing double duty: you're placing the statistic in a specific place within that string, and you're also naming the summary function that's going to be used. Similarly for SD: there's a function in base R called sd() that calculates standard deviations, and it's doing precisely the same thing as mean. We identify that "sd" is within the curly brackets, and we replace it with the calculated standard deviation, right? Another common thing you're going to want to do: although we have already labelled our data in the data frame, sometimes, depending on the output, you need to modify that label. Here we're using the label argument to modify the grade variable to say "Pathologic tumor grade", adding a little more detail on top of just "tumor grade" alone, and passing it to the table. You can see that it gets updated here: pathologic tumor grade, all right? So I feel like those are the most common modifications that I make to a tbl_summary object. Oh, I think there might be one more here. Oh yes, this last one: digits. Previously we noted that internally we take a look at the range of the age column and guess that it should be summarized to the precision of the nearest integer, but obviously you may need more or less precision than tbl_summary() guesses. So we have a digits argument that says: please round everything for the age column to one decimal place.
And you can see how that updates here: now we have 47.0 for the mean and 14.7 for the standard deviation of age. So those are pretty common, and I think if you become familiar and comfortable with those five arguments to tbl_summary(), it will take you pretty far. But before we go on, let's talk a little about the notation we were just looking at. You're seeing the tilde used in a way that you may not have seen previously. What we support here is all of the tidyselect helpers that you've come to love from the tidyverse, which include starts_with(), ends_with(), contains(), matches(), all of those selectors from tidyselect and dplyr; those can go on the left-hand side of that tilde. That's where you select your variables. And the right-hand side gives the instruction for what you want those variables to do. So in this example, I'm writing bare age: I'm just selecting the variable age and saying, for the label, I want to make it "Patient Age" instead of the default. And just like in dplyr and tidyselect, you can put those bare column names in a vector with c(). That's what we're seeing here for type: we want both age and marker to be summarized as continuous. The third line here uses starts_with(), one of your typical tidyselect helpers, and it's really, really useful to be able to use the entire suite of those that already ship with the tidyverse. And the last one here is called all_continuous(), and that is a gtsummary-specific selector. What does it do? It lets you say things like: all continuous variables should be reported as mean and standard deviation, in this case. So rather than having to go into your table and say age is continuous, marker is continuous, this other thing is continuous...
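For instance, a sketch combining the five arguments and the selector notation being described:

```r
library(gtsummary)

trial |>
  dplyr::select(trt, age, marker, grade) |>
  tbl_summary(
    by        = trt,
    label     = age ~ "Patient Age",          # bare column name
    type      = c(age, marker) ~ "continuous", # vector of columns
    statistic = all_continuous() ~ "{mean} ({sd})",  # selector
    digits    = all_continuous() ~ 1
  )
```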
And grabbing all of those variable names: it can look into the metadata of the tbl_summary object, identify which columns are being summarized continuously, categorically, or dichotomously, and select all of those variables for you. And anytime you need to give two sets of instructions, for label, type, digits, statistic, anything, you can just drop everything into a list. So in this example, you can put within the list: age ~ "Patient Age", comma, marker ~ "Marker Level". Throughout this presentation I'm pretty much going to stick with the tilde notation, but I also want to note that it's often much more convenient to just use a named list, and those are perfectly fine to use as well. Here's an example at the bottom where we're giving a named list: age = "Patient Age". The key distinction is that you cannot use selectors when you are using named lists; the name must be the column name you are giving the instruction to. What actually happens in the background is that all of these selectors, starts_with("age"), all_continuous(), are parsed in the very first step into a named list, which is then passed through the rest of the function. So named lists are good, but for the rest of the presentation we're sticking with the tilde. So in addition to tbl_summary(), there is a suite of functions that add additional information, columns of statistics, what have you, to your base summary table. You can add p-values, q-values, overall statistics, treatment differences, numbers of observations, and more. I recommend that you check out the website; there's a reference section that lists every function that can be used with tbl_summary() and the other tables as well. There's also a family of functions that start with modify_.
And what these are primarily going to do is change the styling or the aesthetics of your table: you can change headers, spanning headers, footnotes, and more. There are a lot of ways you can modify the aesthetics of a gtsummary table, like indenting, alignment, what have you. If you really need quite customized tables, definitely familiarize yourself with those modify functions. The bold and italicize functions are actually just little wrappers around other modify functions, but they do things like bold the labels in your table, italicize the levels perhaps, bold significant p-values, what have you. All right, let's take a quick look at this example. We're starting pretty much with the basic table we saw previously, where we passed our sm_trial data set to tbl_summary() and split the results by the treatment the patients received. Then we're piping that directly into add_p(), and then piping that directly into add_q() using method "fdr", the false discovery rate method. Q-values are a way to control for multiple comparisons (not competing risks, excuse me). We're sticking with the defaults in add_p(), where we compare age, grade, and tumor response by the drug the patients received. You can see here that the p-value for age is 0.7, so it's quite large. And let's take a look at the footnote to make sure we know what's going on: the p-value header has a little footnote number two, and if you go down there, you can see we're looking at the Wilcoxon rank-sum test and Pearson's chi-squared test. The Wilcoxon is for continuous data, and the Pearson test is for our categorical and dichotomous data.
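A minimal sketch of the add_p() and add_q() chain being walked through (using the trial data that ships with gtsummary):

```r
library(gtsummary)

trial |>
  dplyr::select(trt, age, grade, response) |>
  tbl_summary(by = trt) |>
  add_p() |>                # defaults: Wilcoxon rank-sum,
                            # Pearson's chi-squared
  add_q(method = "fdr")     # false discovery rate adjustment
```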
And add_q() is going to correct those p-values for multiple comparisons; you know, it's only three p-values, so there's not a lot going on here right now. In the background, it's using p.adjust(). And what you're going to find a lot within gtsummary is that the documentation is very clear about what's happening in the background. We want no ambiguity for users about precisely which tests are being used, and not just the test name, but even the R code, the R function, and the R package we're using to calculate it, so that it's super clear you are getting exactly what you think you're getting. It's been a bit of a nerve-wracking experience for me in the past to jump into a brand new package where they calculate some kind of likelihood ratio statistic, and I'm like, well, exactly how are you doing that? And sometimes it's not clear. But when you, for example, run add_q() here, it prints right into your console: hi, I'm running p.adjust() to adjust these p-values for multiple comparisons. I think that's one of the really wonderful benefits of this package: the clarity of the documentation and the messaging for users. So let's look at a couple more of these add functions. Oh, one more thing I want to say about add_p(): it has many, many methods you can use. If you check out the add_p() help file, there's a test argument that indicates which tests to use instead of the defaults; like you saw, the defaults are the Wilcoxon for continuous data and Pearson's for categorical data, and if you had small cell counts, it would default to Fisher's exact test, by the way. But these p-values are quite flexible: they can account for correlated data, repeated measures, lots and lots of scenarios, and we'll look at that a little later in this presentation. All right, let's look at some other common add functions for tbl_summary objects. add_overall() simply adds an overall column.
So now you can see here we have our summary statistics split by Drug A and Drug B, but we also have an overall column. You can decide if you want that column at the front, at the back, wherever you'd like; that's a common thing you'll see. There's also an add_n() function: instead of showing an additional row for how many missing observations you have, add_n() adds a column showing how many non-missing values you have. You can definitely change that default to say, I would like to show the number of non-missing observations and the proportion of non-missing observations, or you can update it to show the number of missing observations, including the proportion missing; it's quite flexible, but the default is just the number of non-missing observations. And this last one, add_stat_label(). Let's actually take a look at it. Look here in the footnote: you're seeing that median (IQR) and n (%) are the statistics being summarized. When you run the function add_stat_label(), it adds those statistic labels directly to the variable labels. So instead of our first line just saying Age, it now says Age, Median (IQR). In some cases, your table is a lot nicer to look at without that footnote. There could also be cases where you have multiple continuous variables summarized with, say, means, medians, or maybe a coefficient of variation, what have you, and it may not be super clear if we just list every statistic being shown in the footnote; moving it up into the variable label is often quite helpful for being explicit. But I think the most important reason is that some journals, such as JAMA, require it this way. So let's take a look at some of these bold and italicize functions. Here, again, we're starting with our pretty basic table; it has add_p() already piped into it.
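As a sketch, the add-on family described above can all be chained onto one table:

```r
library(gtsummary)

trial |>
  dplyr::select(trt, age, grade, response) |>
  tbl_summary(by = trt) |>
  add_overall() |>          # column summarizing everyone
  add_n() |>                # column of non-missing counts
  add_p() |>                # between-group comparisons
  add_q(method = "fdr") |>  # multiple-comparison adjustment
  add_stat_label()          # "Median (IQR)" etc. moves from the
                            # footnote into the variable labels
```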
So we have a column for the p-value comparing our age, grade, and tumor response. Yes? I'm not convinced that what you're talking about is what I'm seeing. Oh, yeah, my presenter view is not the same. It got a little confusing for a minute there. I apologize, my goodness. Okay. All right, well, what I will do is close the presenter view and just present. One moment, please, thank you. Okay, so how confused are we, then, people? Pretty confused? I'm still working on this, so what does it say? There is a request to back up a couple of slides, because we lost a little bit there. Yeah, for sure. All right, is this a good place to start, add_p() and add_q(), or were we all good then? And also, can you see the proper screen at the moment? I am seeing an add_p, add_q screen, yes. Okay, great. Thank you, again, for stopping me. So that is fun; you learn something about Quarto presentations every day. All right, I'll go over this a little more quickly this time, but essentially we have the function add_p() to add a comparison between treatment groups. This works for two treatment groups, three, four, unlimited treatment groups, as long as the test you're specifying works for that number of groups and for that type of data, be it continuous or categorical, what have you. add_q() does our multiple-comparison adjustment, using p.adjust() in the background. Here we have add_overall(), which is adding this overall column. We're also seeing add_n(), which adds this column here. And then we have add_stat_label(), which moves our statistic labels from the footnote right up next to the variable, so that you know exactly which variable is showing which statistics. All right, that brings us to exercise two. What we're going to do is take the data set we made in exercise one, our labeled, gorgeous data set.
We're going to create a summary table that's split by whether or not the participant quit smoking. You're going to include all of the variables except for our outcome, which is death. Build that summary table, but I also want you to consider any other functions from the add_*, modify_*, or bold/italicize families, and let me know how you're modifying your table. I'll leave my RStudio off the screen for a moment while we get started. And again, during these breaks and exercises, please, please feel free to ask any questions. There was a question here: "With add_n(), can you change the label to specify that it represents non-missing values?" You absolutely can. There's a column-label argument in add_n(), which defaults to just "N". And if you go and start changing the default statistic, you should change the default header as well to match. There's another question in the chat: how to remove the death variable inside the tbl_summary() call. There are two ways to do that. One way is to take the data set and pipe it into a select() where you deselect death with a minus sign, select(-death), and then pipe that into tbl_summary(). There's also an include argument in tbl_summary(): you pipe the entire data set in, and there you would say include = -death. And Mary is asking, is it possible to add custom footnotes, say you want to explain what a particular category includes? Yes, it is. There's a function called modify_footnote(), and you can change the footnotes to whatever you'd like. The defaults are just defaults, and you can write over them very easily. Jonathan Rubin's asking, is selecting with the minus sign the same as using the exclamation point? I don't know; I actually never use the exclamation point except with predicate selectors. But if it works in dplyr's select(), I would expect it to work here as well. Mary has a clarification.
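A quick sketch of the two removal approaches just mentioned, shown on the built-in `trial` data (which has a real `death` column) rather than the workshop's NHEFS data:

```r
library(gtsummary)
library(dplyr)

# Option 1: drop the outcome column before building the table
tbl1 <- trial |>
  select(trt, age, grade, death) |>
  select(-death) |>
  tbl_summary(by = trt)

# Option 2: keep the column in the data, exclude it from the summary
tbl2 <- trial |>
  select(trt, age, grade, death) |>
  tbl_summary(by = trt, include = -death)
```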
"Can you place a new custom footnote for something that doesn't already have a footnote?" Yeah, you can place it. modify_footnote() will let you place a footnote, or additional footnotes, on any of the column headers. If you want to place a footnote into the body of the table, for example on a variable level where you want more explanation, you can do that, but not with modify_footnote(). There's a higher-level modify function called modify_table_styling(), and that's what's powering all of the other modify functions, by the way. In addition to a columns argument that tells it which column you'd like to place the footnote in, it has a rows argument as well, so you'd do it there. Caitlin gave us a nice example in the chat showing that the bang, the exclamation point, works just like the minus sign. That's great; I love this, one can learn from everyone. Raymond's asking, does gtsummary support role-based selection like the recipes package? So in this case, what we have is all_continuous(), all_categorical(), and all_dichotomous(). There's an all_continuous2() as well, but all_continuous() by default will select the continuous2 summaries too. So that's how we select things based on their role. Oh, and please feel free to raise your hand when you're done, so we can keep tabs on how everyone's progressing. Well, I'm going to start solving it as well here on the screen. I like to use common prefixes for every type of object that I create in R. If I'm making a data frame, I love to call it df_ and then whatever it is. Then when I revisit my old projects, I don't have to remember everything about my naming convention, because I can have an idea: oh, this is a data frame of results. And for gtsummary objects, I like to use the prefix gts_, so I can look at a name and know that it's a gtsummary table with certain properties that I can use.
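A small sketch of the role-style selectors just mentioned; selector behavior is worth confirming against the gtsummary selector help pages:

```r
library(gtsummary)

# Selectors pick variables by their summary "role" inside tbl_summary()
tbl <- trial |>
  tbl_summary(
    by = trt,
    include = c(age, marker, grade, response),
    statistic = list(
      all_continuous()  ~ "{mean} ({sd})",  # applies to continuous variables
      all_categorical() ~ "{n} ({p}%)"      # applies to categorical variables
    )
  )
```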
So that's nice. We're going to take our NHEFS data frame that we made above and pipe that right into tbl_summary(). You can see here that we have a lovely table, but it needs a little more customization. The instructions say to split these summaries by smoking status, the qsmk variable, so let's take a look at how that's progressing. And it still has death in there, so I need to remove that. I'll use the include argument, or the minus sign, or the exclamation point, it turns out; thank you for the tips. You know, in this description of the patient population, let's see if we have some basic differences by whether or not the participants quit smoking. You can see here that there are differences in age, sex, and blood pressure, maybe not so much in exercise level. I'm going to do a couple more modifications: I'll run add_stat_label() to remove the statistic labels from the footnote and put them right next to the variable names. Looking good. And one more: modify_spanning_header(), using a little selector that we're going to talk about in a moment called all_stat_cols(). So this is my final table. What other modifications did you all make? All right, I see, Jeremy, you're putting in an add_difference(); we're going to get to an add_difference() example in just a moment. I love the add_difference() function. Oh, we have some bold p-values. I like bold_p(), so let's add that, although depending on where you're publishing, they may not allow it. Oh, someone also said an overall column. You see here, because I ran modify_spanning_header() selecting those statistic columns before I added the overall column, that spanning header only went to the columns split by smoking status, and when I added the overall column afterward, it didn't pick up the header. If I had run add_overall() ahead of modify_spanning_header(), the header would have spanned all of them.
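The walkthrough above can be sketched like this. The workshop's NHEFS data isn't reproduced here, so this sketch uses the built-in `trial` data with `trt` standing in for the `qsmk` smoking-status variable:

```r
library(gtsummary)
library(dplyr)

# Exercise-2-style table: split by group, outcome excluded, styled
gts_ex2 <- trial |>
  select(trt, age, grade, response, death) |>
  tbl_summary(by = trt, include = -death) |>   # keep death out of the table
  add_p() |>
  add_stat_label() |>                          # labels out of the footnote
  modify_spanning_header(all_stat_cols() ~ "**Treatment**") |>
  bold_p()                                     # needs p-values, hence add_p()
```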
Anyway, I think we're largely done here. Does anyone have any questions about this before we move on? Any comments? I think what we've done here is pretty remarkable and, at the same time, quite unremarkable. Our code is pretty simple; it's pretty much a standard table. But we've made a really lovely table, with beautiful formatting in my opinion, and it was so, so easy. What this allows us to do is take our time and effort away from annoying things like formatting tables and focus on the true tasks at hand: doing the analysis, looking at the data, investigating what's going on. I will say more about that in the next exercise when I get there. But I think we're ready to move on. Let's see, someone has a question: can you add rankings in superscript for a categorical variable? I'm not sure exactly what you mean by rankings, but you can add footnotes with superscripts to any of the levels; that's definitely something you can do. A lovely comment from Rebecca, thank you: "It's just amazing." I think so too; I really love it. Okay, let's move on. All right, this is one of my favorite functions: add_difference(). Let's step through this code a little bit. We're taking the trial data set and selecting treatment, marker, and response. Treatment is our binary treatment variable, marker is a continuous marker level, and response is a binary response variable. Let's imagine that the marker level is a marker response we're expecting from our chemotherapy treatments, drug A and drug B, for example, and the tumor response rate is whether or not the tumor responded to those treatments. So let's think of these as outcome variables in this example. What we're doing here is splitting our results by treatment again, and then modifying the statistics that are reported.
Instead of the default median and IQR, I want my continuous variable, the marker, to show the mean and the standard deviation. For my response variable, I just want to report the rate of response, the proportion who responded. And I don't want to show any missing values in this table. So this is not a table one or a cohort-descriptive table; this is really an analysis of your outcomes. The default here is for continuous variables to report the mean difference. That's why I thought it was more important to show the mean and standard deviation rather than the median and IQR: we're reporting the mean difference between the two groups, drug A and drug B. And for the rates, we report the rate difference. You can see here that there is a pretty small difference in marker level, 0.20, and the difference in tumor response is negative 4.2%. We get our confidence intervals and our p-values. What I really like is that down in the footnote you can see we're using a t-test to get the difference for the marker, and a two-sample test for equality of proportions to get the difference, confidence interval, and p-value for the tumor response rate. So again, it makes it very, very clear what's happening in the background for you, and that is a wonderful, wonderful feature. And you can check out the documentation for add_difference(): it lists all of the methods being used, and it will even give you pseudocode for exactly what's being done, which is quite fantastic. One function that we're not going to exhibit today, but that's worth knowing about, is the generic, very general add_stat() function. What it does is add one, two, three columns of new statistics that you define in your own way.
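The outcome table just described can be sketched as follows, using the same `trial` variables:

```r
library(gtsummary)
library(dplyr)

# Outcome-style table: means/rates by treatment, then group differences
tbl_diff <- trial |>
  select(trt, marker, response) |>
  tbl_summary(
    by = trt,
    statistic = list(
      marker   ~ "{mean} ({sd})",  # t-test / mean difference is the default
      response ~ "{p}%"            # rate difference for a dichotomous outcome
    ),
    missing = "no"                 # no missing-value rows in this table
  ) |>
  add_difference()                 # adds difference, 95% CI, and p-value
```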
It can add statistics just to the label row, for categorical or continuous variables, what have you, or you can set it up to only add statistics to the levels, like grades one, two, and three that we were seeing earlier. So this function is super general, and it's actually used heavily internally to place all of the new statistics requested by the other add functions. It's really nice because sometimes you'll need something that's not shipped with the package, and this is a very simple way to run your custom p-value calculations quite seamlessly. The internals are complicated, but we don't need to touch on that. Do we have any questions? No? Let's move on; we have another exercise coming up. All right, now let's update with some bolding and italicizing, just to show you what's going on. We're now using bold_labels(); you can see age, grade, and tumor response are now in boldface. We've piped this into italicize_levels(), and now you can see that unknown and grades one, two, and three are all italicized. And we have bold_p(), which is highlighting our, quote, significant p-values here. Now, mind you, I had to increase the threshold to 0.8 for bolding so I could get some heavy font face here; it defaults to 0.05. All right, you saw me modify the spanning header earlier, but it wasn't entirely clear how that was done. I was using the helper function all_stat_cols(), which grabs all of the columns that contain summary statistics. There's also a function exported from the package called show_header_names(). What it does is show you every column in the gtsummary table: the underlying column name and also the current header, so it's very clear how you can modify those things. So let's see how you use modify_header().
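The bold/italicize chain described above looks like this in code; as in the talk, `t = 0.8` is only there to force some bolding for demonstration:

```r
library(gtsummary)

tbl_styled <- trial |>
  tbl_summary(by = trt, include = c(age, grade)) |>
  add_p() |>
  bold_labels() |>        # bold the variable labels
  italicize_levels() |>   # italicize the category levels
  bold_p(t = 0.8)         # bold p-values below 0.8 (default threshold is 0.05)
```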
So here, if you were to run show_header_names(), you would see that the drug A and drug B columns are named stat_1 and stat_2 under the hood. So for stat_1, we're changing the header from "Drug A" to "Group A", and for stat_2, from "Drug B" to "Group B". Now, mind you, we're using markdown syntax here: you can see the double stars on each side of Group A and Group B, which say put this in bold font, right? By default, markdown syntax is recognized in these headers, but you can use HTML as well if that's what you need. There are two gt functions running in the background, md() and html(), that convert the string to whatever type it needs to be. Moving on to modify_spanning_header(): again, we're using all_stat_cols() to select the columns with summary statistics, and we're adding a spanning header that just says "Drug". So it reads Drug, then Group A, Group B; quite generic. And here's an example of changing the footnote. Previously it just said Median (IQR) and n (%), but we wanted something a little clearer in this example, so we said: for all of the statistic columns, replace the footnote with Median (IQR) for continuous and n (%) for categorical. All right, this next slide shows the results from show_header_names(). Oh, I thought we had one more slide, but we did not. Okay. Definitely check out the documentation; there's a slew of it on the website. There are vignettes: tbl_summary() has its own vignette with tons and tons of examples customized directly for tbl_summary(). There's also a table gallery that shows all sorts of interesting tables you can construct starting from these basic tables. We're not going to go through examples with them, but I quickly wanted to mention some other summary types.
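A sketch of the header, spanning-header, and footnote modifications just described; the `stat_1`/`stat_2` names are the real internal column names that show_header_names() reports:

```r
library(gtsummary)

tbl <- trial |>
  tbl_summary(by = trt, include = c(age, grade))

show_header_names(tbl)  # prints underlying names, e.g. stat_1, stat_2

tbl <- tbl |>
  modify_header(stat_1 ~ "**Group A**",          # markdown bold via **
                stat_2 ~ "**Group B**") |>
  modify_spanning_header(all_stat_cols() ~ "**Drug**") |>
  modify_footnote(
    all_stat_cols() ~ "Median (IQR) for continuous; n (%) for categorical"
  )
```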
So Carissa Whiting, who's with us today, wrote this function called tbl_cross(). It wraps tbl_summary() but gives you a cross-tabulation. You can see here we're passing it a data frame, just like you would tbl_summary(). We're saying that in the rows we want our treatment variable, and in the columns we want grade. For the percentages we're showing, we want row percents; those can be row, column, cell, or no percents, what have you. In this case, we're only showing row margins, but you can also show column margins, both, whatever you'd like. We're calculating a p-value, and we're using the source_note argument to put it as a source note at the bottom of the table rather than as a brand-new column, which is what tbl_summary() would do. And we're bolding our labels here, so we get a bolded total row, a bolded chemotherapy label, and a bolded grade up on top. So that's tbl_cross(), super helpful. We also have tbl_continuous(), which summarizes a single continuous variable by one or more categorical variables. You see here that on top we again have treatment, drug A and drug B, in the rows we have grades one, two, and three, and in the center we have summary statistics for age. Just like tbl_summary(), you can modify the statistics presented there from the default of median (IQR). There's also something called tbl_svysummary(). It's nearly identical to tbl_summary(), but instead of passing it a data frame, you pass it a survey design object, and you get a lovely summary of that data. And what's really nice is you can include both the weighted and the unweighted statistics in the table. It makes summarizing survey data, which I felt was somewhat complicated before, super, super simple.
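The tbl_cross() example described above can be sketched as:

```r
library(gtsummary)

# Cross-tabulation of treatment by grade with row percentages,
# the p-value as a source note, and bolded labels
tbl_x <- trial |>
  tbl_cross(row = trt, col = grade, percent = "row") |>
  add_p(source_note = TRUE) |>  # p-value at the bottom, not as a column
  bold_labels()
```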
And you'll see that along with all of these functions we're talking about, tbl_cross() for example as well, you can run things like add_p() and modify_spanning_header(). These modify functions apply to every single gtsummary table we're reviewing today, and that's a lot of tables. We also have survival outcomes. If you work a lot with time-to-event outcomes, like me; you know, I work at a cancer center, and nearly all of our outcomes are time to progression of the cancer, time to recurrence, time to death, time to initial treatment; we work with them all the time. You can put those in a summary table from gtsummary as well, and like I said on the last slide, it supports things like add_p(), so you get this nice p-value. It tells you it just calculated a log-rank test, with options to calculate other tests as well, of course. All right, that brings us to our next exercise. We're going to use tbl_summary() again, along with add_difference(), to show differences in death rates by whether or not the participant quit smoking. Let's start with a summary table that looks at just the raw, unadjusted difference in death rates by smoking status. After we have that, let's build a second table that reports an adjusted analysis. For that second part: so far we've pretty much accepted the default tests from all of these functions. But if you're going to do a test where you want to adjust for something, you definitely cannot use the default, because a t-test or a two-sample test of proportions cannot be adjusted; those simple methods won't do, and you need something a little more complex.
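The time-to-event summary just mentioned is produced with tbl_survfit(); a minimal sketch on the `trial` data, assuming the survival package's survfit() interface:

```r
library(gtsummary)
library(survival)

# Kaplan-Meier survival estimates at 12 and 24 months, split by treatment
tbl_surv <- survfit(Surv(ttdeath, death) ~ trt, data = trial) |>
  tbl_survfit(times = c(12, 24),
              label_header = "**Month {time}**") |>
  add_p()  # log-rank test by default
```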
So navigate to the gtsummary website; in the references there's a page called "tests" that lists every test available, along with the pseudocode being run in the background, so again, it's really obvious what's happening before you get your p-values out. All right, while you're working, I'll review some of the questions from the chat. John Ryan is asking: if there are more than two groups, is there any function for post-hoc testing? How I understand your question: if I'm doing a table split by three treatment groups and I find that there is a difference between the three groups, can I do a pairwise comparison of each pair of groups? If I understand correctly, that is something that has been asked before on Stack Overflow, and I've posted a solution there, but it's not shipped with the package by default. So you'd use something like add_stat() to customize that. Depending on your situation, it's not really so bad to modify the results to add those pairwise comparisons, and then you can do your post-hoc testing with, say, a Tukey adjustment or what have you. Thanks for the kind words in the chat, Jordan and Jonathan and others as well; thank you. David Perez is asking if there's a function, rather than bolding significant p-values, to add asterisks next to them. This is actually supported in our regression summaries, which we're going to get to in a moment; the function is called add_significance_stars(), and it will add stars next to the significant p-values. Oh, thank you, Rebecca, for linking to that. It's not something I commonly see in the medical literature, but in economics and other fields it's super, super common to show results like that. So Jeremy is asking about Bayesian t-tests.
So that's not something that's shipped with the package, because we don't want to add too many dependencies, for example. But within add_p() for tbl_summary(), you can very simply write a custom p-value function. And what do those custom functions look like? It's great if the method already has a broom tidier, because you can just take the object you're calculating, pipe it to broom::tidy(), and the result of broom::tidy() is the precise format gtsummary needs to recognize where the p-value is, where the differences are; it can grab all of those columns seamlessly. So if your method already has a broom tidier, or ships its own tidier following the broom structure, it can often literally be one or two lines of code to write a custom test and insert it into gtsummary. I just sent a link into the chat for where all the tests are listed; I'm not sure why it was returning a 404. And again, feel free to raise your hand when you've completed these two tables, and if you have questions about creating them, please feel free to chime in. There's one person in the chat asking for a hint for the adjusted difference. Okay, yes, let me open that up. We're going to go to the section of the tests page on tbl_summary() and add_difference(). Jumping to that section, let's do a quick perusal; let me increase the font size here somewhat. And this last one, "emmeans", uses the emmeans package functions to estimate marginal means, also known as LS-means. So if you come from SAS land, you'll probably recognize LS-means; if you're a former Stata user, you'll maybe recognize the margins command. Those are all similar here. So what this is going to do is build a logistic regression in the background and then get an adjusted risk difference.
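A sketch of what a custom p-value function can look like. The interface shown (a function of `data`, `variable`, `by`, and `...` returning a tidier-style data frame with a `p.value` column) follows the gtsummary custom-test documentation; the Kruskal-Wallis test here is just a stand-in for whatever method you'd plug in:

```r
library(gtsummary)

# Custom test: any function of (data, variable, by, ...) that returns
# a data frame with a p.value column, e.g. via broom::tidy()
my_kruskal <- function(data, variable, by, ...) {
  kruskal.test(data[[variable]] ~ as.factor(data[[by]])) |>
    broom::tidy()
}

tbl <- trial |>
  tbl_summary(by = trt, include = age) |>
  add_p(test = age ~ my_kruskal)
```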
So it's going to build a logistic regression with the variable as the outcome, the by variable as a covariate, and all of your adjusting variables as covariates as well. Then it passes that to emmeans to calculate an adjusted rate difference between the two groups we're interested in, and it does that via the delta method. That's how those things are calculated. So you're going to want to use the "emmeans" test, and just remember that in add_difference() you need to take advantage of the adj.vars argument so it knows which variables you want to adjust for. Now, there's going to be a little trick to this: when you pass your data frame to tbl_summary(), you need some way to include those adjustment variables in the data set, but you don't want to include them in the table, right? So it's going to be a little tricky, and you'll want to use the include argument to exclude the adjustment variables from your summary. I'm going to join you. I think the first table is more straightforward; the second one, I do think, is a little tricky. We're going to split our results by smoking status, right? See, does that get us pretty far toward what we want? Yep, but we only want to include death, so that's the only variable we'll keep in the summary. So now we have a single table that shows our rate of death by smoking status. Let's just show that; it's looking good. Let's pipe that right into add_difference(). I think this gives us the basics of the table we're looking for, but is there anything kind of ugly about this table? I think there's at least one ugly thing: the first footnote has a single "%" in it, and number one, it's pretty obvious that it's a percent. So there are a couple of things you could do.
You could use modify_footnote() to remove that footnote altogether, or you can use add_stat_label() to move the statistic label into the label row, like I've chosen to do here. So there's that first table. Let's call it df_unadjusted; excuse me, it's not a data frame, it's a gtsummary table, so let's call it gts_unadjusted, and make a gts_adjusted one as well. It starts out very similarly, just like we did before, and we go right into add_difference(). We can see here that by default we get a two-sample test of proportions, which is not what we want. What we do want is the difference by emmeans. And then we also need to do what? Specify the adjustment variables: age, sex, blood pressure, exercise. And let's also do add_stat_label() before we look at the results together. All right, the table looks very similar to the first table we created; the raw rates in columns two and three are exactly the same, but you'll note that it's now reporting an adjusted difference rather than an unadjusted one. Now, this is where I was going to say: let's not get too caught up in the proper analysis and interpretation of what we're doing, because there are lots of things on the causal pathway between smoking and death, and maybe some of those things are also being adjusted for as covariates. But this isn't an epi lesson; it's a programming lesson. So we're essentially just going to interpret this as: adjusting for age, sex, blood pressure, and exercise, is there a difference in death rates by smoking status? And what does this result tell us? Nope, not really. We have a p-value of 0.5; not enough evidence to conclude that, I should say. And you can see here, footnote one: regression least-squares adjusted mean difference. So, like I said, it's very clear what's happening in the background.
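The unadjusted/adjusted pair just built can be sketched as follows. The workshop's NHEFS variables are swapped for `trial` stand-ins (`trt` for smoking status, `response` for death, `age` and `grade` as adjustment covariates), and the "emmeans" method requires the emmeans package to be installed:

```r
library(gtsummary)
library(dplyr)

# Unadjusted difference in rates by group
gts_unadjusted <- trial |>
  select(trt, response) |>
  tbl_summary(by = trt, include = response, missing = "no") |>
  add_difference() |>       # default: two-sample test of proportions
  add_stat_label()

# Adjusted difference: covariates stay in the data via select(),
# stay out of the table via include, and enter the model via adj.vars
gts_adjusted <- trial |>
  select(trt, response, age, grade) |>
  tbl_summary(by = trt, include = response, missing = "no") |>
  add_difference(test = response ~ "emmeans",
                 adj.vars = c(age, grade)) |>
  add_stat_label()
```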
When you go look at the emmeans entry on the tests page, it tells you you're doing estimated marginal means, and an adjusted mean difference is what you're going to get out as a result. In this case it says: I'm going to fit a linear regression model, pipe that into emmeans() with the specs set to the by variable, then do pairwise contrasts, then a summary, and get a confidence interval out of that. That's the code being used in the background. And it says when your variable is binary, which is our case, it's actually going to use glm() with family binomial, so it's going to be a logistic regression model. Now, I said you could also handle correlated data here, and that continues to be true with emmeans. If you specify a grouping variable via the group argument, that's the way to say you have correlated data: within patient, if you have multiple rows per patient, within hospital, whatever you've got. And it can then build a random-intercept model, or a random-intercept logistic regression model, for you. So, any questions about that one? Because that is a lot, and it's also using a fairly fancy method; we're doing the delta method in the background. emmeans, I know for me, when I first started using it, I had to look up the code every single time. But now that it's wrapped up in gtsummary and I have my pseudocode here, it's so seamless to produce these adjusted rate differences, and adjusted differences for continuous variables are such a common thing to report in my research, so it's been super fantastic to have around. How do we feel? Are we ready to move on? I was asked if this would be a good time to take a break or if we want to move on; we have a break coming up in just a few more slides, at the end of this section actually. All right, so far we've talked about tbl_summary() and its cousins.
We're gonna talk about something in a different area now: the tbl_regression() object. Of course, summarizing data frames and survey objects and survival data is all super, super useful. But another common, common thing we do in my field of research is use regression to make inference and predictions from regression models. And one way to get the results of those regression models into a beautiful, publishable table is this function. You may all recognize the default output: this is a logistic regression model. In this case, we're predicting tumor response; that's our dependent variable on the left-hand side there. And we have two covariates, age and stage, that we're using to predict response to treatment. We're using our trial data set, and we're saying family = binomial; that's the thing that specifies a logistic regression. And on the right, this is what you get: not that easy to read, definitely not shareable. You can't just hand that to a stakeholder or send it off for publication in your manuscript. And shockingly, there are very few ways, I would say, to get these messy results into a beautiful table, and I think gtsummary does it best. So let me show you the simplest, simplest table. I saved this model object as m1, for model one, by the way. If you just pass it to tbl_regression(m1), you get the table on the right. And that table has a lot of fantastic things in it. Number one, it's showing you all your p-values for your covariates; I think that's great. But what I think is so, so nice is that it's able to identify the reference level of your categorical variable. You can see that in the messy summary() output we just have stage T2, stage T3, stage T4, but there's an implicit stage T1, which is our reference group. tbl_regression() identifies that, adds a row for it, and puts an em dash there to say: hey, this is our reference group.
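The model and the simplest table just described look like this:

```r
library(gtsummary)

# Logistic regression predicting tumor response from age and stage
m1 <- glm(response ~ age + stage, data = trial, family = binomial)

summary(m1)  # the hard-to-share default console output

# The publishable version: reference rows and p-values included
tbl_m1 <- tbl_regression(m1)
```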
And that is super, super useful for getting you up and out the door with a publishable table. There's also something going on here about model types. You can see that the estimate, the beta coefficient, rather than being shown as just "Beta", is labeled log(OR), with a footnote telling us that OR stands for odds ratio. So tbl_regression() can identify these common regression model types and replace the default headers with model-appropriate headers. I love that it says log(OR), but I personally wouldn't present the log odds ratio, so let's show how to modify that a bit. Just like tbl_summary(), tbl_regression() also has a suite of add functions. And as I said previously, all of the modify functions we reviewed earlier work with tbl_regression() and any other function we're going to look at today. So here we are again, using tbl_regression() on m1, the regression object, and now we're specifying the argument exponentiate = TRUE. Take a look on the right: rather than log(OR), since we're exponentiating now, we have OR, which is how we all present these models anyway. So that's a good thing; we can just set exponentiate = TRUE. And if you're a tidymodels or broom user, you'll know these argument names; they match up, which makes it really fantastic. So let's move on to the next slide: add_global_p(). In the last version of our table, take a look, we had p-values for stage T2, T3, and T4, and each is testing for a difference in log odds versus the reference: T1 versus T2, T1 versus T3, T1 versus T4. It's often the case that you really just want to know, is stage related to my outcome, rather than, is there a difference between T1 and T2? So you want to add these global p-values, and these are largely calculated with an anova-style test in the background.
If you have a more complex model, it will be calculated in a more complex way, but it's still legitimate, it all works, and what's happening in the background is very well messaged to you. So add_global_p() is going to replace those individual p-values at T2, T3, T4 with a single one up here on the variable label row, showing that the p-value for stage altogether is 0.6. And then lastly, we're going to add a glance table. If you are, again, a broom user, you'll be familiar with the function glance(): you give it a model object and it returns a one-row tibble with a bunch of statistics that apply to the model as a whole. That will include the number of observations, the log-likelihood, sometimes the deviance, the Akaike information criterion, the Bayesian information criterion, a whole slew of statistics depending on the model type you pass to it. So you can use add_glance_table() to include, by default, every single statistic from broom's glance(), but here we're just going to show a couple of them. It's really a fantastic way to add some additional model-level information to the table. A huge number of models are supported: essentially anything that has a broom tidier will likely just work right out of the box, which is pretty fantastic. It's going to work with survey regression models, time-to-event regression models, GAMs, random-effects and mixed-effects models, all sorts of things. What else? There is some pretty interesting stuff here if you're using random effects; I just want to make a quick note. I use random effects pretty frequently, but I don't report the random effects themselves in the modeling and reporting that I need to do. So by default, gtsummary uses a different tidier that essentially does the same tidying you would see from broom::tidy(), except that it removes the random effects from the resulting table.
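Putting the last few slides together in one chain; the `include` names for add_glance_table() follow broom's glance() column names, and add_global_p() relies on an additional package (car) being installed:

```r
library(gtsummary)

m1 <- glm(response ~ age + stage, data = trial, family = binomial)

tbl_m1 <- tbl_regression(m1, exponentiate = TRUE) |>  # ORs instead of log(OR)
  add_global_p() |>                                   # one p-value per variable
  add_glance_table(include = c(nobs, logLik, AIC))    # model-level statistics
```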
So if you need to report those, you just specify the tidy_fun argument. Rather than letting gtsummary choose which tidier it's going to use, which is the one that I wrote because it's the way I like it, you go in there and you say tidy_fun = broom.mixed::tidy, and it will go back to the standard tidier and report the fixed effects and the random effects in your summary table. All right. Oh, we have one exercise and then the break after, I believe. Okay. Before we start the exercise, are there any questions we want to chat about? Let's see here. Someone's asking about exponentiate: yes, it exponentiates the beta coefficient. Madeline is asking about the supported-models list; she sees brms, but not rms. I'd have to double-check; I don't recall if rms models are supported. If it's not on the list and it doesn't have a tidier, then you might have to do a little bit of work to make it supported. If you use a lot of rms, then just file an issue and we can add support. Okay, Hannah has a question: if I use the add_significance_stars() function on a tbl_regression object, the p-values aren't displayed on the side, and add_p() gives me an error. Do I have to use add_stat() and pull them into the table myself? So Hannah, add_p() is something you would typically use with tbl_summary or tbl_cross, and not with tbl_regression. If you prepare a quick little example, we can look at it during the break, perhaps. Jordan is asking about interaction terms. Interaction terms are actually handled, I would say, gorgeously. It was actually kind of tricky to make them look so nice, and a huge amount of this regression work was done by Joseph Larmarange; he is a fantastically talented programmer. The interaction terms, I would say, have no issues, and they look gorgeous. If you want, we can do one during the exercise and I can show you what the results look like. All right, let's get started on exercise four.
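A hedged sketch of the tidy_fun override for a mixed model. The lme4 model and the use of `grade` as a grouping variable are purely illustrative (and require the broom.mixed package):

```r
library(gtsummary)
library(lme4)

# Mixed-effects logistic regression on the built-in `trial` data;
# treating `grade` as a grouping factor is illustrative only
mod <- glmer(response ~ age + (1 | grade), data = trial, family = binomial)

# Default: the random-effects rows are dropped from the table
tbl_regression(mod, exponentiate = TRUE)

# Override the tidier to keep the random-effects rows
tbl_regression(mod, exponentiate = TRUE, tidy_fun = broom.mixed::tidy)
```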
So we're gonna build a logistic regression model with death as the outcome, and we're going to include smoking status and all the other variables as covariates in that model. Then we're going to summarize that model with tbl_regression. And then I want you to show me what modifications you've made to the table to make it cute for your field. Let's get started. So Jeremy is asking whether there's a way to add a footnote saying that T1 is the reference, because some people may interpret the dash as missing data. So yeah, you can definitely add a footnote: in any gtsummary table, you can add a footnote to any cell or any header, that's no problem. You can also change those em dashes to say "reference", or "Ref." is something I've done in the past for journals as well. I think, for example, JAMA wants to see "Ref." for the reference group rather than an em dash. There's another option as well: a function for tbl_regression objects called add_estimate_to_reference_rows(). If you have not exponentiated, it will add a zero instead of an em dash; or if you have exponentiated, it will show an odds ratio of one, for example, to indicate that that is the reference group. So I see a lot of questions popping up in the chat; before I do my walkthrough, I'll try to address those. All right, that's our time. Does anyone need more time, or are we feeling good, or should we do a walkthrough? I'm hearing a resounding walkthrough. Okay, let's continue. All right, so we're gonna build a logistic regression model; let's just start with that bit. Death is our outcome. And I saw someone in the chat note that using death ~ . is a wonderful way to include all remaining variables in the data set as covariates. Because we are working with a very small data set here, we can do that. We need to add family = binomial. Oh, and I forgot to pass a data set; that's also needed.
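The exercise's data set isn't shown in this transcript, so the sketch below substitutes gtsummary's `trial` data, with `response` standing in for the death outcome; the `~ .` shorthand is the bit being demonstrated.

```r
library(gtsummary)

# `~ .` uses every remaining column of the data as a covariate
dat <- dplyr::select(trial, response, trt, age, grade)
mod <- glm(response ~ ., data = dat, family = binomial)

tbl_regression(mod, exponentiate = TRUE)
```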
Looks like we are getting that summary that we're all used to. So let's put this into tbl_regression and see what we're working with. All right, so we have a big, long table. Here, we definitely need to do exponentiate = TRUE, right? All right, what other modifications would you like to see on this table? Someone's asking about add_global_p(). So let's do an add_global_p(), which we've seen before, but they were asking: can I have a global p-value and keep the individual p-values as well? There is an argument, keep: if we say keep = TRUE, it's going to keep all of the original p-values as well as add the new global p-value. So there we are. And then replacing the dashes in the reference group with "Ref.". All right, you are actually testing my memory here; it's been a long time since I've done this. So let me show you one easy thing you can do first before making it say "Ref.", and let's open up the help file for tbl_regression so I can remember what the name is: add_estimate_to_reference_rows(). So in this case, oops, I've done something wrong. I don't want to debug with you all watching, so I'm gonna skip it. Oh, I've done something else wrong. Oh, I haven't. Okay, debug complete. So let's take a look at this. I know I have the compact theme on, so everything's pretty small; let me see if I can reset that with reset_gtsummary_theme(). Once again, it'll be a little bit larger for you to see. All right, so I added add_estimate_to_reference_rows(). So previously, where you had an OR with a dash, it's now showing a one to indicate that it's the reference row. So that's one way you can do it. So I'm going to take that back out, returning to the default, and I am going to try to remember some complicated code in front of all of you. Let's see how it goes. What's happening in the background is that we are essentially doing a beautiful print of a data frame.
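The two steps just live-coded, sketched on the built-in `trial` data with assumed covariates:

```r
library(gtsummary)

mod <- glm(response ~ age + stage, data = trial, family = binomial)

tbl_regression(mod, exponentiate = TRUE) |>
  add_global_p(keep = TRUE) |>       # keep the per-level p-values too
  add_estimate_to_reference_rows()   # show OR = 1.00 on reference rows
```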
And that's exactly what you see here: a nice print of a data frame using the GT package. So those reference rows are actually just missing values, and I have told GT, when these cells are missing, I want you to put an em dash. So what you can do is change it up a little bit and say, instead of putting an em dash, I want you to put something else. So I'm going to do columns = estimate. I know that the column is called estimate because I previously ran show_header_names() and it showed me all of the column names. Maybe I can show you that as well. So here's our table before we started adding a couple of things, but if you pipe that into show_header_names(), this is what you're going to get in the console. At the bottom here you're looking at, on the left, the column names that underlie what you're seeing, and then the headers, that is, the current headers. So you can quickly identify which column name is associated with which column in the table. And then it has a little usage guide for modify_header(), because that's typically what people are doing; it's not exactly what we're doing here. So I know that the OR column is called estimate, and that's the column I want to modify. And then the rows: what you can do here, this is not easy, but there is an example in the table gallery, I'm pretty sure. The data frame that we're going to be printing is called table_body, and you can take a look at it in here. That's all this is: a table full of summaries for this regression model. And you can see there's a column called reference-something; you can't really see what it is, but I think it says reference group, something like that. Let me see; it's actually called reference_row. So when this reference_row is TRUE, this is the change I want to make. So let me go into the help file for modify_table_styling(), and let's see here: missing_symbol. I'm going to do missing_symbol = "Ref.".
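A short sketch of the two inspection steps mentioned above (covariates are my assumption):

```r
library(gtsummary)

tbl <- glm(response ~ age + stage, data = trial, family = binomial) |>
  tbl_regression(exponentiate = TRUE)

# Map the underlying column names (estimate, conf.low, ...) to headers
show_header_names(tbl)

# Peek at the data frame gtsummary will print, including the
# logical `reference_row` column that flags reference levels
head(tbl$table_body)
```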
So instead of putting an em dash for this missingness, I am going to show the text "Ref.". So let's see what that looks like. And now it says "Ref." instead of a dash, or a one as we saw before. You can include the confidence interval here too, by selecting more than one column at a time. Dan, you could also do this using themes, right? Is that correct? Oh, yes, I believe so. We'll get to themes in a moment, and I believe that one of the journal themes, the JAMA theme I believe, will do this for you automatically, so you don't even have to remember how to do this. Can you please drop that chunk into the chat? Yeah. Oh, right. So there we go, we have this gorgeous table. Did I say anything else to do for this? Audrey has a comment. Anyone else have any ideas on what to do to make this table cute? Bold the p-values if they're small. I like to bold the variable labels too; I think it makes it a little bit easier to read. We have a request for the likelihood ratio chi-square test. Okay, let's do it. So I usually like to add all my statistics before I start modifying the aesthetics. So let's go up to here: add_glance_source_note(). We already saw an example with add_glance_table() in the slides, so I'm gonna do add_glance_source_note() this time. Rather than adding a new row for each one of these statistics, it's going to list them all in one single source note. Let's take a look at what that looks like; it's going to be a lot of statistics, I believe. You can see at the bottom that the null deviance is 1476, the null degrees of freedom is 1547, the log-likelihood is negative 553, the Akaike information criterion is 1121, the Bayesian version is 1164, and the deviance, residual degrees of freedom, and number of observations are all there. I often prefer to see them as a source note like this rather than as six more rows on my table, but it depends on what you need. If you just need a single statistic, then maybe one additional row is not that big of a deal.
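The "Ref." replacement closely follows an example in the gtsummary table gallery; here it is combined with the source-note step (model covariates assumed):

```r
library(gtsummary)

mod <- glm(response ~ age + stage, data = trial, family = binomial)

tbl_regression(mod, exponentiate = TRUE) |>
  # print "Ref." instead of an em dash on reference rows
  modify_table_styling(
    columns = estimate,
    rows = reference_row %in% TRUE,
    missing_symbol = "Ref."
  ) |>
  # model-level statistics collected into a single source note
  add_glance_source_note(include = c(nobs, logLik, AIC, BIC))
```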
But there are lots of fields that expect one row for AIC, one row for BIC, one row for log-likelihood, that kind of thing. And if that's the kind of place you publish, then you have add_glance_table() as well. add_glance_table() and add_glance_source_note() are like sisters; they're documented together because they're pretty much identical. All right, did we cover all of the questions about this regression table here? Oh, Jordan is asking: are the p-values Wald by default? That is a fantastic question. Essentially, the default is going to be what you get from the summary() function for your type of regression model, and for a GLM, I would say I'm 97% sure it is a Wald p-value. And I think it's funny with the logistic regression model: I think they report p-values from the likelihood ratio statistic while the confidence intervals are Wald, or it's vice versa, something like that. But you'd have to look up the details for your specific model and what the summary function returns, because that is what determines what gets printed here. Now, if you need to change that default, there are a couple of options. You can look into the tidier that you're using, and if that tidier has an option to change that statistic from LRT to Wald or score, what have you, then you just pass that argument through tidy_fun in tbl_regression, which specifies the tidy function that's going to be used to create the table. And if you don't have that available to you, what I've had to do in the past is construct a bit of it myself: you build a little tidier that changes the default confidence interval type from whatever it was to Wald, for example. And there is an example of that in the table gallery; let me show you the table gallery real quick. So this is the gtsummary website.
You go to Articles and then the Gallery. Pretty sure we have a Wald example: how to report a Wald confidence interval. And here I had to write my own little tidier function that looks exactly like a typical tidier. It's primarily using the default tidier for a logistic regression model, or a linear regression model here, but then I'm replacing the confidence intervals that are there by default with the output from confint.default() to get my Wald confidence intervals. And then in tbl_regression I specify the argument tidy_fun = my_tidy, and my_tidy is just my version of the tidier that uses Wald confidence intervals instead of the default. This is for a linear regression model here. So you can often still utilize the underlying broom tidiers and just change the bit that calculates the confidence interval. Okay, so I think we have a break coming up. Feels like a lot of talking on my side. Oh, almost there, almost there. Okay, so in addition to summarizing a single regression model, we have a function called tbl_uvregression. It's not entirely uncommon that you need a table of a sequence of univariable regression models. So rather than passing a model object you've already created, like tbl_regression, tbl_uvregression accepts a data frame as its primary argument. Then it has an argument, method, saying which regression function we're going to be using; here it's glm. It has a y argument saying what the outcome of the model is. And then method.args is a named list of arguments that are going to be passed to the method. Here we're gonna stick with the logistic regression example, and you can see that this named list, family = binomial, is an argument that's going to get passed to glm().
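Adapted from the gtsummary table gallery: a custom tidier that keeps the default broom output but swaps in Wald confidence intervals from confint.default().

```r
library(gtsummary)

# Wrap broom::tidy() but replace its CIs with Wald intervals
my_tidy <- function(x, exponentiate = FALSE, conf.level = 0.95, ...) {
  dplyr::bind_cols(
    broom::tidy(x, conf.int = FALSE),
    stats::confint.default(x, level = conf.level) |>
      tibble::as_tibble() |>
      rlang::set_names(c("conf.low", "conf.high"))
  )
}

# Linear regression summarized with Wald CIs
lm(age ~ grade + marker, data = trial) |>
  tbl_regression(tidy_fun = my_tidy)
```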
So what's going to happen here is, internally, it's going to find the response variable, then sequentially go through every other column in the data frame and build a regression model for each one of them. In this case, a logistic regression model with the response as the outcome and every other variable, sequentially, as the single covariate in a series of univariable regression models. You can see that the default looks a little bit different from tbl_regression, because there's an additional N column: because of different patterns of missing data, you can have varying Ns, which can be fine in a lot of cases, but it can be alarming if you're trying to compare results across models that include different sets of patients or observations. So it's a good indicator that you're working in a univariable setting, and it communicates to your readers whether or not these models were all built on the same patients or observations. And just like tbl_regression, you can use add_global_p(). It has essentially the same arguments as tbl_regression for modifying the output: you can change the em dashes to "Ref.", you can change them to a one for the odds ratio of the reference group, for example. Same arguments as before. All right, now we're at the break. So get up, stretch your legs, and we'll be back in 10 minutes. Or I guess, is there anything we wanna talk about before we go? Let's do questions when we return; 10 minutes, see you there. All right, well, that is our 10 minutes. That was a very fast 10 minutes for me, I don't know about you. But before we move on, let's just chat a little bit. How are things going? What questions do you have? Feel free to drop them in the chat or unmute yourself and ask. There's a question here in the chat: does this work with Quarto? Yes, it works with Quarto.
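The tbl_uvregression() setup described a moment ago can be sketched like this on the built-in `trial` data (the selected covariates are my assumption):

```r
library(gtsummary)

# One univariable logistic regression per covariate; `response` is the
# outcome and every other selected column is modeled in turn
trial |>
  dplyr::select(response, age, grade, stage) |>
  tbl_uvregression(
    method = glm,
    y = response,
    method.args = list(family = binomial),
    exponentiate = TRUE
  ) |>
  add_global_p()
```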
By the way, all of these slides are built with Quarto, and the course website as well. Someone was asking about these timers: Garrick Aden-Buie wrote a package called countdown, and it is so simple to add these cue timers. It works in xaringan and in Quarto presentations; I think any HTML slide presentation would work. So Jordan's asking a question: is there an option to compare multiple regression models within a single table, for example, when doing a sensitivity analysis? Yeah, we're actually going to show some examples of how to combine results from multiple separate regression models into a single table so you can easily compare them. Jordan says, I'm thinking of a situation where they're using different adjustment sets. So just like tbl_summary and add_glance_source_note() had an include argument to say which statistics or variables you wanted to include, the regression functions, tbl_regression included, also have an include argument. Let's just say I built a regression model with the primary covariate I'm interested in, for example smoking, and I wanted to test three different models: one unadjusted, one where I just adjust for age and blood pressure, and one where I adjust for age, blood pressure, exercise, and sex; that would be the full model. If you say include = the smoking variable only, the model in the background is still the fully adjusted analysis, but the table will only print the one odds ratio for smoking status. That makes it very easy to see how the odds ratio changes from unadjusted, to adjusted for a smaller set of covariates, to the full set of covariates. Does gtsummary play nicely with tidymodels workflows? You can make it work nicely with tidymodels; actually, for the next release, we're working on making it a better experience.
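A hedged sketch of the include-argument idea, using `trial` with `grade` standing in for the primary covariate; the merge at the end previews functionality covered later in the workshop.

```r
library(gtsummary)

# Unadjusted and adjusted models for one primary covariate
m_unadj <- glm(response ~ grade, data = trial, family = binomial)
m_adj   <- glm(response ~ grade + age + stage, data = trial, family = binomial)

t1 <- tbl_regression(m_unadj, exponentiate = TRUE)
# fully adjusted model, but only the grade OR is printed
t2 <- tbl_regression(m_adj, exponentiate = TRUE, include = grade)

# side-by-side comparison of the OR across adjustment sets
tbl_merge(list(t1, t2), tab_spanner = c("Unadjusted", "Adjusted"))
```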
One issue you might encounter while using tidymodels to prepare your model is that, during pre-processing, it might take a categorical variable and create three, four, five, however many levels you have, dummy variables to represent that single categorical variable. If you are using a method that does that, and you then pass those dummy variables to the modeling function, instead of passing a factor or character that would implicitly be converted to dummy variables, there's not a good way for us to reconstruct those dummy variables back into a single variable. So you might not see something like stage or grade with indented levels one, two, and three plus the reference group, because at the moment we can't recognize that the dummy variables passed to the modeling function all belong to one another. But I think it's something we can overcome, and hopefully in the next release it will be better. There is an option within tidymodels to not create those dummy variables by default, though I do not recall it offhand. Dan, there's a question from earlier about including subscripts or superscripts in your variable labels. Do you have any experience with that and how it works? Yeah, so there are two things we can consider here. One is just regular footnotes, and that would be entirely possible with modify_footnote(), or modify_table_styling() if you need to place the footnote in the body of the table. If we're talking about a variable label with a superscript or a subscript, for example, that will also work. You can use UTF-8 characters, no problem, and you can use markdown syntax as well. I will say, I believe that GT at the moment didn't quite recognize the tilde syntax for subscripts.
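A small sketch of the footnote option mentioned above; the footnote text and selected variables are my own, and the formula interface shown follows the modify_footnote() documentation.

```r
library(gtsummary)

trial |>
  dplyr::select(age, grade, trt) |>
  tbl_summary(by = trt) |>
  # attach a footnote to all the statistic columns at once
  modify_footnote(all_stat_cols() ~ "Median (IQR) or Frequency (%)")
```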
And I don't think it recognized it because of some difference between CommonMark and another flavor of markdown; there was some kind of inconsistency there. So not everything works, and it's largely going to depend on the output type that you choose. We've been showing all examples using GT so far, but we will show examples with other output types. So when you're doing those kinds of specific things, you do need to be cognizant of the output type and what it allows you to do. Okay, so generally supported, but we'll talk about print engines in a little bit, and just keep an eye on what format you're trying to print to and what package you're using to print, is that correct? Exactly. All right, are we ready to get started? Any more questions? So far we have made a bunch of tables, and I think that's really fantastic. I love tables, but I feel like there's always this separation between creating my gorgeous data summaries, model summaries, what have you, and then, when I'm writing my reports, needing to reference those numbers; I need to pull numbers from tables and put them in the text of my reports. Now, of course, one easy way to do that is to just take a look at your table, copy that odds ratio, and put it in the text of your report. That is one way, but what happens often to all of us, I'm sure, is that your data updates, or something happens, and you have to rerun your report, and you need to be really cognizant and remember to change all of those hard-coded odds ratios in your report. But I don't like to do that. So we have this inline_text() function, and what inline_text() allows us to do is report any number from any gtsummary table in the body of a markdown document. So let's go through some examples. I guess this slide essentially says what I just said, but let's talk about regression models first.
So the default pattern for reporting regression models with inline_text() is going to show the estimate, the confidence interval, and then the p-value. And again, you can use that show_header_names() function to see that the column names are actually estimate, conf.low, conf.high, and p.value in the background, and that's how you can work out what you can change the pattern to. So let's just see an example to make it clearer. This is the univariable regression table that we saw in the last section, and it was saved as an object called tbl_uvreg, right? So I am in my markdown document, not in the code chunks but in the body of the markdown text where I'm writing my report, and I'm going to type "the odds ratio for age is", and then you use these special backticks. If you haven't seen these before, a backtick followed by an r tells the R Markdown or Quarto document: please evaluate the following command in R, and whatever the result is, put it here, inline, right here in the document. So what this is doing is saying: using inline_text() on the tbl_uvreg object we created previously, I want to report the results for age. And then when you render your report, you're going to see something that says the odds ratio for age is 1.02, and then in parentheses, 95% confidence interval 1.0 to 1.04, and the p-value is 0.10. So easy, so nice. I can't tell you how useful these things are. This inline reporting is just incredibly useful; it's going to make your reports super, super, super reproducible, and I love reproducibility; we actually require it on my team. So that's how you would use it with a regression model. And I unfortunately tried to pack a lot in here, so let's just take this one bit at a time. So here we are building a tbl_summary object and we're using add_difference(), right?
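The inline syntax described above might look like this inside the body of an R Markdown or Quarto document (the object name `tbl_uvreg` follows the slides):

```markdown
The odds ratio for age is
`r inline_text(tbl_uvreg, variable = age)`.
```

When rendered, the backtick-r expression is replaced with the formatted estimate, confidence interval, and p-value pulled straight from the table object.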
So we've seen all this code before: we are including marker, we're summarizing marker levels by treatment, and we're looking at a difference as well. What that table looks like in the end is the marker level with a difference of 0.20 between the two groups, plus the confidence interval and the p-value. So how are we going to report on this? In the code, you might see something like this, and I apologize for that last line. You might see something like: the median (interquartile range) marker level among participants randomized to Drug A was, and then you do the special backtick-r inline_text() on the tbl_summary object we just created above. You want to grab the estimate for the variable marker from the column Drug A, and what it's going to do is go up to that table and say, okay, I want to grab things from this marker row, and I want the Drug A column. And so what you're going to see over here in the report column is that the median was 0.84 and the IQR was 0.24 to 1.57. So that's a very generic way of presenting the entire cell for Drug A in the report. Another thing you can do is use the pattern argument. If you additionally specify the pattern argument, you can extract any single statistic from the cell that you previously reported. So we reported the median and the IQR, but what if I just want to report the median? It looks like I forgot to update my text here, because it says median (IQR), but let's pretend it doesn't say that. It says the median age among participants randomized to Drug A was; looks like I forgot to update a couple of things, because that "age" is meant to be "marker". So the median marker level in Drug A was, blank, and you can see here on the right side that it resolves to 0.84. So you can report any single statistic from these tbl_summary cells. But how would one go about reporting the difference, or the p-value?
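A sketch of the pattern argument in inline text; the object name `tbl` here is a hypothetical stand-in for the tbl_summary object built above.

```markdown
The median marker level among participants randomized to Drug A was
`r inline_text(tbl, variable = marker, column = "Drug A", pattern = "{median}")`.
```

Anything inside the curly brackets names a statistic to insert, so `"{median}"` pulls just the median out of the median (IQR) cell.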
And that's the one that's a little bit more typical to see, unfortunately. So the difference is in a column called estimate, and the confidence interval is in a column called ci. That's what you can see there, intermixed with the other text at the bottom. And so you can see here: the difference in marker level was 0.20, 95% confidence interval negative 0.05 to 0.44. So that is a very, very, very brief review of inline_text() reporting. Like I said, this is super, super useful for making reproducible reports, and I recommend you use it. And that brings us to exercise five. Before we get started, any questions? All right. So as a part of our exercises here, we have created a labeled data set, we have created a summary of the baseline characteristics, and we have produced unadjusted differences in death rates between those who quit smoking and those who did not. We've also built a logistic regression model to report those adjusted differences, though we were reporting odds ratios there, not rate differences from transformed estimated marginal means. So what we're gonna do here, lastly, is write two to six sentences, whatever you'd like, summarizing your results from our previous exercises, and you're gonna use inline_text() to help you report those results. Again, you're definitely gonna need the gtsummary show_header_names() function so you know what to report. Let's get started. Noting very quickly: if you have visited the solutions online to get a peek at what the results should look like, typically the rendered document will show the code and the result, but for these inline_text() calls it does not, because it's rendering them for the report. So I dropped a link to the raw file that will show you the code if you'd like to take a look. So there was a question: for sex, how do I show the male level instead of the female reference group? There's an argument in inline_text() for tbl_regression objects, and that's called level.
So you wanna put level = "Male", in quotes, and it will report those results for you. And Raymond is requesting that the final code be pushed to the GitHub repo. It is already there, in that link I just dropped; there's a document called solutions.qmd. That's just like an Rmd document, but it's a Quarto document, and it pretty much looks exactly the same. So all the solutions are there. Oh, Raymond's clarifying what I live coded. Confession: I pretty much live coded exactly the solutions I had written previously, and in preparation for this last bit of live coding for inline_text(), I deleted what I had live coded so that everything would be named the same as the solutions, because this inline_text() code can be a little bit longer and more cumbersome to write. All right, so what I decided to report in my mini summary of these results was: the median age for participants who quit smoking and for those who did not, the p-value for the difference in age, and the unadjusted difference in death rates from the tbl_summary object. And then I also wanted to report one adjusted estimate, but I chose to report it from the logistic regression, to show that you can report from various objects. So like I said, this code can be a little bit long, so what I prefer to do is have a chunk in my reports where I set echo = FALSE. It's not going to echo the code into the rendered document, but the results will be there. So I like to program these all up, and I like to give them long, descriptive names. So here, when I wanted to report the median age for quitters, it's called median_age_quit; it's pretty descriptive, so when I'm reading my text below, after I've inserted it, it's pretty clear what it actually is. So I have median_age_quit here, and I use inline_text() and pass my gtsummary patient characteristics object that we created in one of the first exercises.
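The level argument mentioned above might be used like this inline (the object name `tbl_reg` is a hypothetical stand-in for the regression table):

```markdown
The odds ratio for males, relative to the female reference group, was
`r inline_text(tbl_reg, variable = sex, level = "Male")`.
```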
I want to report the variable age, the column quit, and the median only. I'm doing it similarly for the patients who did not quit, specifying the did-not-quit column. I'm grabbing the p-value for that difference in age. And this unadjusted difference I'm grabbing from the unadjusted death table; I believe we created that in exercise three. I want to report on the death variable, and I want the pattern to be "difference", then insert the estimate using the curly brackets. Oh, I'm so sorry, I didn't actually describe the syntax here, but the syntax is just as it is all throughout gtsummary: anything in these curly brackets is what's going to get inserted. So apologies for not describing that in more detail before we got started, but I hope it was clear from our prior context. So here, where I have estimate in curly brackets, it's going to insert the difference estimate, and I'm going to put the confidence interval here and the p-value here. And similarly for the inline_text() reporting of the logistic regression model: I'm reporting the odds ratio, and the pattern I want is the odds ratio estimate, a semicolon, and then the 95% confidence interval. So when I write up my report, I said: the analysis assessing the relationship between quitting smoking and subsequent death within the next 20 years included, then nrow(df), where df is our data set. What is this going to return? The number of observations. So I'm saying here that it included 1,548 participants. The median age among those who quit was higher compared to those who did not, and then I report, from this chunk here, median_age_quit, median_age_did_not_quit, and the p-value comparing them. On univariate analysis, participants who did not quit smoking had higher rates of death, and I'm going to report that unadjusted difference that we calculated above.
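The echo = FALSE approach just described might look like the sketch below; the object and variable names (`tbl_chars`, `age`, the "Quit" column) are hypothetical stand-ins for the exercise's actual names.

````markdown
```{r, echo=FALSE}
# compute the inline values up front, with long descriptive names
median_age_quit <-
  inline_text(tbl_chars, variable = age, column = "Quit",
              pattern = "{median}")
```

The median age among those who quit was `r median_age_quit`.
````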
However, on multivariable analysis, the relationship was no longer significant. I know this is a very not-great summary if I were doing real science here, but we're not; we are reporting, and that's all. So let me show you what the results look like; this is from the website. So here at the end we say: the median age among those who quit was higher compared to those who did not quit, 46 versus 42, p-value less than 0.001. On univariate analysis, participants who did not quit smoking had higher rates of death: difference negative 4.9%, 95% CI negative 9.7 to negative 0.07, p-value 0.037. However, on multivariable analysis the relationship was no longer significant: odds ratio 0.90, 95% CI 0.64 to 1.26. So there you have it. What questions do we have? "If I change the CI to 90%, do I need to update the pattern to say 90%?" That's a question from Jeremy. If you're using the default pattern, you actually don't need to change anything, because it will find the proper confidence level that you've already specified and report it in the string that is returned. However, in my example I hard-coded "95%", so there you would need to update it to match what you have written. I could have changed that "95%" to conf.level in curly brackets, but I think that just makes it a little messier for instructional purposes, and also I exclusively report 95% confidence intervals, not 90% or anything else, so for me it's easy to hard-code. Do we have any other questions before we move on? All right, there were some questions a little bit ago about ridge regression, lasso, and elastic nets. They're not supported directly out of the box, because we're using tidiers to prepare the results before we take over and dress them up even further. If you look at a broom::tidy() summary of an elastic net model, you'll see that it gives you many, many versions of the covariates.
And that is across a range of penalization factors as well. So yes, we can handle them, but you have to do a little bit of work before you pass them to us: you essentially need to look at that tidy data frame and tell me which of the penalization factors is the one you want to report on, because the default includes all of them. Another question: can I add LaTeX symbols in the inline text, like chi-square results? I think so — I avoid LaTeX when possible. All this does is return a string, and so if markdown recognizes those LaTeX symbols when you just write them, then it should work out of the box here as well. Okay, so we've talked about the majority of the tables you can construct with gtsummary, and there are others as well, but one of the most amazing things about this package is your ability to cobble together fairly complex tables from the pretty simple tables we've been constructing. I mean, they're simple in one way but quite thorough in another. We're going to show you how you can customize them even further using tbl_merge and tbl_stack. So let's just start with a univariable table of regression results and a multivariable table of regression results. Both of these are Cox proportional hazards regressions. On the left, we are making a series of univariable regressions, with the outcome being time to death, and we're adding a global p-value. We're doing the same thing on the right side, except rather than tbl_uvregression we are building a single Cox model that contains the same covariates as the univariable regressions on the left side. So we have two tables looking very, very similar. Again, the univariable table will have that column of N to indicate that a slightly different set of patients may underlie each model — but there we go.
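The univariable side of this setup can be sketched like so, again using gtsummary's built-in `trial` data as a stand-in for the workshop's data set; the covariate choices are illustrative.

```r
library(gtsummary)
library(survival)

# A series of univariable Cox models — one per covariate — with a global p-value.
# Outcome: time to death.
tbl_uv <- trial |>
  tbl_uvregression(
    method = coxph,
    y = Surv(ttdeath, death),
    include = c(trt, grade),
    exponentiate = TRUE       # report hazard ratios, not log-HRs
  ) |>
  add_global_p()
```

The multivariable counterpart is a single `coxph()` fit passed to `tbl_regression()` with the same covariates.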
So it's not uncommon that you would want to take something like this and put them side by side, and to speak to Jordan's point earlier, you can also use this functionality for looking at how a sensitivity analysis may modify your results. Say you adjusted your analysis for ten covariates instead of two — two that are easily recorded for every single patient versus ten that are maybe only recorded for 50% of the patients, or only recorded at my institution, where no other institution would have them. That's another place you would use tbl_merge. But let's just take a look at the results real quick. The first argument is a list of all of the gtsummary tables that you want to merge, and then here we're using the tab_spanner argument to say that the first table we're passing is the univariable results and the second is the multivariable. And in the end, you get these nice side-by-side results all in one table. It's pretty easy peasy, if you ask me — constructing these types of tables outside of gtsummary is a huge headache, I think. Any questions? That's essentially all we're going to do for tbl_merge, so if you have any merging questions, now is the time. All right, in addition to merging, we stack things as well. A merge is a horizontal combination, and a stack is a vertical combination. So let's do a similar thing where we have a univariable table on the left and a multivariable table on the right, but you can see here we're using, on the right, the include argument to only show the results for treatment. Even though you can see above in our Cox regression definition that the multivariable model includes treatment, grade, stage, and marker as covariates, because we're using the include argument in tbl_regression, only the treatment variable is shown in the end.
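The merge itself can be sketched in a few lines, self-contained, with the built-in `trial` data standing in for the workshop data:

```r
library(gtsummary)
library(survival)

# Univariable and multivariable Cox models over the same covariates
tbl_uv <- trial |>
  tbl_uvregression(method = coxph, y = Surv(ttdeath, death),
                   include = c(trt, grade), exponentiate = TRUE)
tbl_mv <- coxph(Surv(ttdeath, death) ~ trt + grade, data = trial) |>
  tbl_regression(exponentiate = TRUE)

# First argument: a list of gtsummary tables; tab_spanner labels each one
tbl_merge(
  tbls = list(tbl_uv, tbl_mv),
  tab_spanner = c("**Univariable**", "**Multivariable**")
)
```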
So on the left, we have a univariable hazard ratio of 1.25 for treatment predicting time to death, and on the right side an adjusted hazard ratio of 1.3. And you can also see that tbl_regression recognized that this was a Cox model, so it's showing us hazard ratios in the header rather than just a generic beta. Similarly to tbl_merge, you pass tbl_stack a list of gtsummary tables, and you can optionally pass a group_header argument, which will put a bit of a header between the tables. So here on the right, the top table is unadjusted — it's a little small — and the bottom table is adjusted. So there we go. These tools let you construct much more complicated tables. There's one more that I find fantastic: tbl_strata. Essentially, you tell tbl_strata, hey, I want you to do some gtsummary calculations for me, stratified by grade in this case, for example. And you get to use that cute tidyverse notation, with the tilde indicating an anonymous function, where .x represents the subset — the stratified data frame. So what is happening here is that I'm telling tbl_strata: I want all of these things stratified by grade, and for every grade, build a tbl_summary table that is split by treatment and doesn't contain missing values; and I update my headers to show just the level instead of the level and the N, because in this case that was a lot to report. What it's going to do at the end is build each one of these tables separately for every grade and then merge them. Optionally, you can also stack them, but the default is to merge, and the column headers default to the stratifying variable's levels. So this is a very easy way to get very complex stratified cohort descriptions, for example. And this will work with any gtsummary table.
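Both patterns just described can be sketched together, again with the `trial` data standing in and the covariates chosen for illustration:

```r
library(gtsummary)
library(survival)

# Vertical combination: unadjusted vs adjusted treatment effect
tbl_unadj <- coxph(Surv(ttdeath, death) ~ trt, data = trial) |>
  tbl_regression(exponentiate = TRUE)
tbl_adj <- coxph(Surv(ttdeath, death) ~ trt + grade + stage + marker,
                 data = trial) |>
  tbl_regression(exponentiate = TRUE, include = trt)  # show treatment only
tbl_stack(list(tbl_unadj, tbl_adj),
          group_header = c("Unadjusted", "Adjusted"))

# Stratified summaries: one tbl_summary per grade, merged side by side.
# `.x` inside the anonymous function is the per-stratum data frame.
trial |>
  dplyr::select(grade, trt, age, response) |>
  tbl_strata(
    strata = grade,
    .tbl_fun = ~ .x |>
      tbl_summary(by = trt, missing = "no") |>
      modify_header(all_stat_cols() ~ "**{level}**")  # level only, no N
  )
```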
So this is an example of how to use these tools to construct some pretty complex tables. Quite some time ago at this point, some epidemiologists contacted me on GitHub, I believe, and said, hey, we really like this package — can you include a table that would do Cochran–Mantel–Haenszel odds ratios? It's something we report all the time. I took my first semester of epi in grad school, and I think I made this exact table, I'm pretty sure. And I told them, okay, I can see why you need this table, but it's a little specific for this package, so I don't want to put it in the package. But what I can help you do is construct this table — make a new function called tbl_cmh, for Cochran–Mantel–Haenszel — and, using the tools of gtsummary, pretty simply put it together for you. So what did we do? We started with tbl_cross to get a cross-tabulation of exposed versus not exposed by stage. That's easy enough. Then we could use, for example, tbl_merge to merge together this cross-tabulation for the controls and the cases — you could also have done that with a tbl_strata call, in a single call. Then you can stack them, so you have one set of rows for grade and one set of rows for stage, and that gets you almost all the way to the end. And at the very last step, there's that add_stat function — we mentioned it briefly two hours ago, probably — which can add custom statistics to a gtsummary table. So here, we're showing odds ratios overall in the crude group, then stratified by stage, and then stratified by grade below that. And then we're combining them into an adjusted odds ratio using the Cochran–Mantel–Haenszel method. So we wrote this function called tbl_cmh, and all it does is take your data, do a simple tbl_cross, merge it, stack it, add a custom odds ratio statistic, and bam, you're done.
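A rough skeleton of that tbl_cmh() idea might look like the following. The data frame and column names (`df`, `exposure`, `outcome`, `stratum`) are placeholders, and the final custom-statistic step is only gestured at, not implemented.

```r
library(gtsummary)

# One cross-tabulation of exposure by outcome per stratum (placeholder names)
cross_by_stratum <- df |>
  split(df$stratum) |>
  lapply(\(d) tbl_cross(d, row = exposure, col = outcome))

# Stack them: one block of rows per stratum, labeled by stratum
tbl <- tbl_stack(cross_by_stratum, group_header = names(cross_by_stratum))

# A custom statistic — e.g., the CMH odds ratio from stats::mantelhaen.test()
# — could then be attached with gtsummary's add_stat() machinery.
```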
And you don't have to worry about all the special cases of unobserved data and this and that, because those things are handled within gtsummary already. So it was a pretty easy way to get a pretty complex table in the end. Shannon and I have recently written a package called gtreg that does a bunch of tables commonly found in regulatory submissions. Now, Shannon is going to be talking about this in two days at the conference, so this is just the briefest introduction, but it uses gtsummary in the background to construct — for example, the one I'm going to show here is an adverse event reporting table. If you've ever had to report adverse events or complications after treatment, it can be pretty annoying to get everything in the correct format. So this is a wonderful package that makes it very easy to report these things, and you can also see here that we're using the functions we already know, like modify_header and bold_labels, because in the end you are creating a gtsummary table — and if an object meets a few criteria, you can use all those modify functions, all the bold and italicize functions; all the functions that already work with gtsummary tables will work here as well. So in this case, this is the resulting table, and it's gorgeous, and you can see that it only took a few lines of code to create this stratified adverse event report that also includes the system organ class of each of the AEs — that's very powerful. If you haven't had to do something like this, you might say, okay, well, it's a table, how hard could it be? But I guarantee you it's pretty annoying to create these, and this package is wonderful. So stay tuned — I believe it's on Thursday, Shannon, you want to correct me — we're going to hear from her on this package, all the details. We're nearing the end, so we're going to talk about themes. I love themes so much.
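As the briefest sketch of what that looks like — the function and argument names below are from memory of gtreg's API and should be treated as assumptions, not gospel:

```r
library(gtreg)
library(gtsummary)

# df_adverse_events is an example data set I believe ships with gtreg;
# tbl_ae() builds the AE table with one section per system organ class
tbl <- df_adverse_events |>
  tbl_ae(
    id = patient_id,           # subject identifier
    ae = adverse_event,        # AE term
    soc = system_organ_class,  # grouping row headers
    by = grade                 # columns split by AE grade
  ) |>
  # The familiar gtsummary styling functions still apply:
  bold_labels()
```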
Themes are a great way to set a ton of defaults for gtsummary functions. This includes setting default arguments for the functions you're using, and it also lets you change really fine-grained things that aren't even available as arguments or helper functions within gtsummary — and do it in just one simple call. We'll go through a few examples: one simple call, and all of the defaults change, just like that. So here's our default tbl_regression summary of our logistic regression, which you've already seen. I think it's cute; I hope you think it's cute. I like it; I hope you like it. But let's see what happens when we start applying a couple of different themes. Let's make a few notes here: we're looking at an odds ratio; we have a separate column for the 95% confidence interval; the upper and lower bounds are separated by a comma; small p-values are shown to three decimal places and larger p-values to one decimal place. There is a theme called theme_gtsummary_journal, and it covers JAMA, the New England Journal, the Lancet, the Quarterly Journal of Economics — quite a few journals. What you do is, right at the top, you set your theme and say, I want these to look like JAMA tables. It's going to go through and set all the defaults to be just how JAMA would expect you to report your table. So all we're doing here is exponentiating our odds ratios and then putting a title — a caption — on the table saying this is our JAMA theme. And you can see over here — let me do that again — now we have our JAMA theme. You can see that little title up here. We now have a single column for the odds ratio and confidence interval, and rather than a comma separating the upper and lower bounds of the confidence interval, the word "to" is the separator here.
You can also see that small p-values are still rounded to three places, but for large p-values, JAMA prefers two decimal places, so you'll see that here. And that change applies to every single p-value across the entire gtsummary package, which is pretty fantastic — one line of code, and now you're perfectly ready to go to JAMA or the Lancet or the New England Journal. There's also a language theme. If you are presenting your results in a language that is not English, there's a good chance your language is supported. So right up on top, you just set the language theme. This one is going to use traditional Chinese characters, and it's the exact same tbl_regression that we saw previously, but instead of saying "Characteristic," it has it in traditional Chinese characters. The person who helped us with the translations for the traditional characters said they would like "OR" to stay as "OR," because that's how it's almost always presented — and that's what we did, because we trust our collaborators. And these are the — what, 16 — languages that are currently supported. Full transparency: Dutch is currently in a pull request and only on the dev version. There's also this compact theme. You'll notice from the previous theme that the padding of each row is kind of large, so here you can set a compact theme. While a single table, I think, looks really nice without large padding, when I have a report that has multiple summaries in it, I almost exclusively need the compact theme — otherwise my report just gets huge, especially when I'm going to a Word output, for example, where a single short table can seriously take up the entire page. So you definitely want to use the compact theme when you're creating larger reports. So Eva has a question: how can I increase the font size of a gtsummary table in a shared slide? Well, you could actually use a theme, Eva.
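The three themes just described can be sketched together in a few calls. The "zh-tw" language code for traditional Chinese is my assumption; check the function's documentation for the exact codes.

```r
library(gtsummary)

theme_gtsummary_journal(journal = "jama")     # JAMA-style defaults package-wide
theme_gtsummary_language(language = "zh-tw")  # traditional Chinese headers
                                              # (language code is an assumption)
theme_gtsummary_compact()                     # tighter padding for long reports

# Any table built after this point picks up all three themes:
glm(response ~ age + grade, data = trial, family = binomial) |>
  tbl_regression(exponentiate = TRUE)

reset_gtsummary_theme()  # back to the package defaults
```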
You could set a theme to change the default font size of a returned tbl_summary object — or any gtsummary object, actually. So that's one way to do it. You can also set custom themes with set_gtsummary_theme, which accepts a list. There's an entire vignette on it, linked here — there are a lot, a lot of things you can control, so check it out. I have my own personal theme that I use for me and my team, and it works really well; I really enjoy them. Okay, print engines — home stretch here, people. You have seen all of this happening with gt as the output, or print engine. gt makes gorgeous tables; I love them. But up until recently, gt didn't have wonderful support for PDF, RTF, and Word. Like I mentioned earlier, Word support is currently in the dev version of gt only. PDF and RTF I have indicated here as under construction, but I recently had a conversation with the maintainer of gt, and he said, well, they're actually done — what are you waiting for? And I was like, oh, I'm waiting for the indentation to work properly. He was like, no, this is how you do it. So I need to update my code to support indentation for PDF and RTF, and once I do that — which I hope to do in the next release — I can put happy faces for gt for PDF and RTF as well. But at the moment, if you have flextable installed on your machine and you're writing an R Markdown document or a Quarto document that's exporting to Word, it is going to print with flextable by default. flextable is another table creation package with tons and tons of options for customization, and it has different features — gt has some features that flextable does not, and vice versa — so you really have a lot of options here, depending on what your output format is. There's also a function called as_hux_xlsx: via huxtable, you can export gtsummary tables to Excel with that function. And kableExtra is fantastic for PDF reporting — fantastic.
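For Eva's font-size question, one concrete route is to style the table after converting it with a print engine. The gt functions below are real; the specific size is just a sketch for slides.

```r
library(gtsummary)

trial |>
  tbl_summary(include = c(age, grade)) |>
  as_gt() |>                                      # convert to a gt object
  gt::tab_options(table.font.size = gt::px(18))   # bump the font for slides
```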
And that work is largely due to what Shannon did maybe six months ago, or less — she really took it to the next level; it was super, super fantastic. And kable — if you've used kable, the tables are fine. They don't indent, they don't have footnotes; it's just largely not as cute, but it works for all the output formats, which is really wonderful. And then lastly, you can convert any gtsummary table to a tibble. Sometimes you just need your results as a data frame so you can do whatever you need with them, and that's also available to you. All right, let's take a quick example. You saw here that gtsummary uses this function, as_gt, as its printer. So you can pipe any gtsummary table right into as_gt, and from there — now you no longer have a gtsummary table, you have a gt table — you can use any of the 100-plus functions exported from the gt package to further style your table. So here you can set the exact pixel width you want for your column labels, and you can change the alignment, which you can also do in gtsummary, but in this example I'm just showing you how to change the alignment using gt functions. And you can do that for flextable, huxtable, kableExtra — all of them. It works fabulously well. So in closing — this is meant to be an iframe showing the gtsummary website, but it's not showing, so I'm just going to quickly open it in another tab. I'm just realizing now I was not on full screen; I don't know for how long. Anyway, the gtsummary website has a ton of information and lots of articles and examples, and I really encourage you to review it. There's a good chance that a lot of the questions you encounter are already answered in the documentation there. The package, I feel, is pretty widely used. When we started writing it, we wrote it just for us and our colleagues.
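The conversion functions just described can be sketched together; the Excel file name is a placeholder.

```r
library(gtsummary)

tbl <- trial |> tbl_summary(by = trt, include = c(age, grade))

# Convert to gt, then style with gt's own functions
tbl |>
  as_gt() |>
  gt::cols_align(align = "center")  # defaults to all columns

# Or drop down to a plain tibble of the results
df <- as_tibble(tbl)

# Or export to Excel via huxtable (file name is a placeholder)
# as_hux_xlsx(tbl, file = "table.xlsx")
```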
And we never dreamed that there'd be hundreds of thousands of people using it across the world. So I think you can confidently use this package and be confident in the results being accurate, because of the large user base. There's also a package that I maintain on the side called bstfun. It's not on CRAN, it's just on my GitHub, and it has a website. It's a bit of a sandbox for gtsummary — a lot of the new functionality we add to gtsummary starts in bstfun, and then we export it to gtsummary later. So here are a couple of examples of some cute stuff you can do. All right, and that's it — we finished with three minutes left. So if you have questions, be sure to check out the documentation. If it's not answered there, go to Stack Overflow; there are already hundreds of questions answered there. If yours isn't, just post a question using the gtsummary tag — me and the other authors will be listening, so we'll all be there. All right, do we have any questions? I saw someone comment that there were 73 slides here but somewhere else they saw 77, and that is true — there is a slide missing that had all of the authors of the gtsummary package listed. I'm not sure what happened there. So Jonathan Rubin is asking, what would the code be to export to Excel? The function is called as_hux_xlsx, and that will convert your gtsummary table into an Excel file. And there was a question about resizing tables — would I suggest looking more at the print engines, like gt or flextable, to specify those container heights and that kind of thing? So are we talking specifically within xaringan? Was that the stuff we're talking about? That was one of them, yes.
So it gets kind of tricky, but yeah, you really need to depend on the print engines there, because we're just going to print default gt and default flextable — but you can definitely use additional functions there to modify the size of what is produced. And in presentations in the past, I've actually used gt::gtsave to save an image of the table, and then I just insert the image and resize it as needed. There are a lot of different ways you can accomplish resizing. Well, I really enjoyed getting the chance to speak to you all. I can't believe that was three hours — it flew by. I didn't think I was going to have enough material for three hours, actually; turns out I can talk a lot, which is good in this situation. And if we don't have any other questions, I think we can — oh, here's a question: I often pipe the gtsummary commands to get more formatting; can that all be in a theme? Yes, Jonathan Rubin, that can all be in a theme — you can add all of that to a theme. Oh, and Raymond asks, can we buy your hex sticker? I only have a handful of them. I guess I'd have to get them printed and make an Etsy account, set up a Shopify — I don't know; at the moment you cannot buy one. Oh, Shannon suggests T-shirts — I like it, I like it. All right, thank you, everyone. It was an absolute pleasure, and feel free to reach out to me on Twitter or Stack Overflow with any questions. I'm here for you. Take care, bye-bye.