Hi everyone, my name is Kat Hoffman. Today we'll be discussing making swimmer plots for longitudinal data using ggplot. A little bit about me: I am a research biostatistician at Weill Cornell Medicine in New York City. I primarily communicate via blogging on KHstats, but I love to hear feedback on blog posts, so feel free to contact me via any of the modes listed below.

So what is a swimmer plot? A swimmer plot is a graphical way to show a subject's or a patient's profile over time. It's composed of a series of horizontal lines in which each line represents one subject. And the colors or the shapes on the line usually indicate treatments or some other type of status of that subject at a particular time. This example on the right is a swimmer plot taken from Cell. It's on COVID patients. It's showing the composition of the patients, or demographics, and the circles on the plot indicate when the patient was sampled for proteomics. And so swimmer plots can be used in manuscripts, but they can also just be really great for exploratory data analysis or for quick presentations to show different aspects of some type of study cohort.

You might be thinking, do we really have to hand code a swimmer plot? It looks sort of complicated. And the answer is that swimmer plot packages such as swimplot do exist, but it's generally just more customizable to write the entire ggplot code yourself, because these packages ultimately just output a ggplot object.

For today's talk, we're going to be remaking this swimmer plot that's shown on the right, which is a swimmer plot or a treatment timeline for 30 patients. It's a cohort of hospitalized COVID patients, and we're going to use the swimmer plot to show the timing of severe hypoxia, intubation if they were intubated, and steroid administration, as well as 28-day mortality.

First things first, we will read in a data set. And the data set that you'll probably want to start with is a pretty standard long-form data set.
It contains an ID column, some sort of time column, and columns for each status that you're interested in denoting ultimately on your swimmer plot. So maybe it's a drug, or time of sampling, or meeting other criteria. And you're going to have one row per subject per unit of time that you're interested in, which is a pretty standard format for longitudinal data.

We can see for our example plot, there's a data set on my GitHub that contains one ID per patient and one row per day the patient is in the study, all the way up to 28, since we're looking at 28-day mortality. We then have four columns for the four statuses that we're interested in: whether the patient was intubated; whether or not they received steroids, zero if they were not receiving steroids that day and one if they were; same idea for death; and same idea for severe, which is the day that they met criteria for severe hypoxia. It's an indicator variable at that day.

So you would probably have a standard data set that looks like this or something like that, and you'll need to get it ready for plotting for your swimmer plot. There are two main modifications that you'll need to make: changes to the ID column and to the status columns.

The ID column change is optional, but I like to refactor or reorder my ID column by the length of time that my subjects are in my study of interest, so that it's a little easier on the eye to just look down and see a nice descending order of length of stay or length of study time. So we can look really quickly at how we would do that. We can take our long-form data set, group by ID, make a maximum day variable, and then use fct_reorder to reorder our IDs by the maximum day in the study.

We'll have the same time column, but for our status column, we now want to make a status column that is the time that the status occurred, assuming you want the status to appear on the plot.
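For reference, the ID reordering step described above could look something like the following in R. This is a minimal sketch on made-up toy data, not the talk's actual code: the names `dat_long`, `id`, `day`, and `max_day` are assumptions, and `fct_reorder` is taken from the forcats package.

```r
library(dplyr)
library(forcats)

# Hypothetical toy data: one row per patient per day in the study
dat_long <- data.frame(
  id  = c("A", "A", "A", "B", "B", "C", "C", "C", "C"),
  day = c(0, 1, 2, 0, 1, 0, 1, 2, 3)
)

dat_ordered <- dat_long %>%
  group_by(id) %>%
  mutate(max_day = max(day)) %>%                  # last day each patient is observed
  ungroup() %>%
  mutate(id = fct_reorder(factor(id), max_day))   # order ID levels by length of stay

levels(dat_ordered$id)  # "B" "A" "C": shortest stay first
```

Because ggplot2 draws the first factor level at the bottom of a discrete y axis, the patient with the longest stay ends up at the top of the plot.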
So instead of it just being an indicator variable for status, we're going to make it be the time, if the status occurred and you want it shown on the plot. And we're still going to have one row per subject per time, but let's look at how we can modify that status variable. So we'll take our reordered ID data set and we're going to use the case_when statement from dplyr. We're going to make a new variable called something like status_this_day, and if the patient met the status that day, then its value is the day that the status occurred. So instead of it being an indicator, we're now going to have it be the day. And the default for case_when, if you don't give a statement that is met logically, is just to be NA, which is actually perfect. That's exactly what we want. So for our severity status, steroids, and death, we're going to do this case_when statement to make new status columns to be able to show on our ggplot.

And so we can see what the data set looks like. We still have our ID and day columns, but now we have these new columns called severe_this_day, for example, and this first patient became severely hypoxic on day two (counting zero, one, two). And so at that day, instead of it being an indicator variable, we now have a day variable, which is two. And it's NA otherwise. They never received steroids, but they did die on day 16. And so instead of being an indicator variable, it's now 16. You can look and see that a different patient received steroids, for example, on days 13, 14, 15, and 16. So this is what we're going to save and start to plot in our ggplot.

So for ggplot, we are going to pipe in our data and then set a global aesthetic for our ggplot, which is the ID. And ID is going to be mapped to both our y aesthetic as well as the group aesthetic. And you'll see why; it's basically to make all the lines stay together with the ID. So this is our reordered ID variable.
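The status recoding and the starting ggplot call might be sketched as follows. The toy data and the column names (`severe`, `steroids`, `death`, and the `*_this_day` columns) are assumptions chosen to mirror the description; only `case_when()` from dplyr and the `y`/`group` mapping come from the talk itself.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data with 0/1 indicator columns, one row per patient per day
dat_long <- data.frame(
  id       = rep(c("A", "B"), each = 4),
  day      = rep(0:3, times = 2),
  severe   = c(0, 0, 1, 0,  0, 0, 0, 0),
  steroids = c(0, 0, 0, 0,  0, 1, 1, 1),
  death    = c(0, 0, 0, 1,  0, 0, 0, 0)
)

# Replace each indicator with the day it occurred; case_when() returns NA
# for rows where no condition is met, so unmet statuses simply are not drawn
dat_swim <- dat_long %>%
  mutate(
    severe_this_day   = case_when(severe == 1 ~ day),
    steroids_this_day = case_when(steroids == 1 ~ day),
    death_this_day    = case_when(death == 1 ~ day)
  )

dat_swim$severe_this_day  # NA NA 2 NA NA NA NA NA

# Global aesthetics: id is mapped to both y and group so each line stays with its patient
p <- dat_swim %>%
  ggplot(aes(y = id, group = id))
```

At this point `p` has no layers yet; the geometries are added in the following steps.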
We're just going to take away that gray background with theme_bw. And we're going to add our first geometry, which is a geom_line. And to geom_line, we're going to additionally map that the x-axis should be the day column, and that's just the day that they're in the study. And the color is going to be mapped to intubation status. And so we can see now we have a line for every patient ordered by length of stay, and the line is colored by their intubation status. So that's a great start to the timeline.

We can then add the day that each patient received steroids, so steroids_this_day. We can map that in a geom_point aesthetic, and then it'll just be these default little black circles as a shape. We'll do the same thing for the day that they met severe hypoxia criteria, and the same thing for the day that they died. And of course this plot is not very useful right now because it's just a bunch of black circles and no way to really distinguish them. So let's modify those geom layers.

And to modify those, let's start over. So we'll go back to sort of our blank-slate ggplot. And this time when we add the geom layers, we're going to start adding some characteristics of the geometries outside of the aesthetics argument. So here I've bumped up the size for the intubation status. I'm also going to change the stroke and the shape to be squares for my steroids indicators. I'll do a similar thing for severity of hypoxia, I'll make them circles, and I'll make the death markers little X's. And so now we need to have some sort of a way to be able to distinguish these, probably with different colors. That's a good next step. And you'll notice that, since we added all of these characteristics outside of the aesthetics argument, we're not getting a legend that corresponds to any of these shapes or different stroke types right now.
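A rough sketch of these layers, with the fixed characteristics set outside `aes()`, is shown below. The shapes 21, 15, and 4 are the ones the talk mentions for hypoxia, steroids, and death; the size and stroke numbers, the toy data, and the column names are assumptions. `linewidth` assumes ggplot2 3.4.0 or later (older versions use `size` for line width).

```r
library(ggplot2)

# Hypothetical plotting data, already in "day the status occurred" form
dat_swim <- data.frame(
  id  = rep(c("A", "B"), each = 4),
  day = rep(0:3, times = 2),
  intubation_status = c("Not intubated", "Intubated", "Intubated", "Intubated",
                        rep("Not intubated", 4)),
  severe_this_day   = c(NA, 1, NA, NA,  NA, NA, NA, NA),
  steroids_this_day = c(NA, NA, NA, NA, NA, 1, 2, 3),
  death_this_day    = c(NA, NA, NA, 3,  NA, NA, NA, NA)
)

p <- ggplot(dat_swim, aes(y = id, group = id)) +
  theme_bw() +
  # one line per patient, colored by intubation status (a mapped aesthetic)
  geom_line(aes(x = day, colour = intubation_status), linewidth = 1.8) +
  # fixed characteristics go outside aes(), so they produce no legend entries
  geom_point(aes(x = steroids_this_day), shape = 15, stroke = 2) +            # squares
  geom_point(aes(x = severe_this_day), shape = 21, size = 2, stroke = 1.5) +  # open circles
  geom_point(aes(x = death_this_day), shape = 4, size = 2, stroke = 1.5)      # X marks
```

Rows where a `*_this_day` column is NA are dropped by `geom_point()` with a warning when the plot is drawn, which is the intended behavior here.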
So when we add the colors, you're going to see that we're also going to do this in a way that gives us a legend for the other symbols on the graph. So this is sort of the most important step, I would say, to be able to make nice swimmer plots with ggplot: understanding how you can add colors and the legend at the same time.

So bear with me. The big picture is that we want a legend that is going to correctly map every characteristic of our statuses into the legend, like actually show it. And to do that, we need all of our statuses to show up in the legend, which they currently don't because they're not being mapped to aesthetics. We're going to force all the statuses to show up by creating a new aesthetic, which is going to be a color that corresponds to the name we want each status to be labeled with in the legend. We'll then be able to modify this legend, just for color, to contain information about the shape, line type, et cetera. You'll be able to see this all in one second, but keep in mind that the big picture is that we need to override ggplot's, I guess, natural tendencies of what it wants to show.

So we're going to do that by restarting with our geom_point geometries. And when we add them, we're now going to add this argument for color in the aesthetic, and the value that we're going to put in is going to match what we want that symbol to show up as in our ggplot legend. So for this one, it's steroids. We have this variable steroids_this_day, and we want it to be called Steroids. And you'll see that when we add it, as we toggle back and forth, Steroids gets added into this legend, which now describes the colors on our graph. We'll do the same thing for hypoxia. Say we want it to be called Severe hypoxia; we see it shows up on the graph. And for death, same thing. It's now in the graph. And one thing to note is that obviously these are just default colors for ggplot.
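The idea of mapping color to a constant label inside `aes()` might look like the sketch below. The legend labels "Steroids", "Severe hypoxia", and "Death" follow the talk; the toy data, column names, and the size and stroke values are assumptions.

```r
library(ggplot2)

# Hypothetical plotting data, already in "day the status occurred" form
dat_swim <- data.frame(
  id  = rep(c("A", "B"), each = 4),
  day = rep(0:3, times = 2),
  intubation_status = c("Not intubated", "Intubated", "Intubated", "Intubated",
                        rep("Not intubated", 4)),
  severe_this_day   = c(NA, 1, NA, NA,  NA, NA, NA, NA),
  steroids_this_day = c(NA, NA, NA, NA, NA, 1, 2, 3),
  death_this_day    = c(NA, NA, NA, 3,  NA, NA, NA, NA)
)

p <- ggplot(dat_swim, aes(y = id, group = id)) +
  theme_bw() +
  geom_line(aes(x = day, colour = intubation_status), linewidth = 1.8) +
  # colour = "<label>" inside aes() is a constant string, not a column:
  # it forces each status into the colour legend under that label
  geom_point(aes(x = steroids_this_day, colour = "Steroids"),
             shape = 15, stroke = 2) +
  geom_point(aes(x = severe_this_day, colour = "Severe hypoxia"),
             shape = 21, size = 2, stroke = 1.5) +
  geom_point(aes(x = death_this_day, colour = "Death"),
             shape = 4, size = 2, stroke = 1.5)
```

All five labels (the two intubation values plus the three constants) now share one colour scale, so they appear together in a single legend with default colors.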
So we're going to figure out how to change those in the next step. Also, this is clearly just a legend for color. It's not incorporating any of the shapes that are happening right now. And this legend is in alphabetical order, which may not actually be what we want in our final plot. Like, I don't want death to be shown first because that's the thing that happens last chronologically. So we're going to show in the next step how to change all of that.

So it's time now to modify colors. And to modify the colors, we're going to make a key that corresponds to the syntax of a layer that you can add to ggplot, scale_color_manual, and what that takes in. So the key for that is going to be a vector, and it's going to be a labeled vector. And the labels are going to match the colors that we just assigned in the previous slide. So I said the label for the legend was Severe hypoxia; I'm going to put that there. I'm going to match exactly the same character strings as I have in the color aesthetic. Those are going to be the names of my vector, and the values in my vector are going to be the colors that I want to appear. And that's just the syntax that scale_color_manual takes for the values argument. And also note, the order of this color key needs to be the same as the order in which the statuses appear in your legend, or as you want them to appear in your legend. So as I said, I didn't want death to occur first in the legend, even though it's alphabetically first, so I put that last here.

And we're going to put this cols key that we just made as the values argument of scale_color_manual. So let's do that now. So we have our same plot as before. And you can see we're really just adding one line of code: scale_color_manual, values equals our color key, and the name equal to the name you want the legend title to be, so Patient Status right here. We can really do a lot with just that one line of code. So we can toggle back and forth.
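A sketch of the color key and the `scale_color_manual()` call follows. The specific colors, the toy data, and the column names are assumptions. Passing `breaks = names(cols)` is an addition of this sketch rather than something stated in the talk: depending on the ggplot2 version, the legend may otherwise fall back to alphabetical order instead of following the order of the vector.

```r
library(ggplot2)

# Hypothetical plotting data, already in "day the status occurred" form
dat_swim <- data.frame(
  id  = rep(c("A", "B"), each = 4),
  day = rep(0:3, times = 2),
  intubation_status = c("Not intubated", "Intubated", "Intubated", "Intubated",
                        rep("Not intubated", 4)),
  severe_this_day   = c(NA, 1, NA, NA,  NA, NA, NA, NA),
  steroids_this_day = c(NA, NA, NA, NA, NA, 1, 2, 3),
  death_this_day    = c(NA, NA, NA, 3,  NA, NA, NA, NA)
)

# Named vector: names match the colour labels exactly, values are the colors.
# The order is the desired legend order (Death last). Colors are illustrative.
cols <- c(
  "Severe hypoxia" = "firebrick",
  "Intubated"      = "navy",
  "Not intubated"  = "skyblue",
  "Steroids"       = "goldenrod",
  "Death"          = "black"
)

p <- ggplot(dat_swim, aes(y = id, group = id)) +
  theme_bw() +
  geom_line(aes(x = day, colour = intubation_status), linewidth = 1.8) +
  geom_point(aes(x = steroids_this_day, colour = "Steroids"), shape = 15, stroke = 2) +
  geom_point(aes(x = severe_this_day, colour = "Severe hypoxia"),
             shape = 21, size = 2, stroke = 1.5) +
  geom_point(aes(x = death_this_day, colour = "Death"), shape = 4, size = 2, stroke = 1.5) +
  # breaks pins the legend order to the order of the key (assumption of this sketch)
  scale_color_manual(values = cols, breaks = names(cols), name = "Patient Status")
```

Matching by name means a typo in either the `aes()` label or the vector name leaves that status without a color, so it helps to copy the strings rather than retype them.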
You can see we got all of the colors that we wanted, and the order of the legend that we wanted, corresponding to the labels that we want. And yeah, our plot is now looking in much better shape. The last thing that we need to do to this plot: you'll notice that it's still just a legend for color. And it's not reflecting, like, oh, I have a black X for death, not just a black line with a square on it. So we need to modify this legend using override.aes. And override.aes stands for override aesthetic, and it can be used, well, you'll see, but it's in one of the guides layers that you can add for ggplot.

But to use override aesthetic, we need to create vectors for overriding everything, and that's shape, line type, stroke, and size for this plot, because I changed all of those things in the geom_point layers. So these override vectors need to be in the same order as the color key that we made, the cols key. And as a reminder, the cols key looks like this. So it's Severe hypoxia and so on, all the names and then all the colors that they correspond to. And we'll just make vectors that go in that same order. So for shape override, if you remember, for the geom_points we said that hypoxia should be a shape of 21 because we wanted it to be a circle. We don't want a shape to show up for intubated and not intubated in our legend, so we're going to use NA. We have a shape of 15 and 4 for steroids and death, respectively. Similar idea for all of the other characteristics. We don't want a line to show up for our status indicators, but we do want one to show up for intubation. So we'll put ones there and NAs for the others.

So now we'll take these four vectors, these override vectors, and this is the same plot as before. I'm just going to add this sort of block of code for the guides layer of ggplot. This basically just says: take the color guide, the color legend, and override the aesthetic with this list that I've given you.
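Putting the override vectors and the guides layer together might look like this. The shape and line type vectors follow the values given in the talk; the stroke and size numbers, the colors, the toy data, and the column names are assumptions, and `breaks = names(cols)` is again added by this sketch so the legend order matches the override vectors regardless of ggplot2 version.

```r
library(ggplot2)

# Hypothetical plotting data, already in "day the status occurred" form
dat_swim <- data.frame(
  id  = rep(c("A", "B"), each = 4),
  day = rep(0:3, times = 2),
  intubation_status = c("Not intubated", "Intubated", "Intubated", "Intubated",
                        rep("Not intubated", 4)),
  severe_this_day   = c(NA, 1, NA, NA,  NA, NA, NA, NA),
  steroids_this_day = c(NA, NA, NA, NA, NA, 1, 2, 3),
  death_this_day    = c(NA, NA, NA, 3,  NA, NA, NA, NA)
)

cols <- c("Severe hypoxia" = "firebrick", "Intubated" = "navy",
          "Not intubated" = "skyblue", "Steroids" = "goldenrod", "Death" = "black")

# Override vectors, in the same order as the names of cols
shape_override  <- c(21, NA, NA, 15, 4)     # no point symbol for the two line entries
line_override   <- c(NA, 1, 1, NA, NA)      # solid line only for the intubation entries
stroke_override <- c(1.5, 1, 1, 2, 1.5)     # illustrative values
size_override   <- c(2, 1.5, 1.5, 1.5, 2)   # illustrative values

p <- ggplot(dat_swim, aes(y = id, group = id)) +
  theme_bw() +
  geom_line(aes(x = day, colour = intubation_status), linewidth = 1.8) +
  geom_point(aes(x = steroids_this_day, colour = "Steroids"), shape = 15, stroke = 2) +
  geom_point(aes(x = severe_this_day, colour = "Severe hypoxia"),
             shape = 21, size = 2, stroke = 1.5) +
  geom_point(aes(x = death_this_day, colour = "Death"), shape = 4, size = 2, stroke = 1.5) +
  scale_color_manual(values = cols, breaks = names(cols), name = "Patient Status") +
  # Replace the default legend keys with the per-entry shape, line type, stroke, and size
  guides(colour = guide_legend(
    override.aes = list(
      stroke   = stroke_override,
      shape    = shape_override,
      linetype = line_override,
      size     = size_override
    )
  ))
```

Each override vector needs one element per legend entry, which is why keeping them in the same order as the color key matters.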
So my list is, hey, for stroke, I want you to override the legend and use my stroke override vector. And for size, same thing; shape, line type. And as we toggle back and forth, you can see over here the legend is removing all of the extra, I guess, stuff that we don't want to show in the legend and customizing it to be our own. So at this point, we really have a plot that's pretty close to being what I originally showed you in the first slide.

So we can make just a couple of minor edits. I'll go through this really quick. These are very much your own personal preference, but I add labels, I remove white space using scale_x_continuous, do a bunch of things to the fonts and the justification, move the legend inside, and then I remove these last lines, although that's definitely personal preference. But right here is the final plot that I showed you at the beginning of this slide deck.

So in summary, long-format plotting data needs to have one column per status that indicates, in each row, the time of the status marker if you want it to be denoted. And once you've got your data properly formatted like that, ggplot can be used to make very customizable swimmer plots with the geom_line and geom_point layers. The legend can be properly configured using override.aes in the guides layer, and these ideas can be extended in many ways, such as showing patterns of missing data. This plot is showing patterns of missing data using transparency and a continuous covariate, from a 2019 blog post, but this slide deck corresponds to a 2022 blog post that I have.

And yes, thank you all so much for listening, and to the authors of xaringan and flipbookr for making these slides possible. Again, we'd love to hear from you all after this, and we'll take any questions now. Thank you.