Hey folks, in today's episode of Code Club, I want to try something a little different. I'd like to share my thought process as I work through a problem I'm having in data visualization: first thinking about how I want the data to appear, and then figuring out how to use R to create that visual. The problem relates to a video series I did back at the beginning of this year, looking at the effects of rarefaction on alpha and beta diversity metrics. The figure I'm showing you is its current status. I know it has a lot going on. I have 12 different data sets, so there are 12 different colors here. Don't worry, that's not the way it's going to end up. There are four different metrics: sobs, which is the observed richness; Shannon, the Shannon diversity; Bray-Curtis distances between pairs of samples; and Jaccard, another distance between pairs of samples. Bray-Curtis uses abundance; Jaccard doesn't. Across the x-axis is the number of sequences per sample. These data sets have, I don't know, somewhere between a dozen and 400 samples, and they vary in the number of sequences. The question I wanted to answer with this figure is: if I rarefy my data to different sampling depths, what does that do to the metric I'm measuring? In other words, if I get 10,000 sequences from each of my samples, what would I expect the average richness, Shannon diversity, Bray-Curtis distance, and Jaccard distance between samples to be? So the mean would be, again at 10,000 sequences, the average Shannon diversity, for example. The coefficient of variation would then be the standard deviation across those 490 samples divided by the mean at 10,000 sequences. One of the things I'm not a fan of here is that I don't have labels on the y-axis, right?
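To make the two summary statistics concrete, here is a minimal R sketch. It assumes a hypothetical per-sample table with made-up richness (sobs) values at two rarefaction depths; the column names are assumptions, not the ones in the original script. For each data set and depth it computes the mean across samples and the coefficient of variation (standard deviation divided by the mean) with dplyr's group_by() and summarise().

```r
library(dplyr)

# Hypothetical per-sample values for one toy data set at two depths;
# the numbers are made up purely to show the arithmetic
per_sample <- tibble(
  dataset = "toy",
  n_seqs = rep(c(1000, 10000), each = 4),
  sobs = c(90, 100, 110, 100, 180, 200, 220, 200)
)

# For each data set and depth: the mean across samples, and the coefficient
# of variation (standard deviation divided by that mean)
summary_stats <- per_sample %>%
  group_by(dataset, n_seqs) %>%
  summarise(
    mean_value = mean(sobs),
    cov = sd(sobs) / mean(sobs),
    .groups = "drop"
  )

print(summary_stats)
```

Here the mean doubles from the first depth to the second while the coefficient of variation stays the same, because the coefficient of variation is scale-free. That is one reason coefficients of variation from different metrics are easier to compare than their means.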
So the y-axis is different for the left and right sides: the left is the mean, and the right is the coefficient of variation. Each row is then a different metric, whether that's richness, Shannon, Bray, or Jaccard. On the y-axis on the left over here, what I'd like to have is some indication that it's the mean of that metric, and for the right side, a y-axis indicating that it's the coefficient of variation. That seems a bit busy, because I'm going to have effectively the same y-axis label for all four rows of this figure. I don't know that that's so great, right? So what I want to work through with you is a variety of strategies I might use to figure this out. One strategy might be to make the figure horizontal instead of vertical: the top row would be the mean across all of the samples, the second row would be the coefficient of variation, and then I'd have four columns for the four different metrics. I might drop the Jaccard, because it's really not that different from the Bray. If anything, its values are a little larger, but the trends are, like, identical. You have to stare at it closely to see whether there's actually a difference between the Bray and the Jaccard. So what I want to do is show you how I work through a data visualization problem like this, because I think it's really helpful to see someone's insights and how they think through a problem. That's what I'm going to try to emphasize as I do this. You'll notice that I'm not in RStudio; I'm in Visual Studio Code. On the left side, I have my text files, my code, and the PDF version of the figure I'm trying to make. On the right side, I have my terminal. That's really all you need to know for today's episode. There's a lot going on in this script that really isn't relevant for the purposes of our analysis.
Again, what I'm trying to do is think about how to visualize the data and then create the visual. So I'm going to go ahead and load all of this code down to line 84 here. That makes a data frame called alpha composite, which has the data set, the method (that is, the diversity metric), the number of sequences, the standard deviation, the number of samples, the statistic, and the value, whether that's the mean or the coefficient of variation. I also have beta composite, which is the same thing but with the beta diversity metrics, Bray-Curtis and Jaccard. To make alpha beta, I bind the rows of those two data frames together, along with a dummy data frame that I made here. Don't worry about that so much. The alpha diversity data frame had 2,596 rows, the beta diversity one had 1,648, and the dummy one had a couple of extra rows, so the combined data frame holds the rows of all three. Thinking about the code here, I bind those data frames together and pipe the result into ggplot, where on the x-axis, as you saw, I used n_seqs, and on the y-axis the value, again being either the mean of the metric or its coefficient of variation. I then group the data by data set, so I'm creating a line for each data set, and each data set gets a different color. I'm drawing those lines with geom_line. Then I'm using facet_wrap to create a facet for each combination of method and statistic. If I were to take my bind_rows output and do a count on method and statistic, I would see that there are eight different combinations here, right? There's the observed richness, Shannon, Bray, and Jaccard, each with a mean and a coefficient of variation, along with the number of observations I have for each.
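A condensed sketch of the pipeline described above, assuming hypothetical toy versions of alpha composite and beta composite with columns dataset, method, statistic, n_seqs, and value (names inferred from the transcript; the values are made up). It stacks the tables with bind_rows(), counts method/statistic combinations with count(), and draws one line per data set with one facet_wrap() panel per combination.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data; column names follow the transcript, values are made up
make_toy <- function(methods) {
  expand.grid(
    dataset = c("human", "mouse", "soil"),
    method = methods,
    statistic = c("mean", "cov"),
    n_seqs = c(1000, 5000, 10000, 50000),
    stringsAsFactors = FALSE
  ) %>%
    mutate(value = log10(n_seqs))
}

alpha_composite <- make_toy(c("sobs", "shannon"))
beta_composite <- make_toy(c("bray", "jaccard"))

# Stack the alpha and beta rows into one long data frame
composite <- bind_rows(alpha_composite, beta_composite)

# Rows per method/statistic combination (8 combinations in this toy data)
combo_counts <- count(composite, method, statistic)
print(combo_counts)

# One line per data set, colored by data set, one panel per method/statistic;
# ordering statistic as mean, cov and using ncol = 2 puts means on the left
alpha_beta <- composite %>%
  mutate(statistic = factor(statistic, levels = c("mean", "cov"))) %>%
  ggplot(aes(x = n_seqs, y = value, group = dataset, color = dataset)) +
  geom_line() +
  facet_wrap(~ method + statistic, ncol = 2, scales = "free_y")
```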
The numbers for the beta diversity are a bit different than for the alpha diversity. That's because I'm building this figure before I actually have all the data. A couple of the data sets aren't done processing, but I've got enough of them that I can push forward instead of waiting a few days for those data to be generated. All right, then I make my y-axis continuous, going from zero up to whatever ggplot chooses; this makes the minimum value on my y-axis zero. I have scale_x_log10 because the number of sequences in my different data sets can vary considerably. Here we can see it's, you know, a hundredfold, right? So I'm going to put this on a log scale. Then I'm using markdown along with ggtext, and down here element_markdown, to stylize the exponential notation to look like 10 to the third, right? Then I've got coord_cartesian(clip = "off"). I'm not totally sure why I have this, so maybe I'll remove it for now. Then I have my x-axis label of number of sequences per sample, and on the y-axis I have nothing. Again, that's because I've got different things in the two columns, right? The left was the mean of the metric, and the right was the coefficient of variation of the metric. Then I've got a bunch of theming. Okay. So again, as I showed you, facet_wrap makes this figure. I'd like to basically transpose my visual, so maybe I'll do method + statistic ~ with ncol = 4. I'm getting an error, and I think that's because it wants a period after the tilde. Let's try that again. So now I've got the first row, and then the second row is up in the upper right, right? It's basically filling the panels row-wise. If I look at facet_wrap, I think there's a way to fill top to bottom instead of left to right.
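The axis settings described above might look roughly like this. It is a sketch that assumes ggtext is installed; pow10_markdown() is a hypothetical helper that writes each break as markdown with HTML superscripts, which ggtext::element_markdown() then renders. Setting the breaks explicitly to powers of ten is an added assumption, so that every label is a whole exponent.

```r
library(ggplot2)
library(ggtext)

# Hypothetical helper: label a break such as 1000 as "10<sup>3</sup>", which
# element_markdown() renders with a superscript exponent
pow10_markdown <- function(x) paste0("10<sup>", log10(x), "</sup>")

# Made-up toy data spanning a hundredfold range of sequencing depths
toy <- data.frame(n_seqs = c(1e3, 1e4, 1e5), value = c(50, 80, 95))

p <- ggplot(toy, aes(x = n_seqs, y = value)) +
  geom_line() +
  # Lower limit 0, upper limit NA: start at zero, let ggplot choose the top
  scale_y_continuous(limits = c(0, NA)) +
  # Log-scaled x-axis with breaks at powers of ten and markdown labels
  scale_x_log10(breaks = 10^(3:5), labels = pow10_markdown) +
  labs(x = "Number of sequences per sample", y = NULL) +
  theme(axis.text.x = element_markdown())
```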
Let's see, so there's dir = "h"; I think that's the argument I want. The direction is either "h" for horizontal, the default, or "v" for vertical, so let's do dir = "v" in here. Now that I have these two rows, I kind of like this appearance, because I can easily scan up and down to look at the Shannon panels and see that, yeah, it is starting to level off, and that its coefficient of variation is quite low compared to the other metrics and is also trailing off. My labels are messed up, though, right? One thing I could do is put my labels on the left with strip.position = "left". Of course, what you see is that it puts the labels on the left side of every panel. What I'd rather have is just one label on that side for each row, and a label across the top of each column. I think this is a challenge because I'm using facet_wrap, so let's try facet_grid instead. I'll remove that dir = "v", and for now I'll also remove the strip position. I don't need the ncol = 4 either. Then I'll do method ~ statistic. That will put my method on the rows and the statistic on the columns, or maybe the opposite; whatever, I'll run it, and if it's wrong, I'll flip it. And sure enough, I did need to flip it. Sometimes it's easier for me to debug than to memorize, right? In a case like that, it's not a big deal: if I don't remember it, it's often faster to rerun it and flip it. So now I've got my four metrics across the columns, and my mean and my coefficient of variation as the rows. I do have scales = "free_y" there. The difference between facet_grid and facet_wrap with scales = "free_y" is that facet_wrap gives each panel its own y-axis.
Whereas facet_grid frees the y-axis only between rows: every column in the same row shares one y-axis, right? So what we see is that the mean goes up to, you know, 33,000 or something for all four metrics, even though Bray and Jaccard never get above about 0.6. Those look like flat lines, which is not so cool. It's not such a problem down here for the coefficient of variation, where all four metrics can reasonably share one scale. I could move the mean and coefficient of variation labels to the left side of the figure, and I could work with this to make it look pretty attractive. But the problem, again, is that the means of these four metrics are all on different scales, so that's not going to work. So what I'm going to do instead seems a bit drastic: I'm going to make eight different figures and then use patchwork to pull them all together. I'm not totally sold that that's going to work, so I'm not going to delete this just yet. I'll copy this down and comment out the original so I can always come back to it later; I know it works for the last figure I generated. Now I'm going to grab this bind_rows and call the result composite. All right. If I take composite, I'll do a filter where method == "sobs" and statistic == "mean". Running these two lines, I see that I've got just my method sobs and my statistic mean; that's good. I can pipe this into ggplot with aes(x = n_seqs, ...); actually, I'm just going to grab what I had before, because that's easier. Again, what I'm trying to do is make one plot that I can then replicate a bunch of times, right? So I've got n_seqs, the value, and the data set, good, and geom_line. I don't need this facet_grid.
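Stepping back to the two faceting attempts above, here is a sketch that contrasts them on made-up toy values in which the sobs means are much larger than everything else. With facet_wrap(), dir = "v" fills the panels column by column and scales = "free_y" frees every panel; with facet_grid(), scales = "free_y" only lets the y-axis vary between rows, so all panels in a row share one range.

```r
library(dplyr)
library(ggplot2)

# Made-up toy data: sobs means are large, every other value is below 1,
# mimicking the very different scales described above
toy <- expand.grid(
  dataset = c("a", "b"),
  method = c("sobs", "shannon", "bray", "jaccard"),
  statistic = c("mean", "cov"),
  n_seqs = c(1000, 10000),
  stringsAsFactors = FALSE
) %>%
  mutate(value = ifelse(method == "sobs" & statistic == "mean",
                        n_seqs / 10, log10(n_seqs) / 10))

base <- ggplot(toy, aes(x = n_seqs, y = value, group = dataset, color = dataset)) +
  geom_line() +
  scale_x_log10()

# facet_wrap: 4 columns filled top to bottom; each panel gets its own y range,
# but a strip label is repeated beside every panel
wrapped <- base +
  facet_wrap(~ method + statistic, ncol = 4, dir = "v",
             scales = "free_y", strip.position = "left")

# facet_grid: statistic as rows, method as columns; free_y varies by row only,
# so the small bray/jaccard means share the sobs mean's y range
gridded <- base +
  facet_grid(statistic ~ method, scales = "free_y")
```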
I'm going to leave the scale_y_continuous and scale_x_log10, get rid of that, and maybe comment out the labels. Let's leave the theme stuff there. Then I'll save this plot as alpha beta, which lets me save the plot image: down here in ggsave, I'm going to save alpha beta to that file. Again, I'm just trying to get a rough idea of what the figure looks like at this point. Good. So this is my plot of the mean richness. Great. If I wanted the coefficient of variation, I could change that statistic from "mean" to "cov" and run this again, and now I've got the coefficient of variation for richness. Cool. Hopefully you can see how I could repeat this eight times. But of course, that wouldn't be DRY (don't repeat yourself). So I'm going to turn this into a function, plot_method_statistic. Maybe I'll call its arguments m and s, for the method and statistic, and then in here I can put method == m and statistic == s. I need to wrap all this in curly braces to define the body of the function, right? So I'll bump this over and this back. I also need to pass it the data: I realize I've got composite here, but I never defined it in the function, so I'll add composite as an argument. All right. I'll bring this down and load it. Now I should be able to call plot_method_statistic with composite, "sobs" for the method, and "mean" for the statistic, and assign the result to alpha beta. And we get right back what we were hoping for. Cool. So instead of alpha beta, I'll call this sobs mean, and I'll repeat this a few times. I'll do sobs cov, replacing this "mean" with "cov". Then I'll make this one Shannon, and replace that with Shannon.
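A minimal sketch of the helper described above, assuming the same column names as before and a small made-up composite table. The function name and its m and s arguments follow the transcript; keeping the argument names different from the column names matters because dplyr's filter() looks up column names first.

```r
library(dplyr)
library(ggplot2)

# Filter one method/statistic pair, then draw one line per data set
plot_method_statistic <- function(composite, m, s) {
  composite %>%
    # Had the arguments been named method and statistic, `method == method`
    # would compare the column with itself and keep every row
    filter(method == m, statistic == s) %>%
    ggplot(aes(x = n_seqs, y = value, group = dataset, color = dataset)) +
    geom_line() +
    scale_y_continuous(limits = c(0, NA)) +
    scale_x_log10()
}

# Made-up toy data with the column names used in the transcript
composite <- expand.grid(
  dataset = c("a", "b"),
  method = c("sobs", "shannon"),
  statistic = c("mean", "cov"),
  n_seqs = c(1000, 10000),
  stringsAsFactors = FALSE
)
composite$value <- seq_len(nrow(composite))

sobs_mean <- plot_method_statistic(composite, "sobs", "mean")
sobs_cov <- plot_method_statistic(composite, "sobs", "cov")
```

If you would rather name the arguments method and statistic, dplyr's .env pronoun makes the intent explicit, as in filter(method == .env$method).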
And then in here, I'll put Shannon instead of sobs, and down here, cov and cov. Let me run these four, and I'm going to put the Shannon mean in ggsave. Now I see I've got the Shannon mean plot. Good. So we're halfway there. Let's keep going: I'll copy this down and add in Bray-Curtis and Jaccard. I'll put the Jaccard cov here and run everything, and there's what the Jaccard coefficient of variation looks like. Cool. So now I've got my eight different panels, and what I'd like to do is tie them all together with patchwork. Back up at the top, I guess I already had patchwork loaded, so I can come back down here and start a patch with sobs mean. Across the top I want the means, right? So then I'll add Shannon mean, Bray mean (I forgot an e here), and Jaccard mean. We'll load that and put patch in ggsave; that's what we're going to save. It's putting the four plots together, of course, but it's not putting them in one row. We can define that in a minute. First, let's define this as a group in parentheses. I think I can do that. Then I can divide by, in parentheses, the same set of plots, but instead of the means, I want the cov plots. I'll copy that, paste cov over the rest of the means on that row, and rerun it. All right, so it's putting the four means at the top and the four coefficients of variation on the bottom. I forget the syntax for patchwork; I basically have to remind myself how it works every time I make figures for a paper. So I'll search for R patchwork, and let's look at controlling layouts. I'll scan down here looking for something that looks like what I want, right?
I see that I can control the grid by adding plot_layout(ncol = 3). I'll come back to my code, add plot_layout(ncol = 4), and rerun this. These eight different legends are really getting in the way. I think back over here I saw something about controlling guides, so I'll come down to that, and I can see that within plot_layout I can do guides = "collect". So back in my code, within plot_layout, I'll add guides = "collect". Let's give this another go. For some reason, that's giving me two guides. I'm also getting the means on the left side and the coefficients of variation on the right side. That seems weird, right? So let's come back here and see if I defined something a little off. Maybe I'll remove that division and make it straight-up addition, without worrying about the division sign. I'm not sure if that'll work, but we can try. So now I've got the four means across the top row and the four coefficients of variation across the bottom row. I still have two different legends. I think that's because, for my beta diversity metrics, Bray and Jaccard, I don't yet have the rice or the stream data sets that I have for the alpha diversity metrics. So I'm not going to worry about that for now, because when I make the final version of the figure, that should all get cleaned up. All right. What I want to do now is deal with the labels on the x-axis and the y-axis. I want the y-axis label on the richness mean and the richness coefficient of variation, but I want to remove it from everything else. So I'll come back in here to the labs function I commented out and uncomment it. I'm going to make x NULL, and I'll also make y NULL. Let's run this and see what it looks like.
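A sketch of the layout that ended up working, with made-up toy panels standing in for the eight real plots (toy_panel() is a hypothetical helper; its titles only identify the toy panels). Flat addition plus plot_layout(ncol = 4) fills the grid row by row, so the means land on top. guides = "collect" merges legends, but only identical ones, which fits the two-legend result described above. The comment sketches why the earlier division version appears to have put the means on the left.

```r
library(ggplot2)
library(patchwork)

# Hypothetical helper: a tiny panel with made-up data and no axis titles
toy_panel <- function(label) {
  ggplot(data.frame(n_seqs = c(1e3, 1e4), value = c(1, 2), dataset = "toy"),
         aes(x = n_seqs, y = value, color = dataset)) +
    geom_line() +
    scale_x_log10() +
    labs(title = label, x = NULL, y = NULL)
}

metrics <- c("sobs", "shannon", "bray", "jaccard")
means <- lapply(paste(metrics, "mean"), toy_panel)
covs <- lapply(paste(metrics, "cov"), toy_panel)

# Flat addition: eight plots filled row by row into four columns, means on top.
# By contrast, (m1 + m2 + m3 + m4) / (c1 + c2 + c3 + c4) + plot_layout(ncol = 4)
# resets the outer layout (two nested groups) to four columns, so the two
# groups end up side by side instead of stacked.
patch <- means[[1]] + means[[2]] + means[[3]] + means[[4]] +
  covs[[1]] + covs[[2]] + covs[[3]] + covs[[4]] +
  plot_layout(ncol = 4, guides = "collect")
```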
So now we've gotten rid of the x- and y-axis labels. Good. Then I can come down to sobs mean and add labs with y = "Mean value". Yep, those stay there. And for the coefficient of variation, I'll do labs with y = "Coefficient of variation". And that's a percent. I misspelled this; there you go. Okay, let's run this. Good. So now I have the mean value and coefficient of variation labels. Or maybe I could say mean across samples: "Mean value across samples". Okay, good, I'm happy with that. I would like to get a common x-axis, though. So if I come back to the patchwork documentation, let me see if there's anything else beyond the guides; maybe adding annotation to a patchwork could do it, right? There are titles, subtitles, and captions; tags would be for each of the individual plots. More on tags. Looking here, I'm not really seeing it. Maybe what I could do is trick it with a plot annotation caption, since the caption goes at the bottom. Let's try that. I'll come back over here and add plot_annotation, and instead of title, I'll do caption = "Number of sequences sampled". I know the theming is going to be wrong, but we'll roll with this for now. Good. So now I've got "Number of sequences sampled" in the lower-right corner. I want to add some theming, right? As we saw back here, plot_annotation takes a theme argument, so I'll plop that in there. I probably have too many parentheses. Nope, I think I've got the right number. So theme(plot.caption = element_text(size = 18)); I'm not sure how big that is, but we'll see the size and adjust accordingly. And I'll do hjust = 0.5, which will center it. Let's run that. That looks pretty good. It's centered on the whole figure rather than on the plotting area, though, so maybe I want to bump it over a little bit.
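The caption workaround might look roughly like this, again with made-up toy panels (toy_panel() is a hypothetical helper). Only the leftmost panel in each row gets a y-axis title, and plot_annotation() supplies both the caption, standing in for a shared x-axis title, and a theme to style it. The size = 14 and hjust = 0.4 values are the ones the adjustments end up at.

```r
library(ggplot2)
library(patchwork)

# Hypothetical helper: a tiny panel with made-up data and no x-axis title
toy_panel <- function(y_label = NULL) {
  ggplot(data.frame(n_seqs = c(1e3, 1e4), value = c(1, 2)),
         aes(x = n_seqs, y = value)) +
    geom_line() +
    scale_x_log10() +
    labs(x = NULL, y = y_label)
}

# Only the first panel of each row keeps a y-axis title
patch <- toy_panel("Mean value across samples") + toy_panel() + toy_panel() + toy_panel() +
  toy_panel("Coefficient of variation") + toy_panel() + toy_panel() + toy_panel() +
  plot_layout(ncol = 4) +
  # The caption stands in for a shared x-axis title; hjust = 0.4 nudges it
  # slightly left of center (a value found by trial and error)
  plot_annotation(
    caption = "Number of sequences sampled",
    theme = theme(plot.caption = element_text(size = 14, hjust = 0.4))
  )
```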
If we do 0.3, I wonder what that'll get me. That's maybe a little too far to the left, so let's bring it over a bit with 0.4. That looks good; I like that. Maybe make it a little smaller: instead of 18, let's go down to 14 and give that a shot. I think that looks like a good size. One other thing I'm noticing is that the aspect ratio of the individual panels is off; it's still set up for the vertical layout of the figure. So let's change the output. I have a width of 7 and a height of 8; let's make the height 5. I think that looks better. Again, these legends are still kind of messing me up, but I'm not going to worry about them so much. I would like to put a title over each of the columns, though, and maybe I can do that with the title for each of the individual top-row plots, right? So up here, where I added labs for sobs, the observed richness, I can put in title = "Richness". I'm going to skip the coefficient of variation plots. Then for Shannon, I'll do labs(title = "Shannon"). Down here for Bray-Curtis, I'll do labs(title = "Bray-Curtis"), and for Jaccard, I'll add labs(title = "Jaccard"). All right, let's run this and see what it looks like. Because it's really a title, the font is pretty big. It's also left-justified, and I misspelled Shannon, so let me fix that first by getting rid of that extra o. I can change the theming of one element across all the plots; I think I saw that down here, right? With this ampersand, the example adds theme(text = element_text('mono')) to all of the plots. So I'll do that down here. All right. Again, I don't have everything memorized, right? I use the documentation I can find, look for things that resemble what I want to do, and then bring them over and adjust, right?
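A sketch of the per-column titles plus the ampersand trick, using hypothetical toy panels. Titles are set with labs(title = ...) on the top row only, and & theme(...) applies the title styling to every plot in the patchwork, whereas + theme(...) on a patchwork would modify only the last plot added. The size = 10, bold, centered styling matches what the adjustments arrive at.

```r
library(ggplot2)
library(patchwork)

# Hypothetical helper: a tiny panel with made-up data and an optional title
toy_panel <- function(title = NULL) {
  ggplot(data.frame(n_seqs = c(1e3, 1e4), value = c(1, 2)),
         aes(x = n_seqs, y = value)) +
    geom_line() +
    scale_x_log10() +
    labs(title = title, x = NULL, y = NULL)
}

# Column titles on the top row only; the coefficient-of-variation row has none
patch <- toy_panel("Richness") + toy_panel("Shannon") +
  toy_panel("Bray-Curtis") + toy_panel("Jaccard") +
  toy_panel() + toy_panel() + toy_panel() + toy_panel() +
  plot_layout(ncol = 4)

# `&` adds the theme to every plot in the patchwork; `+ theme(...)` would
# change only the last plot added
patch <- patch & theme(plot.title = element_text(size = 10, face = "bold", hjust = 0.5))
```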
Again, that's the whole idea of Riffomonas: we can take ideas and solutions that other people have found and adapt them to our own purposes, right? So then I can do plot.title = element_text(size = 14). That sounds good. I think that made it bigger, so let's go down to size = 10. That looks better. Maybe I'll make it bold and centered now, with face = "bold" and hjust = 0.5. I think this looks pretty good. I'm not totally liking how tall each of the panels is. I could, again, reduce the height a little bit. I could also get rid of the Jaccard, and I'm kind of inclined to do that, because the Jaccard just doesn't look that much different from the Bray-Curtis, and getting rid of it would, I think, help clean up the appearance a bit more. I can do that up here by commenting out Jaccard. Then I'll remove Jaccard cov here and Jaccard mean there, and revise the layout to be three columns. Let's run all this. I think that does look a little cleaner; I don't really need that extra Jaccard column. Maybe I'll make "Mean value across samples" go over two lines. Let's see, where did I do that? For "Mean value across samples" here, I can put in a \n and rerun this. Maybe I'll do the same thing with coefficient of variation, having it go over two lines with another \n. I think that looks pretty good. There's a bit more I need to do with this figure in terms of the legend, as I already mentioned, and also the different colors, but this has gone on long enough for today's episode. Hopefully you get a sense of the evolution of this figure, and how I tried things like facet_wrap and facet_grid but ultimately wasn't able to do what I wanted with those.
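Putting the pieces together for the three-column version, here is a sketch with toy panels (toy_panel() and the output file name are hypothetical). The \n inside the y-axis titles splits them over two lines, Jaccard is simply left out, and ggsave() uses the 7 by 5 inch size mentioned earlier.

```r
library(ggplot2)
library(patchwork)

# Hypothetical helper: a tiny panel with made-up data
toy_panel <- function(title = NULL, y_label = NULL) {
  ggplot(data.frame(n_seqs = c(1e3, 1e4), value = c(1, 2)),
         aes(x = n_seqs, y = value)) +
    geom_line() +
    scale_x_log10() +
    labs(title = title, x = NULL, y = y_label)
}

# `&` binds more loosely than `+`, so the final theme reaches every panel
patch <- toy_panel("Richness", "Mean value\nacross samples") +
  toy_panel("Shannon") +
  toy_panel("Bray-Curtis") +
  toy_panel(y_label = "Coefficient\nof variation") +
  toy_panel() +
  toy_panel() +
  plot_layout(ncol = 3, guides = "collect") +
  plot_annotation(
    caption = "Number of sequences sampled",
    theme = theme(plot.caption = element_text(size = 14, hjust = 0.4))
  ) &
  theme(plot.title = element_text(size = 10, face = "bold", hjust = 0.5))

# Width 7 and height 5 (inches); the file name here is made up
out_file <- file.path(tempdir(), "alpha_beta_patchwork.pdf")
ggsave(out_file, patch, width = 7, height = 5)
```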
So instead, I pursued this approach with patchwork. Hopefully nobody comes along and says, "Hey, Pat, there was this really simple argument you could have used with one of those facet functions to do the same thing." But I think this looks pretty good, and I'm happy with the way it looks. Of course, like I said, once I get in the rice and the stream data, these two legends will collapse down to one, and I think that'll look pretty good. If you want to see the final version of this figure, you'll just have to look for the preprint to be posted later in the summer. So anyway, I'm really happy with this. Let me know what you think of this format of episode; it's a bit different from what I've done in the past. I'm showing you real data visualization that I'm working on for my own research, and helping you see my thought process and the types of questions I'm trying to consider: I have ideas in my head that I then need to translate into code to get the visual to look the way I want. I'm sure there's more tweaking I'll do to this figure. But anyway, let me know what you think, and we'll see you next time for another episode of Code Club.