I'm a big fan of the power of ggplot2 and all the wonderful things we can do to make attractive plots in R, and to make them really efficiently. But if you're like me, you're a little bit intimidated by using the theme() function to start changing the formatting and the appearance of your plots. Well, don't be afraid, because today we're going to dig into that theme() function and see how we can take a faceted plot, where we've got labels on each of our facets, and turn those facet labels into y-axis labels, and how we can make some other modifications to make our plots look publication worthy.
Hey, everyone, I'm Pat Schloss and this is Code Club. In each episode of Code Club, I use principles of reproducible research and apply them to an interesting biological question. Recently, we've been doing that by looking at the sensitivity and specificity of amplicon sequence variants, also called ASVs, or operational taxonomic units, also called OTUs. Now, please don't run away from that jargon. It doesn't really matter, because what I'm showing you are tricks and principles of reproducible research and how we can apply them.
We're currently at a point in our project where we're building figures for a publication that I'd like to submit in the next month. In the last episode, I showed you a bit of my process for critiquing my figures and then making a list of things that I want to do to make them better. I don't spend a lot of time making figures look polished when I'm doing my exploratory data analysis, because many of those figures will never get published, so why polish them? Today, we're going to take a look at a second figure, one that uses panels generated from facet_wrap(). What I'd like to do, though, is use those facet labels as my y-axis labels. To do that, we have to use a little bit of trickery along the way, and we'll also see another iteration of my process for making these publication-quality figures.
We'll go ahead now and dig into the project. I'll move to my project root directory and open up my RStudio project. I'm also going to go to my repository on GitHub and look in exploratory. The figure that I'd like to incorporate into my paper in some form is from the lumping and splitting analysis that we did. When did we do that? Back at the end of November. I think that was the episode where I looked at facet_wrap() and facet_grid(), and I looked at lumping and splitting for each of the four regions. Lumping was when we had two species merged together within an OTU or an ASV, and splitting was when we had a single genome split apart into multiple bins for different OTU and ASV definitions.
We looked at the data in slightly different ways. Here we plotted both lumping and splitting in each panel, and we had four panels, one for each of the four regions. I didn't totally like this, because lumping and splitting are fractions of different things. This next iteration I kind of liked, where I had the fraction lumped and the fraction split, and each line represented a different region. Here we pulled it apart further, which was kind of transposing it. I like this look, but I think I use up a lot of real estate by putting each region into its own column of the facet. I do like having lump and split be separate rows in my facets, though. And if I come back to this plot, what I'd like to do, instead of putting the panels side by side, is put them on top of each other. So why don't we do that?
I'm going to come back to RStudio and create a new R script, and I'll save it to get that saving step out of the way. As you can see, I name the R scripts that generate figures with a plot_ prefix, so I'll call this one plot_lump_split.R.
So you'll recall that a few episodes ago, we actually repeated the lumping and splitting analysis. The exploratory analysis I did was for one iteration of grabbing one genome from each species. What we did just a couple of episodes ago was to repeat that 100 times, so we could get a median or an average of those 100 iterations of randomly sampling genomes by species. And so again, if we look in data/processed, we had lumped_split_rate.tsv.
So I'll go ahead and do library(tidyverse) and run that to get it loaded. Then I'll create lump_split, and that will be read_tsv(), reading in data/processed/lumped_split_rate.tsv. And again, just to be explicit about my column specification, I'll do col_types = cols(), and inside that I'll say region = col_character() and then .default = col_double(). Okay, read that all in and then look at lump_split. We've got our region, our threshold, our split rate, and our lump rate, and we're in good shape.
I'm going to come back to the code that I used to build the side-by-side plot, which is this, and paste it in here. I'm noticing that I had the data in a slightly different format before, because I had gathered things together. Yeah, I did this pivot_longer(), where I drew the columns together. So why don't I go ahead and do that as well, so that we have a common naming system. I'll throw that in there, and I think that should be good. That's kind of the problem when you're cobbling things together and copying and pasting, but we'll get it figured out, don't worry. So I think I'll remove that variable, and let's go ahead and add the pivot_longer() to this. Our columns aren't f_split; they're split_rate and lump_rate. Our names_to will be method, and our values_to will be fraction, so I'll leave that.
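The reading and reshaping steps narrated above can be sketched roughly as follows. This is a minimal sketch under assumptions: the real file is not shown in the transcript, so a small made-up table is written to a temporary file first, and the column names (`split_rate`, `lump_rate`) and values are guesses based on how they are spoken.

```r
library(tidyverse)

# Hypothetical stand-in for the processed lump/split rate file; the column
# names and numbers are invented for illustration only.
tsv_file <- tempfile(fileext = ".tsv")
write_tsv(
  tibble(
    region     = c("v19", "v19", "v4", "v4"),
    threshold  = c(0, 0.05, 0, 0.05),
    split_rate = c(0.40, 0.05, 0.10, 0.01),
    lump_rate  = c(0.00, 0.20, 0.05, 0.30)
  ),
  tsv_file
)

# Be explicit about column types: region is text, everything else is numeric.
lump_split <- read_tsv(
  tsv_file,
  col_types = cols(region = col_character(), .default = col_double())
)

# Gather the two rate columns into a method/fraction pair of columns.
lump_split_long <- lump_split %>%
  pivot_longer(
    cols = c(split_rate, lump_rate),
    names_to = "method",
    values_to = "fraction"
  )
```

After the pivot, each original row appears twice, once per method, which is the shape facet_wrap() needs in order to put each rate in its own panel.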
And maybe I'll do a select() to get rid of anything that ends_with() IQR, because those IQR values are so small that it's hard to get excited about them. And that needs to be in quotes. Great. So we've got that, and we're now ready to pipe this into building the ggplot. Let's make sure that we get a plot that looks good. Yeah, so this is the side-by-side look. We did facet_wrap() by method, and I'm going to add nrow = 2, so that puts the panels top to bottom. So we're in good shape, right? And I always like theme_classic() to make it black and white.
I'm now going to think about this plot and what I like about it and what I don't like about it. What I'd like to do is put the lump rate title over here in place of fraction, and split rate over here in place of fraction, effectively so that my two panels have different y-axis labels. Also, my threshold is down here. Something that I remember from my text is that I've always been expressing these values as a percentage, so I need to change my threshold. I will do scale_x_continuous(). We saw something like this in the last episode with scale_y_log10(), where I didn't like the default breaks. So I'm going to put in breaks, and I'll do 0, 0.025, 0.05, 0.075, and 0.1. And then for my labels, I'm going to make a character vector with the actual values I want on there. So I'll do 0, 2.5, 5, 7.5, and 10. I've kind of screwed up my quotes here, so let's see: do I have everything lined up? I need quotes around that. That will look good, and I'll add that. And now we should see, yeah, we've got the values there.
Next I'll do labs(), and for x I'm going to do distance, and I'll put a percent. For y, I'm going to do NULL. And this needs to be in quotes as a character string, and I forgot my plus sign at the end of that line.
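As a rough illustration of the steps just described (dropping the IQR columns, stacking the facets, and relabeling the x-axis as percentages), here is a minimal sketch. The toy data, the `fraction_iqr` column name, and the exact axis label text are assumptions; only the ggplot2 and dplyr calls themselves are taken from the narration.

```r
library(tidyverse)

# Made-up long-format data standing in for the real lump/split rates.
toy <- expand_grid(
  region = c("v19", "v4"),
  threshold = c(0, 0.025, 0.05, 0.075, 0.1),
  method = c("lump_rate", "split_rate")
) %>%
  mutate(
    fraction = if_else(method == "lump_rate", threshold * 3, 0.4 - threshold * 3),
    fraction_iqr = 0.001  # hypothetical IQR column that we do not want to plot
  )

p <- toy %>%
  select(-ends_with("iqr")) %>%
  ggplot(aes(x = threshold, y = fraction, color = region)) +
  geom_line() +
  facet_wrap(~method, nrow = 2) +  # two rows: one panel on top of the other
  scale_x_continuous(
    breaks = c(0, 0.025, 0.05, 0.075, 0.1),
    labels = c("0", "2.5", "5", "7.5", "10")  # show proportions as percentages
  ) +
  labs(x = "Distance (%)", y = NULL) +
  theme_classic()
```

The breaks and labels vectors must be the same length, since each label is paired with the break at the same position.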
But you see we've got distance. Maybe I'll say distance threshold to define ASV, OTU, to be a little bit more descriptive, right? So that looks nice. I'm not totally thrilled with the name on my legend there, or the values that I have there. So another thing I can do is scale_color_manual(). I can say name = NULL, because I don't want it to say region; it's going to be obvious that it's the region. Then I'll do breaks = c() with v19, v34, v4, and v45. Then I can do colors, and I'll do c() with, let's say, black, blue, green, and red. I know red and green are not the colors we want to wind up with, but we'll talk about colors maybe in the next episode, so we'll revisit that later. And then I will do labels = c() with V1-V9, V3-V4, V4, and V4-V5. Okay, and put the plus sign on that. And colors, that should be values. Yeah. So then we get our black, blue, green, and red for our different colors, and we got rid of the legend title.
The next thing we want to do is modify the theme. We talked about this a little bit in the last episode, where we saw how we can get rid of the label for the strip. This rectangle up here on each of the panels from facet_wrap() is called the strip. What I really want is for this label to be over here, parallel to my y-axis. We should be able to do that by modifying our facet_wrap() arguments, and I forget the exact argument, so let's look up facet_wrap. Nope, got to spell it right. As we look at this documentation, I think the argument that I would like is strip.position = "top", and maybe I want to set it to left. Let's look down here at the description of strip.position. By default, the labels are displayed on the top of the plot. Using strip.position, it is possible to place the labels on any of the four sides by setting it to top, bottom, left, or right. So let's do left and see what happens with that. So let's do strip.position = "left". Give that a shot.
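The legend and strip changes narrated above might look something like the following minimal sketch. The toy data and the displayed label text ("V1-V9" and so on) are assumptions for illustration; the colors are the placeholder ones mentioned in the narration.

```r
library(tidyverse)

# Made-up data: one row per region, threshold, and method.
toy <- expand_grid(
  region = c("v19", "v34", "v4", "v45"),
  threshold = c(0, 0.05, 0.1),
  method = c("lump_rate", "split_rate")
) %>%
  mutate(fraction = if_else(method == "lump_rate", threshold, 0.2 - threshold))

p <- ggplot(toy, aes(x = threshold, y = fraction, color = region)) +
  geom_line() +
  # strip.position = "left" moves each facet label next to the y-axis.
  facet_wrap(~method, nrow = 2, strip.position = "left") +
  scale_color_manual(
    name = NULL,                                   # drop the legend title
    breaks = c("v19", "v34", "v4", "v45"),         # values found in the data
    values = c("black", "blue", "green", "red"),   # matched to breaks in order
    labels = c("V1-V9", "V3-V4", "V4", "V4-V5")    # text shown in the legend
  ) +
  theme_classic()
```

Note that the argument holding the colors is `values`, not `colors`, which is the slip corrected in the narration.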
And voila, it's on the left side, right? But it's on the inside of the plot, and I'd like it to be on the outside of the plot. So we can look at the theme() arguments, and we've seen before that there's a bazillion of these. The one that I'm thinking we want is strip.placement. What we can do is strip.placement = "outside". Let's give that a run. And yeah, sure enough, it now puts them on the outside. Excellent.
What I don't like, of course, is the border around the label. We've seen that before as well, where we can do strip.background. Yeah, I think it's background, and it says it takes element_rect(). Let's look at the arguments for element_rect(). All right. So element_blank() draws nothing, and element_rect() controls the borders and the backgrounds. So, element_rect(). I think what we want is color = NA. Let's give that a shot. Yeah, that got rid of the border on the facet label. So hopefully you can see that we now have y-axis labels for our two different panels.
Now what we need to do is modify those labels to be something a little bit more descriptive of what we're trying to get at. If we come back up here, we've done facet_wrap() by method, and we set names_to to method up here. So let's look and see what the method labels are. We've got lump_rate and split_rate. What I want to do is change lump_rate and split_rate to something a bit more descriptive that's then going to show up in my y-axis label. So I'll do mutate(), method equals, and let's do an if_else(). If it's lump_rate, then I'm going to say "Fraction of species merged together". And then else, so that will be the split rate, "Fraction of genomes split apart". Okay. And so now, ah, it's not happy. Why isn't it happy? Let's see.
Oh, because I needed to say: if method == "lump_rate", then do that. And sure enough, now we get the nice labels. We then pipe that into ggplot, and we see our labels. Of course, this looks like garbage in the preview, and at this point it would be good to output it with ggsave() to set our actual dimensions. So we'll do ggsave(), and the file will be figures/lump_split.tiff. Let's see what width and height to use. I remember from last time that nine inches was the maximum height, and I think the maximum width was six point something. So I'm going to make this, let's say, width = 3 and height = 5. And I'll also make a PDF version of this. Let's go ahead and source this and see what we get. I'll come back and open up figures/lump_split and see what these look like. So this is the TIFF. That actually doesn't look horrible. And this is the PDF version, which also looks pretty reasonable.
The x-axis label needs to be cut in half, so to speak. I'm also noticing that the font size on the x is different from the y, and we can modify that as well. So let's put a line break in here, right like that. We'll source it and see what we get. That looks pretty decent. It is a little bit tight width-wise, so maybe I'll make it four and four. Let's source that and see what it looks like. So it looks a bit better. Again, the font sizes are different, and I'm not totally loving that. Let's see if we can make the font size of the strip a little bit larger to match what we have. We can do strip.text = element_text(). Let's come back to our help and look at element_text(): family, face, color, size. Let's do size = 12, and let's just see whether 12 matches any better. So that's bigger. It looks a bit bigger than what we had. So let's do 10.
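Pulling the last several steps together (relabeling the facets, moving the strips outside the axis, removing the strip border, resizing the strip text, and saving at fixed dimensions), a minimal sketch could look like this. The toy data, the x-axis label text, the output path in a temporary directory, and the 4 by 4 inch size are assumptions for illustration; the narration writes to figures/ instead and also saves a TIFF.

```r
library(tidyverse)

# Made-up long-format data standing in for the real lump/split rates.
toy <- expand_grid(
  region = c("v19", "v4"),
  threshold = c(0, 0.05, 0.1),
  method = c("lump_rate", "split_rate")
) %>%
  mutate(fraction = if_else(method == "lump_rate", threshold, 0.2 - threshold))

p <- toy %>%
  # Recode the facet variable so the strip text reads like an axis title.
  mutate(method = if_else(method == "lump_rate",
                          "Fraction of species merged together",
                          "Fraction of genomes split apart")) %>%
  ggplot(aes(x = threshold, y = fraction, color = region)) +
  geom_line() +
  facet_wrap(~method, nrow = 2, strip.position = "left") +
  # "\n" breaks the long x-axis label onto two lines.
  labs(x = "Distance threshold\n(%)", y = NULL) +
  theme_classic() +
  theme(
    strip.placement = "outside",                 # strip sits outside the axis
    strip.background = element_rect(color = NA), # no border around the strip
    strip.text = element_text(size = 10)         # tune to match the axis title
  )

# Saving with explicit dimensions (in inches) shows the true proportions.
out_file <- file.path(tempdir(), "lump_split.pdf")
ggsave(out_file, plot = p, width = 4, height = 4)
```

The condition inside if_else() has to be a comparison such as `method == "lump_rate"`; passing the bare string is the error that tripped up the first attempt in the narration.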
It looks a little bigger than what we had on the x-axis. That looks good. Although now the label does run across, so we probably want to put a line break in here. Let's do that. Save that, source it, and now look at the output. That looks good. Maybe it is a little bit larger. Again, there's all this poking and iterating and seeing how things change. I think that looks pretty good.
So here's something I'm thinking about: why don't I put the legend up here in the upper right corner and give us a little bit more real estate to play with? There's no reason for it to be out here all the way to the right, where it uses up a lot of space in the plot. What we can do is add to our theme. We can do legend.placement, sorry, legend.position. I forget what the argument takes, but I think it's a vector of the x and y coordinates. Let me just experiment with 0.5, 0.5 and see where that gets us. Yeah, so that put it right in the middle of the plot, so it is proportional. I'm going to go ahead and do an x of 0.9 and a y of 0.8, save this, run it, and see what it looks like. So that's close. I want it to be perhaps a little bit left and a little bit up. A little bit left would be, say, 0.7, and up would be, say, 0.9. All right, I think that's a good vertical position, but I want to move it over to the right a bit. So let's do 0.8 and source that. And that looks pretty good, I think. Again, this is my TIFF version, which is zoomed in.
The other thing I don't totally like is that the lines of my legend seem kind of far apart. Let's try legend.spacing.y and see what that gets us. I think that's the spacing between different legends, but I'm not sure. So let's do unit(1, "cm"), source that, and see what we get. I'm not sure if anything changed. Sometimes, if I'm not sure whether things are taking, I try an excessive value, like 10 centimeters.
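The legend placement experiment described above can be sketched as follows. This is a minimal sketch with made-up data. One assumption worth flagging: passing a numeric vector directly to `legend.position`, as in the narration, is the older interface; ggplot2 3.5.0 and later prefer `legend.position = "inside"` together with `legend.position.inside`, so the sketch picks whichever form the installed version expects.

```r
library(ggplot2)

# Made-up data: two regions across three thresholds.
toy <- expand.grid(region = c("v19", "v4"), threshold = c(0, 0.05, 0.1))
toy$fraction <- toy$threshold * 2

# Coordinates are proportions of the plotting area:
# c(0.5, 0.5) is the center; c(0.8, 0.9) is toward the upper right.
legend_theme <- if (packageVersion("ggplot2") >= "3.5.0") {
  theme(legend.position = "inside", legend.position.inside = c(0.8, 0.9))
} else {
  theme(legend.position = c(0.8, 0.9))
}

p <- ggplot(toy, aes(x = threshold, y = fraction, color = region)) +
  geom_line() +
  theme_classic() +
  legend_theme
```

Because the coordinates are relative, the same values keep the legend in roughly the same spot when the saved width and height change.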
See if that changes anything. Yeah, so that is the vertical spacing between different legends. That's not what I want. Another thought would be key size, legend.key.size, and that takes unit() again. So let's try that and see what we get there. So that's huge. Let's do one centimeter and see what we get. Yeah, I think this is what we want. So let's do 0.25 and source that. I think that will get it a little bit closer. Yeah, so now they're compressed. Let's do 0.4. And that looks pretty decent. If you wanted, you could put a box around the legend. I like things to look clean and not cluttered, so I'm happy with this. I think this looks pretty good, so I will leave that there.
So I will come back. We've got this saved, and the figure I'm going to be working with is the lump_split PDF. I should also go ahead and add compression = "lzw". If you saw the last episode, you know that I kind of struggled with this, because it didn't seem to be doing the compression on the figures. These figures are still quite large, 6.9, 9.6. I don't know why they're so big and why the compression doesn't seem to be working, but I still need to look into that.
I will open up my R Markdown document and come back to the bottom of it, where I was inserting my figures. I'll put this on a separate page: new page, Figure 2, and this is going to be lump_split. I'm just putting in some placeholder text, so "Rate of lumping and splitting by distance threshold", and save that. That looks good. And I will of course now add this as a dependency in my Makefile. So I will open up Atom and add it there. Where do I have my Makefile? I also need to create a rule for figures/lump_split.% and the code, code/plot_lump_split.R. And that is based on the processed data, so data/processed/lump_split_rate.tsv. That's good.
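Going back to the legend key size and the TIFF compression mentioned above, here is a minimal sketch of both. The toy data and the temporary output path are assumptions. The `compression` argument is not a ggsave() argument of its own; ggsave() forwards extra arguments to the graphics device that writes the TIFF, and "lzw" is a value that device accepts. Whether it shrinks a given file noticeably is not something this sketch settles.

```r
library(ggplot2)

# Made-up data: two regions across three thresholds.
toy <- expand.grid(region = c("v19", "v4"), threshold = c(0, 0.05, 0.1))
toy$fraction <- toy$threshold * 2

p <- ggplot(toy, aes(x = threshold, y = fraction, color = region)) +
  geom_line() +
  theme_classic() +
  # Smaller keys pull the legend entries closer together.
  theme(legend.key.size = unit(0.4, "cm"))

# Extra arguments such as compression are passed through to the TIFF device.
tiff_file <- file.path(tempdir(), "lump_split.tiff")
ggsave(tiff_file, plot = p, width = 4, height = 4, compression = "lzw")
```

Trying an exaggerated value first, such as `unit(10, "cm")`, is a quick way to confirm which theme element actually controls the spacing you care about.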
And we can then do ./code/plot_lump_split.R. So that's good. I need to make this executable, so chmod +x. And then I can run this. Sorry. You know what? What am I doing? I forgot to put the shebang line in there, so it's complaining. All right, good. Let me come back to the top here and grab that shebang line from somewhere else, because for the life of me, I cannot memorize the shebang line. All right, so I'll use this one. Put that in there and save that. And then let's go ahead and try that. That's working. And again, if we look in our figures directory, we can open figures/lump_split.pdf and see what that looks like. That looks good.
What we're going to do next is modify our make rule here. Let me put it down a line so the figures are in order. When I'm confident of the figures and the numbering, I will make versions for the manuscript submission called Figure 1, Figure 2, so that I can keep things straight. Okay, so now I will go ahead and make this, and we will see if we get the figure embedded in our Markdown document. Very good, there's the manuscript PDF. And if we come to the end, sure enough, we have the figure embedded in our document, following the figure we made in the last episode.
All right, excellent. I really like how that came out. The nice thing here is that we've got two panels in one figure, and the x-axis is shared between them. It lines up and it looks nice. It looks cohesive, which is better than back in the bad old days, when we might import figures into PowerPoint and fiddle with things to get them nudged back and forth. And, you know, I've already told you I'm going to change these colors. I'm going to change the colors in the script, but I don't have to worry about doing all that nudging and bumping things around again. So I think this looks good. I'm happy with this.
Feel free to tell me down in the comments below what you think of this figure, and what might make it look a little bit nicer. In the next episode, we will talk a little bit more about colors and how we can pick colors that are appropriate, that are discriminating, and that are friendly to our friends who can't perceive the difference between red and green. There's probably a good number of you who can't differentiate between the colors of my v4 and v45 regions here on these plots. And also, that green is just a bit jarring.
Anyway, please be sure to like the video. Tell me what you think down below, and tell me if you've got any questions. I'd love to get feedback. I've loved the feedback and interaction that we've had so far, and I could definitely use a lot more of that. So please use the comments below. I do read everything and I try to respond to everything. Anyway, keep practicing and play around with this stuff. Don't be afraid to go into that theme() function, look at the different arguments that are there in the help page, and see how you can modify basically anything in your plot. It's really powerful and really useful for making elegant figures for your manuscripts. All right, so we'll see you next time for another episode of Code Club.