I absolutely love working in R, making plots, and sharing them with you here on YouTube. Every now and then I get a comment on the videos. On a video that I made a little while back about putting stars and bars on plots to indicate significance, I got a really good question that got me so excited that I'm gonna make an episode about it today. Hey, folks, I'm Pat Schloss and this is Code Club. As I was saying, a little while back I made an episode about how to put stars and bars over strip charts, or any type of plot, to indicate which comparisons were significant and which weren't. I'll go ahead and put a link to that up at the top here. Christine Schneider, who I don't know, but thank you very much for commenting, asked, "Hey, Pat, thanks for the great video. I tried it and it worked very well." Wonderful. "Can you maybe explain how this works with the function facet_grid? I want to add one line for all the grids." I assume that she wants to put different lines to indicate different groupings and which comparisons were significant for those different groupings. And I thought, that's a really cool question, so why not make an episode about it? Because not only would this idea apply to stars and bars, but perhaps also to different text that you might want to put in each of the four panels. As we've been going through here, I've been trying to rebuild a figure from a previous paper that my lab published. I don't have something ready immediately for that, but I thought what we could do is make strip charts, four strip charts or three strip charts: one for the inverse Simpson, which is what we've been working on, one for the observed number of OTUs, or the richness, and maybe one for the Shannon diversity. That could give us a grid or a faceted set of plots. So we'll go through that, and then I can show you how we would go about placing those lines and stars or n.s. labels for those different comparisons.
Before we get going here in RStudio with our code, I want to remind you that you can get this code at a link down below in the description. Also up above here, I'll have a link for instructions on how to get everything set up. Also, if you're really excited about learning R and find that maybe this is a little bit beyond where you're at, know that I do teach workshops throughout the year on how to use R. Those are live. I also have the materials that I teach from, and those are always up if you go to the riffomonas.org website, which again will be linked down below in the description. Anyway, this is more or less where we were for that episode where we were putting the stars and the bars on the plot. We're loading up our libraries, setting the seed so the jitter of our strip chart stays the same from run to run, loading our metadata, loading the alpha diversity, joining those data frames together, defining some color variables here, and then getting various counts that we were putting in the labels for the different diagnosis groups. We went ahead and did our Kruskal-Wallis test because the data were not normally distributed, and the Kruskal-Wallis test allowed us to do a non-parametric test. I've gone ahead and replicated that. We did this initially for inverse Simpson, but I replicated it for sobs, which is the observed number of OTUs, as well as for Shannon. And what we find is that the same comparisons hold. Let me go ahead and rerun this so you can see what it looked like. Again, this is the figure that we generated: strip charts for the three different disease status groups for the inverse Simpson index. I had a line and a star to indicate that the healthy group was significantly different from both of the diarrhea groups, regardless of whether someone was C. diff negative or positive. And then also a line with an n.s. to indicate that the comparison between the diarrhea groups was not significantly different.
What we'd like to do is to use something like facet_grid or facet_wrap to have different panels for the three different alpha diversity measures that I have in our data set. Let's go ahead and see if we can do that. Again, the comparisons are all the same as the outputs from those pairwise tests. So I'm not gonna go into how we could rerun the tests, but know that it is in the code that I'm starting with. Again, you can get that at the link down below. The first thing that I need to do is to get my data ready to facet. To do that, I need to take my data frame and make it longer with pivot_longer, so that I'm gathering together the different alpha diversity metrics that I'm interested in looking at. For now, I'm gonna go ahead and get rid of the strip_chart variable and open up some space here so we can work with some dplyr commands. So again, metadata_alpha, if we look at that down below, is our data frame with all of our metadata, and at the bottom here we can see sobs, Shannon, some lower and upper confidence intervals, as well as inverse Simpson. I'm gonna simplify things a bit and select just the columns that I want. So I'll do disease_stat and then sobs, Shannon, and inverse Simpson. Run that, and we've got our more condensed data frame. Next, I'm gonna pivot_longer, and I'm gonna do minus disease_stat because I wanna pivot longer all the columns except for disease_stat. The names are going to go to a column that I'll call metric, so I'll have a column of sobs, Shannon, and inverse Simpson, and the values will go to a column that I'll call values, okay? Look at that, and now we do have our disease statuses, our metrics, and the values. And then again, each combination of disease status and metric in a row represents a different individual. We can then pipe this into ggplot. My x is gonna be disease status, so I'll have the three disease statuses across the x-axis, and then my y is going to be the values column, not the inverse Simpson.
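Here's a minimal sketch of that reshaping step. The real metadata_alpha isn't shown in this transcript, so the data below are made up, and the column names (disease_stat, sobs, shannon, invsimpson), the group labels, and the use of geom_jitter for the strip chart are all assumptions for illustration.

```r
library(tidyverse)

set.seed(19760620)

# Hypothetical group labels; the real ones live in the episode's metadata
groups <- c("healthy", "diarrhea_cdiff_neg", "diarrhea_cdiff_pos")

# Synthetic stand-in for metadata_alpha (made-up values, assumed column names)
metadata_alpha <- tibble(
  sample_id = paste0("S", 1:60),
  disease_stat = factor(rep(groups, each = 20), levels = groups),
  sobs = runif(60, 20, 110),
  shannon = runif(60, 1.5, 3.8),
  invsimpson = runif(60, 2, 21)
)

# Keep only the columns of interest, then gather the three metrics into
# a pair of columns: metric (which measure) and values (its value)
alpha_long <- metadata_alpha %>%
  select(disease_stat, sobs, shannon, invsimpson) %>%
  pivot_longer(-disease_stat, names_to = "metric", values_to = "values")

# x is the disease status, y is the new values column
p <- ggplot(alpha_long, aes(x = disease_stat, y = values)) +
  geom_jitter(width = 0.25)

# print(p) draws the plot
```

With 60 samples and three metrics, alpha_long ends up with 180 rows, one per sample and metric.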
And then my fill is gonna be disease status, and what I'm gonna add then is facet_wrap. Again, this should work if you're using facet_wrap or facet_grid. For facet_wrap, I'm gonna do the tilde, and then what I want is the metric column. Let's go ahead and run to this point, and we can see that we now have three columns, or three facets, for our three different metrics. I would prefer to put this as three rows, because I wanna have disease status across the bottom and then my three metrics stacked above it. So I can do nrow = 3. So again, you now see I've got these three panels. Before we get to those bars, I wanna do a few things to make this look a little bit nicer. I'm looking at three different metrics, and you can see that the y-axes are the same for all three metrics. I would like to free that up so that inverse Simpson has its own y-axis, and Shannon and sobs have their own y-axes. To do that, I can do scales = "free_y", and you can think of that as freeing up the y-axis to vary by the data in that panel. Great, now we have our three different metrics as well as more customized y-axes. I'll go ahead and add my labs and my scale_x_discrete, and then in my scale_y_continuous, I'm gonna make the limits go from zero to NA. The zero to NA will make sure that the bottom is zero and that it will scale up to whatever I need it to be. Let's also change the height of the figure; let's make it nine. Great, so we now have more space for our three panels. You'll notice I have "Inverse Simpson index" as the y-axis label here. I probably wanna turn that to NULL. And what I'd like to do then is to put the facet labels on the y-axis. We can do that by coming back up here, and again, our y we can make NULL. In my facet_wrap, I can do strip.position = "left", and that will take the strip from the top and hopefully put it on the left. So that did move my labels to the left.
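As a rough sketch, the faceting steps up to this point could look something like the code below. The data are synthetic, and the column names, group labels, and geom_jitter settings are assumptions, not the episode's actual code.

```r
library(tidyverse)

set.seed(19760620)

# Hypothetical group labels and made-up values standing in for the real data
groups <- c("healthy", "diarrhea_cdiff_neg", "diarrhea_cdiff_pos")

alpha_long <- tibble(
  disease_stat = factor(rep(groups, each = 20), levels = groups),
  sobs = runif(60, 20, 110),
  shannon = runif(60, 1.5, 3.8),
  invsimpson = runif(60, 2, 21)
) %>%
  pivot_longer(-disease_stat, names_to = "metric", values_to = "values")

p <- ggplot(alpha_long,
            aes(x = disease_stat, y = values, fill = disease_stat)) +
  geom_jitter(shape = 21, width = 0.25, show.legend = FALSE) +
  # one panel per metric, stacked in three rows, each with its own y-axis,
  # and the strip label moved from the top to the left
  facet_wrap(~metric, nrow = 3, scales = "free_y", strip.position = "left") +
  # drop the axis titles; the strips will act as y-axis labels
  labs(x = NULL, y = NULL) +
  # anchor each panel at zero and let the top scale with that panel's data
  scale_y_continuous(limits = c(0, NA))

# The height of nine would be set when saving, for example:
# ggsave("alpha_facets.png", p, width = 4.5, height = 9)  # hypothetical file name and width
```

Because limits = c(0, NA) only fixes the lower bound, scales = "free_y" can still pick a different upper bound for each panel.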
The problem is that it's on the inside, and I'd rather it be on the outside. I can add to my theme down here. I can do strip.placement = "outside", and that should put the label on the outside. To remove that border, I can do strip.background = element_rect, and I think I can say color = NA, and voila, I now have axis labels. I'd like to make these labels look a little bit nicer, and where that label is coming from is the value in my metric column. So let's come back up after pivot_longer, and we'll do a mutate on metric and say that equals recode. We can use recode to recode the metric column so that sobs equals observed richness, and we'll do Shannon equals Shannon diversity, and then invsimpson we'll do as inverse Simpson. Let's include that in our pipeline, and there we go. We've got our three labels. I'd maybe like to put the observed richness first. So observed richness, inverse Simpson, and then Shannon diversity. I can use the factor function on metric. We're gonna mutate metric to be factor of metric, and the levels will be in the order that I want them. So I'll do observed richness, and then let's do inverse Simpson, and we'll also do Shannon diversity, of course. Great, so now we have our observed richness at top, then inverse Simpson, and then Shannon diversity. We're in good shape, and now we're ready to think about putting those lines on. Now, in the previous episode, you can see down here what we did; I've got this commented out for our current episode. Let me uncomment that now. Note that we had defined this previous plot as strip_chart, and then we added the lines as well as the text. So I'm gonna come back up here and save this as strip_chart, and then load this as the strip_chart object now. And so now I can add my different lines to it. Now, if I run this code, let's see what happens. What's happening is that we hard-coded the x and y positions for the inverse Simpson.
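Here's one way the relabeling, reordering, and strip styling could be sketched. The data are synthetic, and the raw column names, the exact label text and capitalization, and the theme_classic base theme are assumptions.

```r
library(tidyverse)

set.seed(19760620)

# Hypothetical group labels and made-up values standing in for the real data
groups <- c("healthy", "diarrhea_cdiff_neg", "diarrhea_cdiff_pos")

alpha_long <- tibble(
  disease_stat = factor(rep(groups, each = 20), levels = groups),
  sobs = runif(60, 20, 110),
  shannon = runif(60, 1.5, 3.8),
  invsimpson = runif(60, 2, 21)
) %>%
  pivot_longer(-disease_stat, names_to = "metric", values_to = "values") %>%
  mutate(
    # swap the raw column names for readable labels (assumed label text)
    metric = recode(metric,
                    sobs = "Observed richness",
                    shannon = "Shannon diversity",
                    invsimpson = "Inverse Simpson"),
    # the factor levels set the top-to-bottom order of the panels
    metric = factor(metric,
                    levels = c("Observed richness",
                               "Inverse Simpson",
                               "Shannon diversity"))
  )

strip_chart <- ggplot(alpha_long,
                      aes(x = disease_stat, y = values, fill = disease_stat)) +
  geom_jitter(shape = 21, width = 0.25, show.legend = FALSE) +
  facet_wrap(~metric, nrow = 3, scales = "free_y", strip.position = "left") +
  labs(x = NULL, y = NULL) +
  scale_y_continuous(limits = c(0, NA)) +
  theme_classic() +  # assumed base theme
  theme(
    strip.placement = "outside",                  # strip sits outside the axis
    strip.background = element_rect(color = NA)   # no border around the strip
  )
```

Saving the plot as strip_chart leaves it ready to have the line and text layers added to it.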
And so the line and the star still look good for inverse Simpson, but it's applying the same lines and stars and n.s. labels to the two other variables and putting them at the same spot. What we want to be able to do is to change where that line and the star and n.s. go for the three different variables. How do we do that? Well, that's why we're here. That's the point of this episode. I'm gonna start by creating a table that I'll call lines, and we'll define that as a tibble. This data frame is gonna have, I believe, five columns. I'm gonna do this slightly differently: instead of using geom_line like I did here, I'm gonna use geom_segment. I think that'll make it a little bit easier, and again, it'll show us a different way to do it. We can then say metric, which is going to be the name of our metric. So we'll have observed richness, and because I'm gonna have two lines, I'm gonna repeat this twice. So observed richness, observed richness. And then Shannon diversity. And then we'll do inverse Simpson. Then we're gonna need x and xend variables, and we're gonna need y and yend variables. Each of these is gonna take a vector of values. Let's see how we can go about doing this the easiest way. These geom_lines are the lines that we drew for the inverse Simpson. Two and three are the x and xend positions. So two, three. And then we would also have one and 2.5. That's again for inverse Simpson, but for now I'm gonna copy that to see that we get the right answer, to see that we get what we got before. I'm changing things again from geom_line to geom_segment, and so I just wanna make sure that it works right. If that works, then we can modify these for the two other panels. For the y, we need 23 and 33 alternating, because we have the first line and then the second line. And we're gonna repeat that three times.
And then the yend will be the same as the y. I guess I could have said yend equals y. You know, why don't I go ahead and do that? All right, so this is our lines tibble. If I look at lines, I now see that I've got my metric, my x, my xend, my y, and my yend. And again, those are the values that we came up with for inverse Simpson. I think that should work just fine to get us started, and then we can go back and tweak things. So again, I'm gonna go ahead and comment out this geom_text, and I'm gonna replace that geom_line with geom_segment. We will then say data = lines. And we will then say aes with x = x, xend = xend, y = y, and then yend = yend. I guess perhaps we didn't even need the yend column; we could just say yend = y, whatever. And then we'll also add inherit.aes = FALSE. I think this should work. So let's give this a shot and see what happens. It's saying object disease_stat not found, and ignoring unknown aesthetics: inherit.aes. So I think the problem is that I put inherit.aes inside of my aes. I'll put that out here. I don't need that extra comma. Let's give this another shot. Great, so these lines are in the right place. I mean, they're in the same place that I had them before, right? That line for inverse Simpson matches what I had before. And again, we're getting the same values for these two other panels, because that's where I told it to put them. So now we need to go about modifying where we're putting things so that we have a more appropriate looking set of lines. Let's start with the observed richness. Let's see, let's start at like 175 and 125. Again, we need to make sure that we're lining things up with the right rows. So I said 175 and 125, and I think I've got them flipped. So let's do 125 and 175, and that looks pretty good. I think that 175 could maybe be like, I don't know, 190.
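A minimal sketch of the lines tibble and the geom_segment layer might look like this. The plotted data are synthetic, and the two Shannon y values (4.0 and 4.5) are placeholders I've assumed, since the Shannon heights haven't been chosen at this point; the inverse Simpson and observed richness values are the ones mentioned above.

```r
library(tidyverse)

set.seed(19760620)

# Hypothetical group labels and made-up values standing in for the real data.
# The metric labels are written directly as column names to keep this short.
groups <- c("healthy", "diarrhea_cdiff_neg", "diarrhea_cdiff_pos")

alpha_long <- tibble(
  disease_stat = factor(rep(groups, each = 20), levels = groups),
  `Observed richness` = runif(60, 20, 110),
  `Shannon diversity` = runif(60, 1.5, 3.8),
  `Inverse Simpson` = runif(60, 2, 21)
) %>%
  pivot_longer(-disease_stat, names_to = "metric", values_to = "values")

strip_chart <- ggplot(alpha_long,
                      aes(x = disease_stat, y = values, fill = disease_stat)) +
  geom_jitter(shape = 21, width = 0.25, show.legend = FALSE) +
  facet_wrap(~metric, nrow = 3, scales = "free_y", strip.position = "left") +
  labs(x = NULL, y = NULL)

# Two segments per metric: groups 2 to 3 (the n.s. comparison) and
# group 1 to 2.5 (healthy versus both diarrhea groups)
lines <- tibble(
  metric = c("Observed richness", "Observed richness",
             "Shannon diversity", "Shannon diversity",
             "Inverse Simpson", "Inverse Simpson"),
  x = rep(c(2, 1), 3),
  xend = rep(c(3, 2.5), 3),
  y = c(125, 190,   # observed richness
        4.0, 4.5,   # Shannon: assumed placeholder heights
        23, 33),    # inverse Simpson, from the previous episode
  yend = y          # horizontal lines, so yend matches y
)

with_lines <- strip_chart +
  geom_segment(data = lines,
               aes(x = x, xend = xend, y = y, yend = yend),
               inherit.aes = FALSE)  # an argument to the geom, not to aes()
```

Because lines has a metric column, each pair of segments is drawn only in the panel whose metric matches. Without inherit.aes = FALSE, the layer would also look for the plot-level aesthetics such as disease_stat, which lines doesn't have.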
And now we need to do the Shannon diversity and get those lines back down. That will be the second one. Let's try five and eight. We're getting there. I think that looks great now. We have our lines customized to the different diversity metrics. Let's repeat the process with a variable that we'll call stars, right? It's basically gonna be the same thing, so I will copy this down. And again, we don't need the xend or yend because we're gonna use this with geom_text. We still need two values each of observed richness, Shannon, and inverse Simpson. We'll also need a label vector, and the label is gonna go n.s., star, and repeat that. Now I need to modify my x values. What I can see from the geom_text that I had here is that I already have the values for inverse Simpson. So that's the third row down here. I'll do 2.5 and 1.75, and the y positions then are 24.5 and 33.25. And actually, you know what? I'll copy this x position because it'll be the same for all three metrics. And then I can bump the others up a little bit. Maybe I'll make those 128, 193, 4.2, 4.7. That'll bump the label up a little bit from the actual line. And so that's our stars. All right, so now we wanna add our geom_text, and we will do data = stars and aes with x = x, y = y, label = label. And we'll wanna also be sure to include inherit.aes = FALSE. And so that looks pretty nice. We have our star and our n.s. values. The n.s. for our observed richness is a little low, so let's go ahead and bump that up. Let's do 140. Let's do 135. I think that looks pretty good. One thing you'll notice is that the ordering switched here. So let's go ahead and use factor to make that metric a factor. We will then add levels and define the order we want. I'll copy and paste these in, in the order I want them: observed richness, then inverse Simpson, and then Shannon diversity. Right, and then the final thing there. And I'll put in some line breaks so it doesn't get all scrolly.
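Putting the pieces together, a sketch of the stars tibble, the geom_text layer, and the factor fix for panel order could look like this. The plotted data are synthetic, the Shannon line heights (4.0 and 4.5) are assumed placeholders, and metric_levels is a helper name introduced here for illustration.

```r
library(tidyverse)

set.seed(19760620)

# Shared panel order; metric_levels is a hypothetical helper name
metric_levels <- c("Observed richness", "Inverse Simpson", "Shannon diversity")
# Order in which the annotation rows are written out, two rows per metric
row_metrics <- rep(c("Observed richness", "Shannon diversity",
                     "Inverse Simpson"), each = 2)

# Hypothetical group labels and made-up values standing in for the real data
groups <- c("healthy", "diarrhea_cdiff_neg", "diarrhea_cdiff_pos")

alpha_long <- tibble(
  disease_stat = factor(rep(groups, each = 20), levels = groups),
  `Observed richness` = runif(60, 20, 110),
  `Shannon diversity` = runif(60, 1.5, 3.8),
  `Inverse Simpson` = runif(60, 2, 21)
) %>%
  pivot_longer(-disease_stat, names_to = "metric", values_to = "values") %>%
  mutate(metric = factor(metric, levels = metric_levels))

lines <- tibble(
  metric = factor(row_metrics, levels = metric_levels),
  x = rep(c(2, 1), 3),
  xend = rep(c(3, 2.5), 3),
  y = c(125, 190, 4.0, 4.5, 23, 33),  # Shannon heights are assumed
  yend = y
)

# Labels sit slightly above their lines: n.s. over the 2-to-3 segment,
# the star over the 1-to-2.5 segment
stars <- tibble(
  metric = factor(row_metrics, levels = metric_levels),
  x = rep(c(2.5, 1.75), 3),
  y = c(135, 193, 4.2, 4.7, 24.5, 33.25),
  label = rep(c("n.s.", "*"), 3)
)

annotated <- ggplot(alpha_long,
                    aes(x = disease_stat, y = values, fill = disease_stat)) +
  geom_jitter(shape = 21, width = 0.25, show.legend = FALSE) +
  facet_wrap(~metric, nrow = 3, scales = "free_y", strip.position = "left") +
  labs(x = NULL, y = NULL) +
  geom_segment(data = lines,
               aes(x = x, xend = xend, y = y, yend = yend),
               inherit.aes = FALSE) +
  geom_text(data = stars,
            aes(x = x, y = y, label = label),
            inherit.aes = FALSE)
```

Using the same levels for metric in all three data frames keeps the panel order consistent no matter which layer is added.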
And I forgot a comma there. I'm gonna copy this metric factor setting back down here to my stars as well. Give this a run, and I should get the order I want. We see now that we have our three metrics. We have the placement of our bars and stars and n.s. values customized to the three different diversity metrics that we're working with. The reason this worked is that, again, I'm faceting on the metric variable, and I could then create data frames, lines and stars, that include a metric column. That way, when it facets the data, it takes the rows of lines for each of the three metrics and their x, xend, y, and yend values, and uses those to draw the line with geom_segment. We could have also done this with geom_line in a slightly different way. The same goes for the labels with geom_text. And so we can think of it as also faceting this lines table, not just the metadata_alpha table, right? So we're faceting the lines table as well as the stars table. I think it was pretty awesome that Christine asked the question, and I see that John Yana also said, "I'd like to know this too." So thank you both for asking this wonderful question. I hope you all got something out of it. There's obviously more here than how do you draw the line and how do you draw the star. We also got into faceting, which we've talked about in previous episodes, but not with the lines and stars. So how do you facet? How do you get your data frame into position to facet? How do you make a data frame, a tibble, from scratch? And then again, what's happening with the faceting? It's pretty powerful, I think, that facet_wrap or even facet_grid will allow you to facet multiple data frames as you're adding more data to the visual. If you have questions like Christine did, by all means, holler, let me know.
I'm very happy to pump the brakes on my progression with these episodes and to stop and see how we would go about doing what you all are interested in learning about. So thank you very much for that question. I hope this helps you with the figures that you're making for your papers, and that it's also helping you to learn a little bit more about R and see the power of using ggplot. It warms my heart to know that people are using the code to follow along, and then taking that code and adapting it to their own problems to make their papers and the figures for their presentations all the better. That just makes me so happy and makes me feel like it's definitely worth producing these videos. So thank you very much. Please tell your friends about these Code Club episodes. Make sure that you're subscribed so you know when the next episode drops, and we'll see you next time.