When I'm doing a data analysis, one of the things I work hard to avoid is hard coding information into my visuals or into my text. What do I mean by hard coding information? Well, imagine that I had some type of statement in my visual or in my text that said something like "the study had 350 participants." I wouldn't want to write that explicit 350 into the text or into the visual. Instead, I would want to use our code to generate that number. That way, if I have to add or remove samples, the number of participants in the study will update automatically. You ask, how can we do this in R? Well, stay tuned and I'll show you how in this episode.
Hey folks, I'm Pat Schloss and this is Code Club. In recent episodes, we've been taking another look at a supplemental figure that my lab published a number of years ago in a study looking at the variation in the gut microbiota of people with and without C. difficile infections. Now, you don't have to know anything about C. difficile infections or microbiology to follow along. That figure is an ordination, and ordinations are a very common technique in ecology. They're useful for taking highly dimensional data, we're talking dozens or hundreds of dimensions, and reducing it to two or possibly three dimensions so that we can visualize it in a two- or three-dimensional medium. We've talked about these types of approaches in recent episodes, and the data that we've used comes from an approach called non-metric multidimensional scaling, or NMDS. Metric multidimensional scaling covers a set of techniques that includes principal components analysis (PCA) and principal coordinates analysis (PCoA). In today's episode, I'm going to generate a plot using PCoA data for the same distance matrix that was inputted into the NMDS. One of the things people like to do with PCoA is to put the percent of variation explained by each axis in the axis labels.
Don't worry, we're not going to go all wonky talking about eigenvalues, eigenvectors, reduction of dimensionality, and things like that. I prefer NMDS because I think it does a better job of representing the data. Other people like PCoA or PCA, perhaps because the x-axis explains the most variation in the data and the second axis, the y-axis, explains the second most. The axes in NMDS really mean absolutely nothing. Anyway, what we want to do is insert that percent explained into the axis labels.
Another thing that we'd like to do involves the title. As I have it now, I've got my different patient groups colored to match the colors in the plot. The problem is that because I'm using HTML and CSS in the title, I'm using hexadecimal colors there, and down in the plot I'm using R's named colors. If I wanted to change the colors across all of my patient groups, it would get kind of messy. What I'd like to do is create variables for each of the three patient groups so that I can define the colors in one place and have them populated elsewhere. We'll see how we can do that simply using a package that I've actually never used before and have just learned about, called glue. Glue is installed along with the tidyverse, and we'll learn a few elementary aspects of it. I think it's going to be pretty slick and a lot easier than other approaches that I've used for combining text and data.
Before we dig too far into learning the glue package, I want to show you how I currently combine text and data. There's a function called paste(). I could give it "My name is" as one string and "Pat Schloss" as a second string, right? And it pastes them together. I could also say name is "Pat Schloss" and then do paste() with "My name is", a comma, and name. That would paste together "My name is" and "Pat Schloss". You see that it inserts a space.
So paste() concatenates the two strings with a space by default. Alternatively, I could use paste0() to put "My name is " and "Pat Schloss" together without any delimiter between the text, so I have to supply the space myself. This works well, but it gets a little bit cumbersome because, as you can see, this gets to be quite long.
Alternatively, we could say library(glue). Again, glue comes with the tidyverse. I could say glue(), and inside the quotes put "My name is" and then, in curly braces, name, and it will insert the value to give "My name is Pat Schloss". That's pretty slick, because we could say explained is 10.5 and then glue "PCoA axis 1", and in parentheses have explained in curly braces, then a percent sign, and then the closing parenthesis. That will output "PCoA axis 1 (10.5%)", and that could become our axis label.
So let's see how we can do this within our Schubert PCoA R script. I'll start by loading library(glue) along with all my other packages here, and I'll load these files that we've seen in the past. Actually, I want to change this NMDS 2D to PCoA, and I'll probably just rename everything that's NMDS to PCoA. And you know, I'll match the case, because I think down below I've got some capitalization issues going on with my axis labels. So I'll run that. What I now need to do is load the file that contains the percent variation explained on each of the axes. I'll do read_tsv(), and this file is actually in raw data. Schubert, Bray-Curtis, where are you? PCoA loadings. If I run that, I see that my data frame has axis and loading columns, and I'll call this explained. Right. I'm going to create variables called explained one and explained two, because what I'd like to have down here would be principal coordinate axis one and principal coordinate axis two.
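As a minimal sketch of the paste(), paste0(), and glue() comparison from earlier in this passage: the variable names with_paste, with_paste0, with_glue, and axis_label are made up for illustration, and 10.5 is just the example value mentioned above.

```r
library(glue)

name <- "Pat Schloss"

# paste() joins its arguments with a space by default
with_paste <- paste("My name is", name)

# paste0() uses no separator, so the space has to be typed into the string
with_paste0 <- paste0("My name is ", name)

# glue() evaluates whatever sits inside the curly braces and inserts the result
with_glue <- glue("My name is {name}")

# The same idea applied to an axis label with a number in it
explained <- 10.5
axis_label <- glue("PCoA axis 1 ({explained}%)")

print(with_glue)
print(axis_label)
```

All three approaches should give the same sentence; the glue() version keeps the text and the variable in a single quoted string, which is the readability argument being made here.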
And then in here, I'm going to put the percent explained by that axis, and the percent explained by that axis. Right. So I'll make explained one my variable for the percent explained on the x-axis. This will be explained, and I'll pipe that to filter() to get the row where axis is one. Then I will pull(), and I believe this column is called loading. If I look at explained one, this is a number, about 7.35%. I will then round that to one digit to the right of the decimal point, so I get 7.4. Then I'll do the same thing for explained two, for the second axis, the y-axis. So 3.2%. These percent explained values on the x- and y-axes are quite low, but for human microbiota data they are very common. If you see really any patterns in the data, it's generally thought to be a fairly strong signal.
All right, so now we want to use glue to insert this information. Let me just show you, first of all, what this would look like with paste0(). I could do paste0(), "PCo axis 1", and then in here I'll close the quote, put a comma, explained one, another comma, and then the rest in quotes, right. Well, my mouse is moving all over the place. Sorry. All right. And then I need a closing parenthesis. Okay, so this is what I would do to put the percent explained in the x-axis label using paste0(). For the y-axis, let's use glue and see what this looks like. I'll wrap this in glue(), and then in curly braces I'll do explained two. Okay, so syntactically, I think it's a lot simpler than paste0(). Perhaps it's not that much shorter, but it's much easier to read what's going on, I feel.
So let's give this a run and see what it looks like. Good, we see that we have an ordination, and it looks quite a bit different than our NMDS did. We see that we have separation on the x-axis. Again, the x-axis represents the most variation explained in the data, followed by the y-axis. And so we have our healthy individuals off to the right.
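The steps above for pulling out the percent explained and building the two axis labels can be sketched roughly as follows. The loadings file isn't reproduced in this transcript, so a small tibble with made-up numbers stands in for it. The column names axis and loading follow what is described above; explained_1, explained_2, x_label, and y_label are assumed names, not necessarily the ones used in the script.

```r
library(dplyr)
library(glue)

# Hypothetical stand-in for the loadings table read in with read_tsv();
# the numbers are invented so that they round to the values quoted above
explained <- tibble(
  axis = c(1, 2, 3),
  loading = c(7.37, 3.21, 2.58)
)

# Filter to one axis, pull the loading column out as a plain number,
# and round to one digit after the decimal point
explained_1 <- explained %>%
  filter(axis == 1) %>%
  pull(loading) %>%
  round(digits = 1)

explained_2 <- explained %>%
  filter(axis == 2) %>%
  pull(loading) %>%
  round(digits = 1)

# The same kind of label built two ways
x_label <- paste0("PCo Axis 1 (", explained_1, "%)")
y_label <- glue("PCo Axis 2 ({explained_2}%)")

print(x_label)
print(y_label)
```

In a ggplot2 call these would typically be handed to labs(x = x_label, y = y_label), so the labels follow the data if the loadings ever change.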
They tend to be clustered more tightly than our patients or subjects with diarrhea and those with C. difficile. One of the problems with PCoA is that you kind of get this arc effect, which isn't really desirable. It's not super horrible here, but it doesn't look great. Again, this is one of the reasons I kind of prefer NMDS, because it gives you a better representation of the data in the number of dimensions that are available to you, which, because we're on a screen, is going to be two dimensions. What's important for this episode is that we see that we can fairly easily insert the percent explained into the x- and y-axis labels. I really like how glue worked, so I think I'm going to replicate that back up here instead of using paste0(), and make that axis one and explained one. And we'll double check that this looks right. So that all looks good. Very nice.
So let's see another place that we can use glue. As I mentioned, I have the hexadecimal color for dark gray here, 999999, but I have gray down here. I can't use gray up here in the title because this is within the CSS code. So what I'll do instead is make a variable that I will call healthy color. Let's go and do that. I will set it to pound 999999, and that's the hexadecimal. I will also do diarrhea color. And I'll make that... oh, where were you? Over here. So this was blue. And then I'll do case color, which will be FF0000. Okay, great. Now what I can do to double check that this works is insert those colors down here. So this will be healthy color. I won't put it in quotes, because it's not a string, it's a variable name. So healthy color, diarrhea color, and case color. And I can copy this down to my scale_fill_manual(). Let's go ahead and make sure this all works. I think I forgot to load those, and I'll generate the plot. And that looks pretty good.
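Here is a minimal sketch of defining the three colors once and reusing them in the manual scales, as described above. The names healthy_color, diarrhea_color, and case_color are assumed spellings of the spoken variable names, and the toy data frame and its column names are invented stand-ins for the real PCoA coordinates.

```r
library(ggplot2)

# Define each group's color in one place
healthy_color <- "#999999"
diarrhea_color <- "#0000FF"
case_color <- "#FF0000"

# Unquoted names here: these are variables, not strings
group_colors <- c(healthy_color, diarrhea_color, case_color)

# Hypothetical stand-in for the ordination coordinates
toy <- data.frame(
  axis1 = c(0.3, -0.1, -0.2),
  axis2 = c(0.0, 0.2, -0.1),
  group = factor(c("healthy", "diarrhea", "case"),
                 levels = c("healthy", "diarrhea", "case"))
)

# The same vector feeds both scales, so the outline and fill stay in sync
p <- ggplot(toy, aes(x = axis1, y = axis2, color = group, fill = group)) +
  geom_point(shape = 21) +
  scale_color_manual(values = group_colors) +
  scale_fill_manual(values = group_colors)
```

Because the values vector is unnamed, the colors are matched to the factor levels in order, which is why the levels are set explicitly in this toy example.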
And so we'll come back and see the advantage now of naming the colors up here rather than down in the text. What I'd like to do next is use these colors in my title. So how do we do that? Well, like we saw with the axis labels, I can do glue(), open parenthesis, and I'll make sure I have a closing parenthesis here. Then instead of that pound 999999, in curly braces I'll do healthy color. So it's going to be healthy. Over here, instead of this hexadecimal, I'll do the curly braces, and this will be diarrhea color. Again, I'm sorry for all the diarrhea talk. The reason my screen is all jittery is that I have this really long line, and I think we can fix that using glue as well. And then here we're going to have, in curly braces, case color. Okay, so let's go ahead and give this a run. Hopefully I got those names right, and hopefully glue does what we hope it does. And sure enough, we've gotten our colors embedded into our title and into the figure directly using glue.
Now, big deal, we didn't change that many places. Well, maybe I want to lighten the color a little bit. Let's go ahead and put BBBBBB for healthy color. We'll make sure that's loaded. So I updated that one line and reran it, and now I've got a slightly lighter gray here in my points. And, you know, I could change the colors very easily. Say I want my diarrhea color to be green. We then see that we've got green, right? We've changed it in one place and it gets updated both here in my title and down here in the plot. This is obviously not what we want. Those of you who are red-green color deficient can't see the difference in these colors now. So I'll go back to what we had before, which again was 0000FF. And maybe I'll make this DDDDDD, and that gets us a little bit lighter gray. The downside is that in the title it's perhaps a little bit too muted. So I'll revert back, and I'll stick with that 999999 that we had in the very beginning. And there we go.
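The title step above might look something like the sketch below. The wording of the title is invented, since the real title text isn't given in the transcript, and the variable names are assumed spellings of the spoken ones. Rendering the HTML as colored text also needs a markdown-aware theme element (ggtext::element_markdown() is one option; the transcript doesn't say which is used), so this sketch only builds the string.

```r
library(glue)

healthy_color <- "#999999"
diarrhea_color <- "#0000FF"
case_color <- "#FF0000"

# Hypothetical title text; each {…} is replaced by the matching color variable.
# glue() pastes its unnamed arguments together with no separator.
plot_title <- glue(
  '<span style="color: {healthy_color}">Healthy</span> individuals, ',
  '<span style="color: {diarrhea_color}">diarrheal controls</span>, and ',
  '<span style="color: {case_color}">cases</span>'
)

print(plot_title)
```

Changing healthy_color to, say, "#BBBBBB" and rerunning both the assignment and the glue() call updates the title, and any scale that uses the same variable picks up the change as well.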
So again, what we've seen here is that we could insert numerical data, or data that's outputted from our R code, into our axis labels. We could insert information into our title. You could also imagine putting p-values down here in the caption.
Now, one last thing that I think we can do with glue. Again, I've never used glue before this episode, so, you know, anything's possible. My understanding is that we should be able to use two backslashes at the end of a line to break the line in the script, but that break won't be seen when the string is rendered. When we have a really long title or string, it's really nice to be able to break it up like this so that things aren't scrolling off the screen so much. Now, this should not insert line breaks into the title. We've got those in here with these BR tags. So we'll see if this works. Again, I've never done this before, so this could be bad, but we'll learn together. And so that worked. The advantage there is that I don't have to worry about things scrolling so far off the screen, and it's much easier to edit when everything's in one window. So that looks good. And again, we can use that double backslash because we're already using glue. So that is a win. I think there's probably more that we could learn about glue, but these couple of use cases already are really empowering and certainly easier than doing the same type of thing using paste(). I think this improves the readability of our code and, because of that, the reproducibility of our code.
So, one quick thought about this plot, critiquing what we've added today. We've added this percent explained to our axis labels, which is helpful, but perhaps only to those who know what these numbers mean. We don't really have a descriptor anywhere on the plot for what the percentages indicate. Perhaps what we could do would be to use this caption at the bottom to say that those represent the percent explained.
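The double-backslash line continuation described earlier in this passage can be sketched as below. This follows the behavior described in the glue documentation, where a line ending in a double backslash is joined to the next line; the title text and the name long_title are made up for illustration.

```r
library(glue)

healthy_color <- "#999999"
case_color <- "#FF0000"

# Each "\\" at the end of a line tells glue() to continue on the next line
# without inserting a newline. The only visible break in a rendered title
# would come from the <br> tag.
long_title <- glue("
    <span style='color: {healthy_color}'>Healthy</span> individuals and \\
    <span style='color: {case_color}'>cases</span><br>\\
    separate along the first axis
    ")

print(long_title)
```

The string is spread over several short lines in the script, but the result is a single line of text, so long titles no longer scroll off the edge of the editor.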
But at the same time, people are going to be up here, they'll say, "3.2, what's that?" Then they have to come back down, and then they have to go back up. If you remember my episode on the Z flow, that throws off the flow. Perhaps you could put something in as a subtitle, but that kind of throws off the flow of seeing cool data. Maybe instead of putting it here in the axes, what we really should do is put it down in the caption: that the x-axis explained 7.4% of the variation in the data and the y-axis 3.2%. And again, I'll leave that to you. Maybe you could rewrite the caption at the bottom here to include those numbers using our new glue function. See if you can give that a shot, and let me know down below in the comments how you fare.
All right, folks, I hope you found the glue package and function to be useful. I know I'm going to try to use it in the future instead of paste0(). I get really confused sometimes with my quotes and my commas when I'm using paste() and paste0(). I would really encourage you to try it with your own work as well. Let me know how you do. Let me know how you do with moving that information down to the caption. If you run into problems, holler, and we'll see what we can do to help you out.
As always, if you're interested in this type of analysis and you want to learn more, I'd really encourage you to check out riffomonas.org/minimalR. There I have a full tutorial series looking at different ways of visualizing and working with microbiome data. It uses a different data set, from the Baxter study, that we've looked at briefly before. A lot of the stuff is the same. Hopefully you can think about how you would take what we've done here, use it with the Baxter data, and maybe even use it with your own data. Well, thanks, folks, for spending your time with me today learning about the glue package and function. Like I said, I've never used it before.
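One possible way to approach the caption exercise described above is sketched here. It is only one reading of the exercise: the caption wording is invented, and explained_1, explained_2, and caption_text are assumed names. The values 7.4 and 3.2 are the rounded percentages quoted in the episode.

```r
library(glue)

# Rounded percent explained for the x- and y-axes, as quoted in the episode
explained_1 <- 7.4
explained_2 <- 3.2

# Hypothetical caption wording with both numbers inserted by glue()
caption_text <- glue(
  "The x-axis explains {explained_1}% of the variation in the data ",
  "and the y-axis explains {explained_2}%."
)

print(caption_text)
```

In a ggplot2 call this string would typically be passed as labs(caption = caption_text), which keeps the numbers next to their explanation instead of leaving them unexplained in the axis labels.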
I look forward to using it in my future work instead of paste() and paste0(). I think it's going to be a lot more elegant to use, so I don't have to worry so much about all the quote marks and the commas and making sure everything is paired correctly. I think it really will be helpful. There's probably more in the package that I just don't know about yet that will help me in the future as well, and I'm sure it'll help you too. Please let me know if it does help you. Let me know if you're successful in getting that caption modified to present the percent explained down in the caption rather than in the axis labels. Please keep practicing. Tell your friends about Code Club so they can learn all this great stuff as well. And we'll see you next time for another episode.