Hey folks, if you've been following along in the last several episodes of Code Club, you know that I've been exploring alternatives to a bland ordination plot. I didn't mean to say bland, but bland, right? To my mind, an ordination is basically where we throw all of our data into one figure and ask the audience to interpret it the way we want them to see it. That makes it really hard for our audience to make fine-scale comparisons between different points, especially between different combinations of points. What I'm trying to get us to is focusing on the data we want the audience to see, in the way we want them to see it.

A couple of episodes ago, we looked at different interval sizes between time points and the average distance for each interval. So for a 1, 2, 3, 4, 5, or 10-day interval, what is the average distance between points that are, say, five days apart relative to points that are one day apart? In the last episode, we looked at a time course of a one-day interval, plotting the distance over a time series where each distance was back to the previous day.

What I'd like to do in today's episode is plot the distance back to some reference point. In my study, I'm looking at mice from the day they were weaned from their mother going out about six months, five months, I guess, and I want to compare them to the day they were weaned. For your application, you're probably looking at something different. Perhaps you treat mice with antibiotics and you want to compare the mice back to the time point right before they were given the antibiotic. Or perhaps you've got bioreactors, you feed in some new substrate, and you want to compare everything to the time before the substrate was added.
Or perhaps you're not even dealing with time, you're dealing with space. You've got a gradient of some pollutant and you want to compare each community to one that's far away. So what's the distance between two communities that are, say, 10 meters apart versus five meters versus three versus one meter apart, where that distance tracks some gradient of a chemical? There are a lot of different applications for what we're doing today. But again, what I'm going to do is compare everything to day zero.

Here we are in RStudio. I've got a new R script, time_zero_plot.R, with about 15 lines in it. We've seen this before. If you want a copy of this, down below in the description there's a link to a blog post where you can get it. Also, up here I've got a link to a video that you can use to get caught up, get the code, and get the data, so you can hit the ground running coding along with me.

This script reads in an OTU count table, called a shared file, generated by the mothur software package. We keep only the days that fall within two intervals, day 0 through 9 and day 141 through 150. Then we generate a Bray-Curtis distance matrix, rarefied to 1,828 sequences per sample. I'll go ahead and run all this so we can get going generating our figure. Very good.

So now we have mice_dist, which I will use to start a whole new pipeline. This is a distance matrix structure in R, so I need to get it into a tidy format. These are a couple of steps that we've seen, but I'll go back through them: we pipe this into as.matrix to get it out of the distance format and into a matrix format, and from there into as_tibble, where we can set rownames = "samples". We now have our distance matrix with the first column holding the sample names for the rows, and the column names are those same sample names.
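Here is a minimal sketch of that conversion on a toy distance matrix. The sample names and distance values are made up for illustration, and toy_dist is a hypothetical stand-in for mice_dist from the episode's script.

```r
library(tibble)
library(dplyr)  # loaded for the %>% pipe

# Made-up sample names and distances (assumptions, not the episode's data)
samples <- c("F3D0", "F3D1", "F4D0", "F4D1")
toy_matrix <- matrix(
  c(0.00, 0.20, 0.50, 0.60,
    0.20, 0.00, 0.55, 0.50,
    0.50, 0.55, 0.00, 0.30,
    0.60, 0.50, 0.30, 0.00),
  nrow = 4,
  dimnames = list(samples, samples)
)

# A "dist" object, the same kind of structure as a distance matrix in R
toy_dist <- as.dist(toy_matrix)

# dist -> square matrix -> tibble with the row names kept as a "samples" column
toy_wide <- toy_dist %>%
  as.matrix() %>%
  as_tibble(rownames = "samples")

toy_wide
```

The result has one row per sample, a samples column, and one distance column per sample, with zeros where a sample is compared to itself.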
And then we have this square distance matrix. You can see that the diagonal is all zeros, because that's the self-comparison, the distance between a sample and itself. So I'm going to go ahead and use pivot_longer to get it down to three columns, and we'll do -samples. Now we've got samples, name, and value, where value is the distance. I want to look at one triangle of this distance matrix, so I'll do filter(samples < name). This gives us one triangle of the distance matrix and also removes the self-comparisons.

As we've seen previously, I also need to get the animal identifier and the day out of the samples column and the name column. To do that, we'll do a mutate. I'll do animal_a = str_replace on samples, and the pattern will be "D.*", which matches the D and anything that follows it, the day, and we'll replace it with nothing. Then I'll do the same thing for animal_b, and that will be on the column names, which is the name column. Then we want the day, the day post weaning. So we'll do day_a as str_replace on samples with the pattern ".*D", which matches everything up to and including the D, and replace it with nothing. Then we'll pipe that into as.numeric. What we can now see is that we have animal_a, animal_b, and day_a as a double. Note that this pipe is inside the line where we're generating day_a. I'm going to copy this so we can generate day_b; that will be day_b, and it will use name. Great, we now have our columns.

Again, I'm only interested in comparing the distances for an individual animal back to its own day zero. I'm not interested in comparing F3 to F4, so I need to add a filter: filter(animal_a == animal_b). And then I'm pretty sure, but I want to double check, that all my day zeros are in day_a. So I'll do filter(day_b == 0).
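The tidying steps so far can be sketched like this on toy data. The sample names follow the animal-D-day pattern described above, but the names and distance values are made up, and toy_wide is a hypothetical stand-in for the wide tibble built from mice_dist.

```r
library(dplyr)
library(tidyr)
library(stringr)
library(tibble)

# Made-up wide distance table (an assumption, not the episode's data)
toy_wide <- tibble(
  samples = c("F3D0", "F3D1", "F4D0", "F4D1"),
  F3D0 = c(0.00, 0.20, 0.50, 0.60),
  F3D1 = c(0.20, 0.00, 0.55, 0.50),
  F4D0 = c(0.50, 0.55, 0.00, 0.30),
  F4D1 = c(0.60, 0.50, 0.30, 0.00)
)

toy_long <- toy_wide %>%
  pivot_longer(-samples) %>%       # columns: samples, name, value
  filter(samples < name) %>%       # one triangle, no self-comparisons
  mutate(
    animal_a = str_replace(samples, "D.*", ""),   # drop the D and the day
    animal_b = str_replace(name, "D.*", ""),
    day_a = str_replace(samples, ".*D", "") %>% as.numeric(),  # keep only the day
    day_b = str_replace(name, ".*D", "") %>% as.numeric()
  ) %>%
  filter(animal_a == animal_b)     # within-animal comparisons only

toy_long
```

Because filter(samples < name) compares the sample names as text, which member of each pair lands in samples depends on how the names sort, which is why it is worth checking where the day zeros end up.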
Yeah, and there's nothing there: no rows have day zero in day_b. What I do want is day_a equal to zero, so I'm going to add that here and do day_a == 0, because I want all the distances relative back to the zero time point.

All right, let's go ahead and feed this into ggplot. As x, I'm going to put day_b, because day_a is zero for everyone. Then y is value. And we'll group by animal_a; again, animal_a and animal_b are the same value in our data frame. And then we'll add geom_line. We get these two clusters of points for the early and the late time points. So what I want to do is add a new variable for the period. I'll do that back up in the mutate block, where I'll do period = if_else(day_b < 10, "early", "late"). Then I'm going to add a facet_wrap, so I'll do ~period. I like having the day on the x-axis and the distances on the y-axis, so that the two panels have the same kind of x-axis and the same y-axis. So I'll do nrow = 2. Also, if I leave it like this, I'll have two panels where the x-axis goes from 0 to 150 in both. I want each panel to have its own x-axis, so I can do scales = "free_x". Now we can see, sure enough, we have those two panels for the two periods. Again, these are distances back to day zero.

The ticks here should be whole days, like day 1, not two and a half, right? So let's go ahead and clean up this x-axis, because that's a bit annoying. We'll do scale_x_continuous with breaks = 1:150. Again, we're going to do some cleaning up of all this formatting.

I'd also like to add color to indicate the sex of the animal, so I need to add a sex variable. The F in the animal name is for female and the M is for male. So I'll do sex, and we'll do if_else, and we'll do str_detect on animal_a.
And then the pattern I want to match is "F"; if it matches, that's going to be female, and otherwise it's going to be male. Then in our ggplot, in the aes, we'll do color = sex. And I'm getting an error: a problem while computing sex, with if_else, str_detect, animal_a, "F", blah, blah, blah, and an unused argument, "male". That's not a very helpful error message, but I do notice that I don't have a closing parenthesis after this "F". I want to str_detect "F" in animal_a. And so now I'm worried I've got too many parentheses. Yeah, it's telling me I've got an unexpected token here, so I need to go ahead and remove that. And now everything should be good.

So I don't see anything really going on by sex. Sure, we've got this one female mouse that has a pretty high distance relative to her day zero. But otherwise, these all seem pretty steady in terms of their distance back to day zero. Whereas if we look at the early points, I sure want to see with my eyes, and maybe I'm biased, I don't know, that we have lower distances on day one relative to day zero than we have further out. So if I fit a line through this, I would expect it to be a bit like an asymptotic curve. Let's go ahead and add that line. I will add geom_smooth, and I will do group = period. I need to put that in an aes function, because I'm mapping the period onto the group. Then we'll do se = FALSE so we don't have that cloud around the line, and I'll make the color black. I know that I want it to be thick, so I'll do size = 2. Sure enough, we can see with the early period that the distances start low, go up, and kind of flatten.
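Putting the plotting steps up to this point together, a rough sketch on simulated data might look like the following. The animal names, the random distance values, and the object names toy, plot_data, and p are all assumptions for illustration; the real input would be the tidied distances from the episode.

```r
library(dplyr)
library(tidyr)
library(stringr)
library(ggplot2)

set.seed(1)

# Simulated distances back to day 0 for four hypothetical animals
toy <- expand_grid(animal_a = c("F3", "F4", "M1", "M2"),
                   day_b = c(1:9, 141:150)) %>%
  mutate(animal_b = animal_a,
         day_a = 0,
         value = runif(n(), min = 0.3, max = 0.7))

plot_data <- toy %>%
  filter(animal_a == animal_b, day_a == 0) %>%
  mutate(period = if_else(day_b < 10, "early", "late"),
         sex = if_else(str_detect(animal_a, "F"), "female", "male"))

p <- plot_data %>%
  ggplot(aes(x = day_b, y = value, group = animal_a, color = sex)) +
  geom_line() +
  # one smoothed line per period; `size` is the argument named in the
  # episode, newer ggplot2 releases prefer `linewidth` for line thickness
  geom_smooth(aes(group = period), se = FALSE, color = "black", size = 2) +
  # nrow = 2 as spoken in the episode; nrow = 1 would put panels side by side
  facet_wrap(~period, nrow = 2, scales = "free_x") +
  scale_x_continuous(breaks = 1:150)

p
```

With purely random values the smoothed line will be roughly flat; the rising and flattening shape described above comes from the real data.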
If you remember the previous episode, you'll recall that right about this point, where things have started to flatten, the distances between consecutive days are starting to fall. And we can also see for the later time points that, for the most part, they're all about the same distance back to day zero. So that's pretty cool, right?

Now what I want to do is move on and see if we can make this figure look a little more presentable, like something we might want to put in a publication. As always, the first thing I'll do is theme_classic to clean up the appearance. That looks good. I think I'm also going to remove those labels for the early and late panels. I think it's obvious whether they're early or late from the days shown along the bottom. So I'll go ahead and do theme, and then strip.text = element_blank(). Nice, that cleaned that up.

Let's clean up the colors a little bit to match what we had in the previous episode. I can come in here and do scale_color_manual. We'll do name = NULL, then breaks of "female" and "male", then value of, I think we had, purple and lime green, and then labels of female and male. And it's upset with me: the error says the argument "values" is missing, with no default, because I put value rather than values. Very good. Now we've cleaned up the color, and this color scheme matches what we've had previously.

Next we need to clean up our axis labels. We'll do that with labs. x will say "Days following weaning", and y will be "Bray-Curtis distance to day of weaning". Okay, plus sign on that. That looks pretty attractive. I'm going to go ahead and save it as time_zero_plot.png, and let's do width = 5, height = 3. I like to save it with the dimensions I want it to be output at before I start mucking around with things like font sizes.
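As a sketch, those formatting steps could be layered onto a small toy plot like this. The data values are made up, the color names "purple" and "limegreen" and the capitalized legend labels are my assumptions about what was used, and I'm assuming ggsave for the saving step. The file goes to a temporary directory here so the example doesn't clutter your project.

```r
library(ggplot2)

# Made-up distances for two hypothetical animals
toy <- data.frame(
  day_b = rep(c(1:3, 141:143), times = 2),
  value = c(0.40, 0.50, 0.55, 0.60, 0.62, 0.61,
            0.35, 0.45, 0.50, 0.58, 0.60, 0.59),
  animal_a = rep(c("F3", "M1"), each = 6),
  sex = rep(c("female", "male"), each = 6),
  period = rep(rep(c("early", "late"), each = 3), times = 2)
)

p <- ggplot(toy, aes(x = day_b, y = value, group = animal_a, color = sex)) +
  geom_line() +
  facet_wrap(~period, nrow = 2, scales = "free_x") +
  theme_classic() +
  theme(strip.text = element_blank()) +      # drop the early/late panel labels
  scale_color_manual(name = NULL,
                     breaks = c("female", "male"),
                     values = c("purple", "limegreen"),  # note: values, not value
                     labels = c("Female", "Male")) +
  labs(x = "Days following weaning",
       y = "Bray-Curtis distance to day of weaning")

# Save at the final dimensions (in inches) before tuning font sizes
out_file <- file.path(tempdir(), "time_zero_plot.png")
ggsave(out_file, plot = p, width = 5, height = 3)
```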
So my x-axis labels are all overlapping with each other. But this preview is kind of a square dimension. If I output it in this rectangular format, five by three, then maybe there won't be so much overlap. So I like to get things out into the file format I'm interested in before I do a lot of fine-tuning of positions and sizes and whatnot. Good. Overall that looks pretty attractive. My y-axis label is a bit long, so maybe I'd want to put a line break in there, perhaps after the "to". Certainly these x-axis labels are too big; they're overwriting each other. Also, my legend labels seem a bit big. So why don't I go ahead and fix the y-axis label and then we'll come back and reassess. Like I said, I'll put a line break in here with the \n.

Now let's look at the legend and see if we can make that female and male a little bit smaller. To do that, we'll come into theme and do legend.text = element_text(size = 7). That made it a little bit smaller. You can perhaps notice that it also made the plotting window itself a bit wider. Let's make the x-axis text a little bit smaller now, so we'll do axis.text.x = element_text(size = 7). The numbers are a bit smaller, but they still kind of run into each other.

One thing I noticed is that the spacing between the legend and the right side of the right panel is pretty big. Let's see if we can shrink that down. To do that, we could use the legend box settings. There are two arguments that I want to check out: one is legend.box.margin and one is legend.box.spacing. I'm not totally sure which one I want, so let's start with margin. That takes the margin function. Let's try some extreme values, so I'm going to try all zeros. That didn't really seem to change anything. Let's then do spacing. That takes the unit function, so we'll say unit, and let's try zero, in units of inches.
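Here is a hedged sketch of those theme tweaks on a small toy plot. The data are made up and the object names y_label, tweaks, and p are just for illustration; the theme arguments are the ones named above.

```r
library(ggplot2)

# Made-up distances for two hypothetical animals
toy <- data.frame(
  day_b = rep(c(1:3, 141:143), times = 2),
  value = c(0.40, 0.50, 0.55, 0.60, 0.62, 0.61,
            0.35, 0.45, 0.50, 0.58, 0.60, 0.59),
  animal_a = rep(c("F3", "M1"), each = 6),
  sex = rep(c("female", "male"), each = 6),
  period = rep(rep(c("early", "late"), each = 3), times = 2)
)

# \n puts a line break in the long y-axis label, after the "to"
y_label <- "Bray-Curtis distance to\nday of weaning"

tweaks <- theme(
  strip.text = element_blank(),
  legend.text = element_text(size = 7),    # smaller legend labels
  axis.text.x = element_text(size = 7),    # smaller x-axis tick labels
  legend.box.margin = margin(0, 0, 0, 0),  # the setting that changed little
  legend.box.spacing = unit(0, "in")       # gap between the panels and the legend
)

p <- ggplot(toy, aes(x = day_b, y = value, group = animal_a, color = sex)) +
  geom_line() +
  facet_wrap(~period, nrow = 2, scales = "free_x") +
  theme_classic() +
  labs(x = "Days following weaning", y = y_label) +
  tweaks

p
```

Note that tweaks is added after theme_classic; a complete theme added later would overwrite these settings.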
So that got rid of some of that dead white space between the legend and the right-hand panel. Still, the numbers on the x-axis for the late period kind of run into each other. Why don't we see if we can make the figure a little bit wider, maybe go up to six inches, and see if that helps the situation. That certainly looks a lot better, with more spacing on the x-axis for those time points. I'm pretty happy with the way this looks.

One thing that I mentioned in the last episode, though, is that we have a break in the x-axis, and these breaks can be a bit controversial. I think they're the biggest problem when you start connecting the lines across the break, because then people perhaps don't notice that there's a break in the x-axis like we have here. I think we're okay because we don't connect lines across the break. One thing we might think about doing, though, is drawing a box around each of the two panels. So let's go ahead and do that. We can do that with panel.border, and we'll do element_rect with color = "black". That gives us a white rectangle over our data, so we need to change the fill, I believe, to NA. So fill = NA makes it transparent. That is much better, and now we get that border. It is a bit of a different line thickness than the axes, so why don't we see if we can boost that a little bit. We can make those lines a little bit thicker by doing size = 1. That gives us a thicker border around the two plotting windows.

You know, I'm not totally sold that I like having that border, but I'll leave it in here. What I would encourage you to do is go back to the last episode, where we generated a similar plot looking at the day-to-day distance rather than the distance back to day zero. I didn't put the boxes around that.
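A minimal sketch of the panel border on a toy plot follows. The data are made up and p_boxed is just an illustrative name. I've kept the size argument as named in the episode; newer ggplot2 releases prefer linewidth for the same setting and may warn about size.

```r
library(ggplot2)

# Made-up distances for one hypothetical animal
toy <- data.frame(
  day_b = c(1:3, 141:143),
  value = c(0.40, 0.50, 0.55, 0.60, 0.62, 0.61),
  period = rep(c("early", "late"), each = 3)
)

p_boxed <- ggplot(toy, aes(x = day_b, y = value)) +
  geom_line() +
  facet_wrap(~period, nrow = 2, scales = "free_x") +
  theme_classic() +
  theme(
    strip.text = element_blank(),
    # fill = NA keeps the rectangle transparent so it doesn't cover the data
    panel.border = element_rect(color = "black", fill = NA, size = 1)
  )

p_boxed
```

Leaving fill at its default would draw the rectangle on top of the data, which is the white box described above.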
So see if you can go back to that last episode and put your own boxes around those panels, and see whether you think it helps tell the audience that these are two distinct groups of data. You do what works well for you. Ultimately, we really want to avoid misleading our audience. So maybe make two versions, show them to different people, and let them tell you what they think. Hey, you let me know what you think: do you prefer it with or without the boxes? Leave a comment down below.

Well, I really hope you've enjoyed this episode and, again, thinking about different ways to represent distances that we might normally represent in an ordination. I think this has a lot of applications beyond my own data set here. You could do distance to some time point in any number of experiments. You could also do geographic distance, a spatial distance back to some other point. That would be especially valuable if there's some type of gradient, or if there's a source of pollutant at the reference point and you want to see how the community changes as you move away from it. There are all sorts of cool things you could do. Let me know down below in the comments if you've done a plot like this, and maybe share a link to the paper where you did it, so we can all see what you did and get more ideas of how to think differently about representing distance data in ways that are perhaps better than an ordination.

Anyway, let me know what you think. I've got one more idea up my sleeve for how to work with distance data that doesn't involve an ordination. So that you don't miss that episode, please, please, please subscribe to the channel, click that bell icon, and give me a thumbs up.
Also, if you have other ideas for ways to play with distance data that don't involve an ordination, let me know, and maybe I can add some more episodes. That'd be fun. All right, keep practicing. Try this with your own data, tell a friend, and we'll see you next time for another episode of Code Club.