Hey folks, as someone who studies the human microbiome, I know I'm supposed to absolutely love ordinations. But in my experience, once you get more than a handful of samples, say more than five or so, your ordination starts to look like a big blob of points. And oftentimes I'm left looking at that ordination wondering, what is the story here? What am I supposed to take away from this figure? Well, in the last episode, in today's episode, and in a couple more episodes going forward, I'd like to take some time and show you some alternatives to an ordination. In the last episode, I took a time series data set describing the change in community structure of a dozen or so mice and calculated the average distance between different time points. I looked at the average distance for a one-day interval, a two-day interval, three, four, five, up to nine, I think. And I looked early in the animals' lives and then also later. What we found was that later in the animals' lives, the distance for a one-day interval and a nine-day interval were pretty comparable, whereas early in life there was quite a bit of difference: if you went out nine days, the distance was quite a bit larger than what you'd find at a one-day interval. Okay? So what I want to do is dig into that one-day data a little bit more. You might have a data set like this, where you have time series data and you want to see how much the community changes day to day. Is the day-to-day variation pretty consistent? Is it increasing as time goes along? Or is it decreasing? These are interesting questions. And again, I'll put it back to you: could you get the same type of information out of an ordination? I would say no, at least not easily. Because what we're able to do here is link the time points for a given animal and then compare the trajectories of those animals.
The other thing to keep in mind is that when we put data into an ordination, we're not explaining all of the variation in the data. The ordination process removes some of the signal, right? We're looking at the components, or axes, that explain the largest share of the variation, but usually they don't explain all of it. There's a filter imposed by the ordination. If we plot the distances themselves, as we did in the plot we generated last time and as we will today, then we'll actually see the data as they are, without that distortion filter. So in this episode, we're going to generate a figure with days on the x-axis, and on the y-axis the Bray-Curtis distance comparing each day's time point to the previous day's as we go through the time course. We'll also break up the early and late periods of this particular study. Again, I think this would be a very useful technique if you've got any kind of time series data and you're interested in questions related to community structure. So I've got an R script here called onedaylagplot.R. You can get a copy of what we're starting with, these 22 lines: down below in the description there's a link to a blog post, and up here I'll put a link to a video that shows you how to use that information to get caught up with the code, as well as the data we're using. What this code is doing is loading the tidyverse and vegan packages. We'll use the tidyverse for great things like dplyr and ggplot2, and vegan for calculating our ecological distances. The code reads in a shared file, an OTU count table generated with the mothur software package, gets it into pretty good shape, and then uses it to calculate a Bray-Curtis distance matrix, rarefied to 1,828 sequences per sample. So now we have my_dist, which is a dist object, a distance-matrix representation of the data.
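To make that starting point concrete, here's a minimal sketch of what those first lines might look like. The episode doesn't show the exact call, so this is an assumption: I'm using vegan's `avgdist()`, which rarefies and averages Bray-Curtis distances, and a small made-up count matrix in place of the real shared file. Only the sample-name scheme (F3D0 = animal F3, day 0) and the rarefaction idea follow the real data set.

```r
# tidyverse gives us dplyr and ggplot2; vegan gives us ecological distances
library(tidyverse)
library(vegan)

# Stand-in for the mothur shared file: an OTU count matrix with samples
# named <animal>D<day>. The counts here are made up.
set.seed(19760620)
otu_counts <- matrix(rpois(8 * 5, lambda = 500), nrow = 8,
                     dimnames = list(c("F3D0", "F3D1", "F3D2", "F3D3",
                                       "M1D0", "M1D1", "M1D2", "M1D3"),
                                     paste0("OTU", 1:5)))

# avgdist() rarefies every sample to a common depth and averages the
# Bray-Curtis distances across iterations; with the real data we'd use
# sample = 1828
my_dist <- avgdist(otu_counts, sample = min(rowSums(otu_counts)),
                   dmethod = "bray", iterations = 10)
```

`my_dist` is a dist object, the lower triangle of the all-versus-all distance matrix, which is why we need to tidy it before plotting.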
So we need to get it into a tidy format that we can work with going into ggplot2. We've seen this in previous episodes, so I'll maybe go through it a little quicker. To get it out of a dist object and into a matrix, we'll do `as.matrix()`. Then we can pipe that into a tibble with `as_tibble(rownames = "samples")`. This gives us a table version of the distance matrix with a column for the sample names. But the column names are also sample names, so we want to pivot longer to get two columns of samples, one for the rows and one for the columns. We can do `pivot_longer(-samples)`, so we pivot everything but the samples column, and then I'll do a `filter(samples < name)`. That gets rid of the self-comparisons, like F3D0 versus F3D0, and it also gets rid of one of the triangles, either the top triangle or the bottom triangle. So now we go from about 51,000 rows to about 25,000 rows, basically cutting the size in half. Now we have the data in a tidy format without duplicate distances. What I'd like to do, though, is draw a line for each animal separately, and I also need the day for each observation. So I need to generate columns for the animal as well as the day, for both the samples and the name columns. To do that, I'll do a mutate for animal_a, which will be a `str_replace()` on samples. We'll use "D.*" as the pattern we want to match and replace it with nothing, so from F3D0 we get F3. We'll do the same thing for animal_b, but on the name column. Now we want to get the day from the samples column and the name column. So we'll make day_a with a `str_replace()` on samples, and here we'll match everything up to and including the D, the pattern ".*D", replacing that with nothing. So now we get the 0, but we see it's a character, so we need to pipe the output of `str_replace()` into `as.double()`.
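That tidying pipeline can be sketched with a toy four-sample distance matrix standing in for the real one (the made-up values and the `tidy_dist` name are mine):

```r
library(tidyverse)

# Toy 4-sample distance matrix; names follow the <animal>D<day> scheme
samples <- c("F3D0", "F3D1", "F3D2", "F3D3")
m <- matrix(0, nrow = 4, ncol = 4, dimnames = list(samples, samples))
m[lower.tri(m)] <- c(0.42, 0.51, 0.60, 0.38, 0.49, 0.35)
my_dist <- as.dist(m)

tidy_dist <- my_dist %>%
  as.matrix() %>%                      # dist object -> full square matrix
  as_tibble(rownames = "samples") %>%  # rownames become a "samples" column
  pivot_longer(-samples) %>%           # one row per (samples, name) pair
  filter(samples < name)               # drop self-pairs and one triangle

nrow(tidy_dist)  # choose(4, 2) = 6 unique pairs instead of 16 cells
```

The `samples < name` comparison is a string comparison, but since it's applied consistently it keeps exactly one orientation of every pair.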
And so now we see that day_a is a double. One thing you'll notice is that I have a pipeline where I pipe the output of `str_replace()` into `as.double()`. I could have wrapped `as.double()` around the `str_replace()` call instead, but I think this is a little more readable, and it's a different way of doing it than we may have used in previous episodes. I'll go ahead and grab that and repeat it for day_b, with name instead of samples. So now we've got day_a and day_b both as doubles, and we have the two animal identifiers. Now, because I want to limit things to comparisons within an animal, I'm going to do a filter on animal_a == animal_b, and that brings us down to about 2,000 rows. But I'm not interested in the distance between, say, day 0 and day 141, right? I want that one-day differential. So I'm going to come back up to my mutate and compute diff as the absolute value of day_a minus day_b. I'm taking the absolute value because I want day 3 minus day 4 to come out as a positive number, not a negative number. Then we'll filter on animal_a == animal_b and diff == 1, and we now see a bunch of ones in that diff column. I'm not totally convinced that day_b is always greater than day_a, so I think I'll make a day column: day = if_else(day_b > day_a, day_b, day_a). The day I want to represent is the larger of the two days. So now we have all the information we need to build this plot, I think. We'll do ggplot with aes(x = day, y = value), where value is the Bray-Curtis distance, and group = animal_a; animal_a and animal_b will be the same value at this point. Then let's go ahead and do geom_line. And there you go.
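Here's a sketch of that mutate-and-filter step on a few toy sample pairs; the column names (`animal_a`, `day_a`, and so on) are my rendering of the names spoken in the episode.

```r
library(tidyverse)

# A few toy pairs; in the pipeline these come from the tidied matrix
tidy_dist <- tibble(samples = c("F3D0", "F3D0", "F3D1", "F3D141"),
                    name    = c("F3D1", "M1D1", "F3D2", "F3D142"),
                    value   = c(0.42,   0.77,   0.38,   0.21))

one_day <- tidy_dist %>%
  mutate(animal_a = str_replace(samples, "D.*", ""),   # "F3D0" -> "F3"
         animal_b = str_replace(name, "D.*", ""),
         day_a = str_replace(samples, ".*D", "") %>% as.double(),
         day_b = str_replace(name, ".*D", "") %>% as.double(),
         diff = abs(day_a - day_b),          # abs() so order doesn't matter
         day = if_else(day_b > day_a, day_b, day_a)) %>%  # the later day
  filter(animal_a == animal_b, diff == 1)    # within-animal, one-day lag

one_day$day  # 1 2 142 -- the F3-versus-M1 row was dropped
```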
We definitely have samples from days 1 through 9 and then 142 through 150, and the lines are connecting them. But this is all just a mess, because we've got 150 days on the x-axis. So what I think I'd rather do is break this up into two panels, kind of like a broken x-axis. In general we don't really like broken x-axes, but let's give it a shot and see what it looks like. As it stands, this isn't helpful, right? You really can't see what's going on within the two time periods. To break it up, we need to create another variable that I'll call period: if_else(day < 10, "early", "late"). Then I'll add a facet_wrap(~period). Now I have my early and my late facets. I was lucky that "early" sorts before "late", so the facets came out in the order I wanted. If you're doing this with some variable and you get the opposite of the order you want, know that you can redefine the variable as a factor and set the factor levels in the order you want them. But what we really want is to focus in on the time points we have within early and late. So we can add scales = "free_x", which gives each panel its own x-axis. Now the left-hand panel, early, goes from day 1 to day 9, and the right one from 142 to 150. We don't have day 0 or day 141, because we're looking at a one-day interval, right? One thing I want to try is stacking these on top of each other, which we can do with nrow = 2. But I think I actually prefer them side by side, because then we have a common y-axis, and it's easier to see that the lines for the early time points are far more erratic and bouncy than the late ones. So I'm going to go back and set this to nrow = 1, and we'll be in good shape.
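Putting the base plot and the faceting together looks roughly like this; the distances here are simulated stand-ins for the real Bray-Curtis values.

```r
library(tidyverse)

# Simulated one-day distances for two animals over the study's two windows
set.seed(42)
one_day <- expand_grid(animal_a = c("F3", "M1"),
                       day = c(1:9, 142:150)) %>%
  mutate(value = runif(n(), 0.2, 0.6),
         period = if_else(day < 10, "early", "late"))

p <- one_day %>%
  ggplot(aes(x = day, y = value, group = animal_a)) +
  geom_line() +
  # one panel per period; scales = "free_x" gives each panel its own
  # x range, and nrow = 1 keeps a shared y-axis for easy comparison
  facet_wrap(~period, nrow = 1, scales = "free_x")
```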
And so now we can see that we have that common y-axis, an x-axis representing days, and this kind of artificial break in the data to indicate the jump between the early and late time points. So let's give this some color. I'd like to color by the animal's sex, because maybe the male mice have more erratic day-to-day changes than the females. Who knows, right? So again, we'll come back up into our mutate statement and add sex, using an if_else with str_detect. str_detect takes a string, so we'll give it animal_a and detect whether there's an F in it. str_detect returns a logical value, so if that's TRUE, we'll say female; otherwise, male. Then down here in ggplot we'll say color = sex. We can see there are no clear trends between the male and female mice, and we probably wouldn't have expected any, but we've got a little bit of color, so that's always good, right? My eyes want to see that the early day-to-day distances are falling, whereas the late ones are flat. Something we could add to this is geom_smooth, which will basically draw a spline through the data. Now, I don't want a spline through each of the 12 mice; I want one through all of the early points and one through all of the late points. So what I'll do here is redefine the group aesthetic for geom_smooth to be period instead of animal_a. And so now we get a fitted line through all of the time courses for early and late separately: a blue line with a shaded region showing the standard error. I'm not so thrilled about having that standard-error cloud around the line, so I'll do se = FALSE. I'll also make that color black, and let's go ahead and make the line a little thicker, with a size of 2. And so now we see that black line going through the data.
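That coloring-and-smoothing step might look like the sketch below, again on simulated data. One note: the episode sets the thickness with size = 2; newer ggplot2 (3.4+) prefers `linewidth` for lines, which is what I've used here.

```r
library(tidyverse)

set.seed(42)
one_day <- expand_grid(animal_a = c("F3", "F4", "M1", "M2"),
                       day = c(1:9, 142:150)) %>%
  mutate(value = runif(n(), 0.2, 0.6),
         period = if_else(day < 10, "early", "late"),
         # str_detect() returns TRUE/FALSE; IDs containing "F" are female
         sex = if_else(str_detect(animal_a, "F"), "female", "male"))

p <- one_day %>%
  ggplot(aes(x = day, y = value, group = animal_a, color = sex)) +
  geom_line() +
  # override the group aesthetic so we get one smooth per period rather
  # than per animal; se = FALSE drops the standard-error ribbon
  geom_smooth(aes(group = period), se = FALSE,
              color = "black", linewidth = 2) +
  facet_wrap(~period, nrow = 1, scales = "free_x")
```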
And sure enough, it appears that for about the first six days, the day-to-day variation in the early communities is pretty high, but after about day five or six it falls off. Whereas for the late time points, it's pretty smooth, pretty consistent. One thing I'm not jazzed about is this x-axis. Having labels at 2.5, 5, 7.5, I'd rather have 1, 2, 3, 4, 5, 6, all the way out to 150. So let's go ahead and add a scale_x_continuous with breaks = 1:150. And now we can see that we have these day breaks, and that looks good. It's much easier to see that the change actually starts at day five, and that after day five things start to fall off. Now that we do have those incremental days on the x-axis, though, things are a bit jammed in here, and we'll resolve that before we finish today's episode. So I'd like to spend a little bit of time making this plot look a bit more attractive. Of course, what I always like to start with is theme_classic, to give it a nice clean background. That does look a bit cleaner without all the gray background and the grid lines. I'm not totally buying that I need the labels for these two panels, early and late, so I'd like to go ahead and remove those. What I can do is use theme, and it's been a while since we used the theme function, huh? We can do strip.text = element_blank(), and that gets rid of the labels on the two facets. So again, I think this is cleaning things up and making it look a bit more attractive. One other thing we can do with the legend is put it inside of the plot, so that we can expand and use more of that real estate in the right quarter of the figure. So I'm going to do a little bit of work on that legend, starting with the colors. We'll do scale_color_manual, and I'll set name = NULL.
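The axis-break and theme tweaks described above can be sketched like so, on the same kind of simulated data:

```r
library(tidyverse)

set.seed(42)
one_day <- expand_grid(animal_a = c("F3", "M1"),
                       day = c(1:9, 142:150)) %>%
  mutate(value = runif(n(), 0.2, 0.6),
         period = if_else(day < 10, "early", "late"))

p <- one_day %>%
  ggplot(aes(x = day, y = value, group = animal_a)) +
  geom_line() +
  facet_wrap(~period, nrow = 1, scales = "free_x") +
  scale_x_continuous(breaks = 1:150) +  # label every day, not 2.5, 5, 7.5
  theme_classic() +                     # clean white background, no grid
  theme(strip.text = element_blank())   # drop the "early"/"late" labels
```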
And again, that will get rid of sex as the name of the legend. Then we'll set the breaks to female and male, and the labels to female and male. For the colors, I'll do purple and lime green; I first typed colors when I meant values, which is the argument scale_color_manual actually wants. Those colors look okay. They're a little bit garish, don't you think? But that looks good. We can then move the legend inside the plotting window by coming back down to the theme function and doing legend.position = with a vector of two values: the first is the x position and the second the y position, as fractions giving the relative placement in the plot. So I could do something like 0.8. Let's try 0.8, 0.8, and we might move things around. Actually, that looks pretty good; I hit it right. So we've got our legend inside the plotting window. Maybe I want to move it up a little, so I'll change that second 0.8 to 0.9. Yeah, and now we can see the legend more in the upper right corner of our plotting window, and we don't have all that dead space off on the right side. I like to move my legend inside the plot if it's not going to confuse my audience about what the legend represents. I wouldn't want my legend on top of the data, and I wouldn't want my audience to think the data were somehow in the legend. So I think that looks pretty attractive. The next thing I want to take on are the labels on the x and y axes. So I'm going to come back up here and use the labs function: x will be "Days after weaning", and y will be "Bray-Curtis distance to previous day", making sure to capitalize Curtis. And so now I've got my x-axis label and my y-axis label. The last thing I want to do is save the image as a PNG or TIFF or whatever, so that I can put it in a manuscript or perhaps in a slide deck for a presentation I might give. So we'll do ggsave, and I'll call this one onedaylagplot.png.
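The legend, label, and save steps together might look like this sketch. The exact legend label strings are my guess, and I've written the file into a temporary directory here; in the episode it goes next to the script.

```r
library(tidyverse)

set.seed(42)
one_day <- expand_grid(animal_a = c("F3", "F4", "M1", "M2"),
                       day = c(1:9, 142:150)) %>%
  mutate(value = runif(n(), 0.2, 0.6),
         period = if_else(day < 10, "early", "late"),
         sex = if_else(str_detect(animal_a, "F"), "female", "male"))

p <- one_day %>%
  ggplot(aes(x = day, y = value, group = animal_a, color = sex)) +
  geom_line() +
  facet_wrap(~period, nrow = 1, scales = "free_x") +
  scale_color_manual(name = NULL,       # no "sex" title on the legend
                     breaks = c("female", "male"),
                     labels = c("Female", "Male"),
                     values = c("purple", "limegreen")) +  # values, not colors
  labs(x = "Days after weaning",
       y = "Bray-Curtis distance to previous day") +
  theme_classic() +
  theme(legend.position = c(0.8, 0.9))  # (x, y) fractions inside the panel

# write the figure out for a manuscript or slide deck
outfile <- file.path(tempdir(), "onedaylagplot.png")
ggsave(outfile, p, width = 5, height = 3)
```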
If I wanted a TIFF, then I'd use a .tiff extension instead of .png. And then I'll do width = 5, height = 3. And there we go. That looks pretty good; I'm pretty happy with the way this looks. One thing I might want to change is the size of the labels on my x-axis. They look a little bit big, and especially when we're out in the 140s or so, they tend to get a little close to each other. So I can clean that up a little by coming back in here and doing axis.text.x = element_text(size = 7); I think 8 is the default. That makes them just a smidge smaller and a little easier to tell apart. I really hope that you found this interesting as a way of thinking about the day-to-day change in community structure that's different from looking at an ordination, right? So if your question is how the community changes day over day, well, this shows it to you. If you show your audience an ordination, a big blob of points, it's going to be really hard for them to differentiate the 100 or so points that we have in this plot. This plot makes it much easier to see what's going on. And also, as I mentioned, we don't have the distortion imposed by our ordination technique; we see the actual distances as they are. And then, by fitting our line through the data, we can see that, yeah, after about five or six days the community is slowly starting to stabilize, and that for those later time points it is also pretty steady. One thing to be mindful of, of course, is this break in the x-axis. People are of mixed minds about having a break like that; generally, the advice is to design your visuals so that you don't have to have one. I'll leave it to you to think about what you make of this. I think this looks pretty good.
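That last tick-label tweak, isolated on a tiny stand-in data set (the toy values are mine), is just one more theme call:

```r
library(tidyverse)

# shrink the x-axis tick labels so days 142-150 don't crowd each other;
# the theme_classic() default is roughly 8-9 pt
p <- tibble(day = 142:150, value = runif(9, 0.2, 0.4)) %>%
  ggplot(aes(x = day, y = value)) +
  geom_line() +
  scale_x_continuous(breaks = 1:150) +
  theme_classic() +
  theme(axis.text.x = element_text(size = 7))
```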
One thing you might consider is putting a box around the two panels. I'm not going to do that today; maybe we'll save that for the next episode. As I mentioned at the beginning of the episode, in the next episode and the one following it, I'm going to look at additional alternatives to ordination. So that you don't miss those episodes, please be sure that you've subscribed to the channel, that you've clicked the bell icon so you're receiving notifications, and that you click the thumbs up so everyone else can be aware of what we're doing over here too. And you know, it wouldn't hurt too much if you wouldn't mind letting your friends know about what we've been talking about. All right, keep practicing. Try this with your own data if you have time series data. Be thinking of alternatives, and we'll talk to you next time for another episode of Code Club.