 Hey folks, if you watched the last episode of Code Club, you'll know that I took the distance matrix we've been working with in the last few dozen episodes of Code Club and I used PCOA, Principal Coordinates Analysis, to generate an ordination visualization of those data. Ordination is a super-duper common tool that we use in microbial ecology, microbiome analysis, ecology in general, right? I even joked that, you know, can you have a paper without an ordination? Yes, yes you can and I have done it. I've got one of the few papers without an ordination actually. Anyway, PCOA is one method. It's also called metric dimensional scaling. There's an alternative approach called non-metric dimensional scaling, NMDS. I personally prefer NMDS because my approach to ordination is that it's a visualization tool to help me to show clusters in the data. Alternatively, some people really geek out about decomposing what the axes mean in Principal Coordinates Analysis. NMDS, that's not so straightforward to do. NMDS is different than PCOA in a number of ways. So remember that PCOA, you're generating those eigenvalues and eigenvectors from your data. You draw the lines through the data that represents the most amount of variation of the data. The next axis is perpendicular to that that describes the next most amount of variation that and you keep going, right? In NMDS, you tell the algorithm how many dimensions you want. So we're representing this on a printed screen or on a piece of paper so we only want two dimensions, right? You do not want three dimensions. Please do not use what I'm showing you to generate three-dimensional 2D figures. It doesn't make any sense. Anyway, in NMDS you say I want two dimensions. The algorithm then will take all your points and basically throw them into two dimensions. It then will kind of move your points around to do the best possible job of indicating the representation, the similarities, dissimilarities, the positions of those points in that 2D grid. It does involve a random process. It's doing an optimization. And so it will repeat this algorithm many times to get the best possible configuration. One of the reasons I like NMDS more than PCOA is that if I take the ordination generated by say NMDS and I use the placement of those points and use that placement to calculate a distance matrix, I then have an output distance matrix and the original input distance matrix. If I look at the r squared between those two distance matrices, it is higher when I use NMDS to do the ordination than when I do PCOA, okay? So sometimes the difference isn't all that meaningful but I routinely find that NMDS does the best job of representing the variation in the data. Again, that's partially because NMDS doesn't use linear combinations of the data. It's fairly nonlinear. And because I'm constraining it to a certain number of dimensions, whereas PCOA is fairly unconstrained in the number of dimensions that it's representing. Regardless of what you want to use to coordinate your data, know that you can do both PCOA as I've already shown you in NMDS as I'll show you today in R. I've gone ahead and copied the analysis.R script over into a new R script that I'm calling NMDS.R. If you want to get all the files and data that I'm working with, check out the link down below in the blog post. Again, what we have here is code where we're reading in a lower triangular distance matrix. It's from a study that my lab did a while back. There each sample in the study is a different fecal sample collected from one of a dozen or so mice. I then calculated the Bray Curtis dissimilarity between all pairs of samples. And as we've seen in recent episodes, we could go ahead and filter out samples that we didn't want. And so I'm looking at samples from the first 10 days post weaning as well as samples from about 141 to 150 days post weaning. And I want to look at the ordination of how those data relate to each other. So I'll go ahead and run all this. And what we find if we look at this matrix is that we get a distance matrix output. And then we also have this sample lookup data frame that indicates information about each of the samples, the sex of the animal, the animal identifier, and the number of days post weaning. Great. So we can generate a NMDS based ordination of these data by using a function from the vegan package. So we'll go ahead and do library vegan. And if you don't have vegan already installed, you'll probably need to go ahead and do that and load vegan. Vegan is a great package for doing quantitative ecological methods, things like ordination, adonis, perm, dist, a variety of different tools that people find really useful in ecology. And so if you're doing any type of microbial ecology or ecology, the odds are good that you'll come across a vegan sooner than later. To run the ordination, we're going to do meta MDS, and we will then give it dist matrix. The output then comes to the screen. And as I said earlier, the algorithm has a random component to it. And so it repeats the algorithm like 20 different times, trying to find the smallest stress value. The stress is I think of as the amount of distortion that happens when you take highly dimensional data and smush it down into two dimensions. So what I'm finding in the output is that there was no actual convergence. What the documentation suggests is setting a seed and setting a seed so that you get convergence. So let's go ahead back up into our code and do that. And so we'll do set dot seed. And let's start with my favorite 1976, 0620, which is my birth date. And so we're setting the seed of the random number generator that is used by meta MDS. And so that's always a good practice to get into anyway, because it will ensure that the downstream analysis from that set seed will be reproducible. And in fact, you should get the same output that I get. So again, for that seed, we find no convergent solutions. Let's try something simple like one. And sure enough, we find that with set seed one, we found two convergent solutions after 20 tries. And we're ready to move forward. I'm going to go ahead and call this an MDS. And so that we'll have the output from this MDS run in our MDS. And because I didn't run the seed, it didn't converge. So I'll go ahead and run those two steps again. And sure enough, we did reach the convergence. And MDS gives us this kind of generic output. And if I look at the structure of MDS, I find that it is a list containing a whole bunch of other objects. The thing that I'm most interested in is what's called the points. And so we can see here that points is a matrix of 227 rows and two columns. And it contains then the position in MDS one and MDS two of each of our different samples. I can use my knowledge of how to access values from a list to get at this. So I could do like an MDS dollar sign points to get out the matrix that contains all the points data. There's also a function that you could use called scores and MDS. And so this is a function from vegan that gives you the positions of the all the points in the two different axes. We could also do plot and MDS. And this will then give you the ordination in base our graphics, kind of like what we saw the other day, when we did this with PC away. So what I want to do is instead of using base our to visualize the data, let's see how we go ahead and make this using ggplot2. So to do that, I'm going to go ahead and take scores and MDS. And we will pipe this into as underscore table, and we'll then do row names equals samples. And so because this is a matrix, these are row names. And so when you convert it to a table, that information will be lost, because tables don't have row names. And so we can use as table and say the row names will go into a new column called samples. And so sure enough, we then get a table 227 rows, and three columns, we can then pipe this into ggplot, where our x is going to be an MDS one or y will be an MDS two. And we'll then do geome point. I should have that. And so now we get our ordination, where we seem to have a cluster of points here, and other cluster of points that's a bit more diffuse. And so it might be nice to go ahead and color our points, according to, you know, maybe their sex or the day post weaning, I can do that by modifying my ggplot line here where I could say color equals sex, but I don't have the sex information in here, right? So if I go ahead and look at things up to this point in the pipeline, I have the sample name, I don't have the metadata. So I can go ahead and do an inner join with the current data coming through the pipeline and sample look up. And then I'll do buy equals samples. And so then this will give me the sex, animal and day columns for my sample lookup. And so I can then pipe that in a ggplot, and then I can color it by the sex. And so now I see that the salmon colored points are female, and the teal colored points are male, not seeing any clear indication of a difference there. Let's go ahead and try day as our color variable. And so now what we see is that, yeah, we see that we've got the darker points are from earlier days post weaning and the lighter blue points are from later days post weaning. Something interesting that you'll notice is that there are some early points embedded in this cloud of later points. Those are the days zero, that was day zero post weaning and mice are coprophagic, meaning they each each other's feces. And so these pups were probably eating their mom's feces. And so they had a community that looked like their mom, right? And so then when you remove them from their mom, their community changed going to the right side of this ordination. And then with time at about day 15 or 20 or so, what we found is that their communities became more like these later time points. All right, enough, enough biology. So here we're using day as a continuous variable. You'll see that we have a gradient of colors here. I'd rather day be a dichotomous variable or categorical variable. So let's go ahead and do a mutate. We'll define a column called period. And I'll do if else, a day less than 10, I will call that early. Otherwise, I'll call it late. So we had day zero through nine and then 141 to 150. So those zero through nine will be early. The 141 to 150 will be late. And then I will color it by period. And so now sure enough, we get the same type of ordination, but with the default colors for early and late. Again, there's a lot more you could do to make the visual more attractive. I'll leave that for you. But I really wanted to drive home is how we can use the meta MDS function from vegan to do the MDS procedure on our data. There's a lot of other data built into this MDS object. See if you can't use your list parsing skills to pull out the stress value from MDS. I'll leave that for you as an exercise. Another exercise I'll encourage you to do is to run your PCOA code. I'm actually I'll run that with you now. And so if we run our PCOA code, can you color your PCOA figure the same way that we did with our MDS and you see the same type of relationship. I find that these late time points are a little bit more spread out than the late time points by MDS. But go back to your PCOA and see if you can't do some of the same type of things we did here to color your points in the PCOA. And then, you know, like we added percent explained to the X and Y axis for PCOA. See if you can add some type of annotation to the figure, perhaps in the title or something to indicate the stress value that we found here. We can't use percent explained on the X and Y axis here because again, the axes in MDS really don't mean anything. And so you can't talk about them being explained. I think it's probably more appropriate to say that the percent explained by the two axes is about the same in an MDS. Anyway, play around with this, see if you can bring your own data into R and into working with meta MDS to make your own ordination. If you can do that, then you'll really be showing that you really grasp the concepts and that you're understanding how to go ahead and do these different types of analysis. Keep practicing with this and we'll see you next time for another episode of Code Club.