 Hey folks, if you've been following along in recent episodes of Code Club, you know I'm not the biggest fan of an ordination. I feel like people overuse them and that not every question that people have needs to be answered with an ordination. Oftentimes, you know, we might want a more specific or a better tool, and so I've been kind of looking at a variety of different alternatives. As I was going through these recent episodes, someone chimed in and said, it'd be really cool to see what these distances look like as a heat map, and I said, that's a good idea. Ultimately, I think we're gonna find that the heat map isn't really the way we want to go, but I think this is a great opportunity to create a heat map and show you how we can map our distances into a heat map. I also think this is a great opportunity to bring in another metric of beta diversity. We've been working with Bray Curtis and that is a metric of structure, which takes into account who's there and in what abundance. An alternative approach to looking at beta diversity is to look at just the membership, who is there. And so to do that, we're going to calculate the jacquard index. And what I'd like to do is to create a square heat map where the distances depicted in the lower triangle are from, say, Bray Curtis and those above the triangle are from jacquard. It sounds a little complicated, but don't worry, I'll show you how to do it in today's episode. Over here in our studio, I have a script brayjacquardheatmap.r. You can get a copy of what I'm starting with here. If you go down below in the description, there's a link for a blog post for this episode. Also, I've got a video here that will help you to use that information to get this script as well as the data that I'm reading in. And so what is the data that we're reading in? Well, it's data mice shared, right? And so mice shared is a O2 table that tells you the number of times we saw each taxa or O2 or operational taxonomic unit in each sample. The study that these data came from, we sampled about a dozen mice over the first six months or so of their life after being weaned. I want to focus in on days 0 through 9 and days 141 to 150 post weaning. There's some other data in here as well. And so what this starter script does is it reads it in. It removes the days that we don't want. And then it also removes those samples that have fewer than 1828 sequences. I'll go ahead and load all this in and we can see what we've got. Again, we have early late TBL. And so we've got our group, our animal, and the day. The animal and the day information come from that group column. We also have the columns for a different taxa. These are called O2Us or operational taxonomic units. And the values, as I mentioned, are the number of times we saw say O2U1 and sample F3D141. Now, I'm going to be feeding this into the AVGDist function from vegan, which doesn't want the data as a TBL. It wants it as a data frame. Also, I don't want this animal or day column in that data frame. So what we'll do is we'll come back and remove animal and day. And I'll also put that group column, those values, I'll make the row names. So I'll start by removing this early late TBL. We'll assign it back to a variable at some point. But when I'm testing things and kind of developing code, I prefer not to write it to a variable because if I write it to a variable, then I have to write the variable name to output it to the screen. If I don't write it to a variable, then it'll come to the screen naturally. And that's kind of what I want to do for now. So again, I want to get rid of that animal and day column down here. And so I'm going to add to my select function here minus animal minus day. And that got rid of the animal and the day columns from my TBL. And so now what I'm going to do is create a data frame. And to do that, I'm going to assign the row names from the group column. TBLs don't have row names, whereas data frames do. And so to do that, we'll do column to row names. And I will give it the group column. And so that outputted the data frame, right? And so we can see that the output is a little bit different than what we saw before with the TBL. So now I'm going to save this to early, late Df for data frame. And if I do str on early, late Df, I can now see that it is a data frame. And it's got a whole bunch of columns, right? But it no longer has that group column because those now are the row names of the data frame. Great. So let's start, as I said, by creating a heat map of our break artist's distances. So to do that, we're going to use AVG DIST on early, late Df. And it will use D method equals bray. And then we'll do sample equals 1828. And that is the size of the smallest library that I had here. Let's go ahead and run that. The output of AVG DIST will be a distance matrix. And I'd rather be a TBL so that I can use that TBL with our tools from, say, dplyr and ggplot. So to go from a distance matrix to a TBL, I need to first turn that distance matrix into a matrix and the matrix into a TBL. And so we can do that in two steps. We'll do as dot matrix. And then we'll pipe that to as underscore TBL. And then row names. I'm going to say A. I'm going to call it A because I'm going to then take the column names and make those another column for B. And so if I make this a little bit bigger, we'll be able to see things more easily. And so now we can see that we have a TBL where we have that group name or the first column as A. And as I said, I now want to take these column names and make them another column, which would be B, and to go from this square to a long form of the distance matrix. And again, we can do that with a pivot longer. So we'll do pivot longer. Everything but the A column. And our names will go to, and that'll be B. And our values to will be distances. So now we have our TBL, our long formatted version of that square distance matrix. Again, A is the column of row names and B is the column of column names and the distances or the distances from our square matrix. I'm going to leave all of them in there because I'm going to start by trying to make a square heat map of these Bray-Curtis distances. I will then call this Bray as a variable that will then plot. And looking at Bray, we can again see the output that we had before, which is great. So I can take Bray and I can pipe that to ggplot, AES. And on the x-axis, I'm going to go ahead and put those row names. So that would be A. And on the y-axis, I'm going to put those column names, which would be B. And then my fill will be distances. And then I'll do geom tile. And so geom tile is the geom that will get us that heat map appearance. So we get our heat map of the, I think it was 226 total samples against the 226 total samples. And it's a bit much. What I think I'd rather do is focus in on an individual animal. So to do that, I will come back up here to my code where I made that early late DF. And I'm going to select or filter for a specific animal. And so I'll do and animal equals equals, let's do F3. And so that will give us only those distances from animal F3. So this looks a lot better. I hope you'd agree. We've got our samples, our 19 samples and 19 samples on the x and y-axis. We don't have 20 because I think it was day four. Yeah, day four is missing for whatever reason. We just didn't generate enough sequence data for to get over that 1828 sequence threshold. I'm cool with that. So we see on the y-axis as it starts down at the lowest pot at F3D0 and then works up alphabetically from there. And so what I'd rather do is order this numerically. So we have day zero here and day 150 up at the top and then day zero out to day 150 as well. So what we're going to need to do is we're going to need to take Bray and we're going to need to pull out the day for these different samples, right? So let's go ahead into our pipeline here. So we'll do separate on A and we'll use a separator of D. So again, we have like F3D9. That'd be F3 for the animal and the day would be nine. We kind of did this before up above when we were making the early late DF, but it's okay to do it again. It's more practice, right? So we'll do into as the two columns that we want to separate this into. And so the first will be animal underscore A and the second will be day underscore A. And we want to repeat this again because we want also to do it for the B column, for those column names, right? So we'll do B and B and B there and it's complaining that object A is not found and that's because I've got this GG plot line in here still. So let me go ahead and remove that pipe so we're not funneling the data down through that. So that looks a lot better, right? So now we have animal A, day A, animal B, day B. I don't need the animal stuff. So I could go ahead and do a select to remove anything that starts with animal. Okay, so now we've got our day A and our day B. This is of type character though. I actually want it to be numerical. And so back up here in the separate, we've seen before that we can do convert equals true. And what that'll do is that'll take the text and if it can convert it to a number, it will. And so now we see that day A and day B are integers, the numerical values, right? So that's good. So now let's go ahead and feed that in here to our GG plot. And so instead of x equals A, I'm going to put day A. And then for y, I'm going to put day B. And so what we get is that we have a continuous x and y axis, right? So it's putting the samples from day 0 through 9 at 0 through 9 and 141 to 150 at 141 to 150. I would rather bring those together so that if you would kind of have like a categorical variable on the x and y axis, so that all my points fall together, even though they are days and they are a continuous variable, I'd like to visualize them as being together. So to do that, I need to convert day A and day B into factors that are ordered numerically, right? And so we'll stick with that, but I'm going to add a mutate on let's do day A and we'll use FCT reorder. So we'll reorder day A in the order of day A, except I'm going to use a character version of day A in this first spot. So factor FCT reorder has to take a character or a factor to reorder it by another variable. And so we'll do as not character on day A and we're going to order it by the quantitative value, the numerical value of day A. And so let's see if we can do this without doing day B also. And so now what we see is that across the x axis, we have all 20 or 19 spots in there without that big gap. So let's repeat this now for day B doing the same approach. So I'll go ahead and add that line with day B, day B and day B. If we run that, so now we see we have all the data together and that we go from 0 to 150 and 0 to 150 on the y axis. I'd actually rather to have the y axis flipped so that I go from 0 to 150 and then I have that diagonal going from I guess from your perspective, the top left to bottom right. And so to do that, what I can do on day B is I could do negative day B. So it would kind of flip the order. So this is the heat map for break Curtis. Before I clean it up further, I want to go ahead and build a similar heat map but for the jacquard. So the code is going to be very similar to what we've already done. And I'm going to start by copying this down. And instead of bray, I'm going to call it jacquard. And instead of the method equals bray, I'll do jacquard. And here, again, we'll feed in jacquard into this and should all just work. And so now this is the heat map for the jacquard, whereas this was the heat map for the break Curtis. So the darker colors, the darker tiles mean that the samples were more similar to each other. So that's why our diagonal is this dark blue or black color because they're identical, they have a distance of zero. So again, the break Curtis distances are more similar than the jacquard distances. And so you can see these are a little bit lighter than what we see with the break Curtis here. So what I'd like to do is I want to bring together the jacquard and the break Curtis. And as I said, I want to put the break Curtis in the bottom and the jacquard in the upper triangle. So let's again, remember what we have with bray. We have this data frame with a, b and distances. And we have the same thing for jacquard. So what I'm going to do is an inner join on bray and jacquard and we'll join by a and b. And again, I need to specify a and b. Otherwise, it would try to join on all three columns. And so what this gives me then is a, b, distances x, distances y. And so I need to go ahead and rename distances x and distances y and I can do that with select where I will say bray equals distances x and then jacquard equals distances y. And I forgot to include my a and b. So we'll join that in there. Okay, great. So now we have a, b, bray and jacquard. We're in good shape. I'd also though like to get those days back, right? And so you'll probably remember from back up here, I can go ahead and grab this code that we had for say the jacquard or the bray. So we'll bring that in. So we're going to separate a into animal a and day a, b into animal b and day b. We're going to make those then into factors. And then I'm going to bring in this select. And instead of a and b, I want day a and day b. So now we have day a, day b, bray and jacquard. Great. I'm going to go ahead and grab some of this code. And we will pipe this in here. And for now, I'm going to put my fill equals bray. And this is what we have for the new version of the bray card. And so it looks like the same colors. So that's good. But what, again, I want to do is I want to have bray in the lower left and jacquard in the upper right. So to do that, I am going to go ahead in here and I'm going to add a code to create a new distances column. All right. So I will have this be distances. And I will do a mutate on distances. And that is going to equal and we'll use if else. So if day a is less than day b, so that should be the lower triangle, right? Then we want bray. Otherwise, for now, I'm going to put in zero. And so let's pipe this into our ggplot code. And it's complaining that less than is not meaningful for factors. So that's fine. Let's go ahead and move this mutate line where we're doing the factory order down below where we do this mutate to get our distances. And this should work now. So now we have our bray-curtis distances in the lower left and we have nothingness, distances of zero in the upper right. So what we can do here instead of zero, we could put jacquard. We now have our jacquard distances in the upper triangle and our bray-curtis in the lower. You can tell, again, that those jacquard distances are larger, which means they have a lighter color than the bray-curtis. So what I'd like to do with this heat map now is to modify it to see how we can make it look a little bit more attractive. One of the first things I want to do is change the color gradient from this blue to a red going to white for larger distances. So to change that fill gradient, we'll do scale fill gradient and it will do low and I'm going to give it red. I like to use the hexadecimal because it gives me a bit more precision with my colors. So I'll do ff 000 and then my high will be 6f's. And that's the hexadecimal for white. And so now you can see the reds instead of the blues. I like this a little bit more. Again, if the higher the distance, the lighter the color versus the bluer the color. So I don't know, I like this. Again, part of me showing you scale fill gradient is so that you could go in and you could use whatever gradient you want to use, right? If I want to get rid of that distances, I could also come in here and do name equals null. So again, that gets rid of the name on the legend. Next, I'd like to go in and change the labels on my x and y axis. So I'll do labs on my x, I'll say days, post weaning. And then the same thing for y, days, post weaning. And so that gets me those titles on the x and y axis. That's great. Before I forget, I want to put in a label of what's going on in the two triangles, right? So to do that, I will create a separate data frame that has a position for the x and y position for each of the two labels. And to do that, I'm going to say labels is a tiddle. And then I'm going to have x equals, and let's start with like position three. And that's going to be the lower left. And then for the right, upper right, let's say like 12, we might adjust these as we go along. And then y, I'll do three again for the lower left and then 12 again for the upper right. And I'll do label and I'll say Bray Curtis. And then jacquard. And we can clean this up a little bit. And then down here, after my GM title, I can do GM text and we'll do data equals labels, aesx equals x, y equals y. I don't think I actually need to put in those aesthetics because they're defaults, but I like to leave them in there anyway. And then label equals label. And then we'll do inherit, inherit aes equals false. Good. So that gets us our labels. It's not exactly where I want to put them. It actually centers the label on the coordinate you give it. And so what I'm going to do instead of those x positions is maybe I'll do x five here. So I also want to move the jacquard up into the right. And so maybe I'll make these 16. So that gets them close to their place. As you've seen in the past, I like to output the figure into the format that I'm going to put it in, in this proper size. And then I start working with the size and positioning of all my stuff in my figure. So I'll do gsave and I'll call this Bray jacquard heatmap.png. So I'll do width equals six and height equals four. So that looks pretty nice, right? A couple of things stand out to me that I want to change and improve about the appearance of this figure. So one thing that sticks out to me is that we still have the default theme gray background. You can kind of see that around the edges where you've got this gray border with the grid lines. So let's go ahead and remove that. I also will want to get rid of these tick marks along the way. So when working with these themes, I generally work with a theme that is close to what I want. And so I know to get rid of that background and to get rid of those grid lines, I can go to theme classic. And so I'll add theme classic. That got rid of that gray background with those grid lines. But now I have axes and I have those tick marks. So let's go ahead and get rid of that axes and the tick marks. So I'll do theme axis dot line equals element blank. That got rid of the axes. Now to get rid of the ticks, we'll add axis dot ticks equals element blank. That gets rid of the tick marks. I'm noticing that the font on the x axis is a little bit big that these numbers start to kind of run into each other. Why don't we make them a smidge smaller? And so we will then do axis dot text equals element text. And let's do size equals, I'm not sure what the default is. I think maybe eight. I like the way that looks. I think that's pretty good. One thing that is a little bit odd with these three digit numbers and these single digit numbers, at least on the y axis, is that they're all right justified. I think down here on the x axis, they're all center justified on the tile. So why don't we go ahead and, you know, let's go ahead and center justify those on the y axis. What I'd like to do is just do something like h just as part of axis text. But it turns out that doesn't work. So what we need to do is specify the specific axis that we want to do that on. So we'll do axis dot text dot y element text h just equals 0.5. So h just and v just go from zero to one zero is left justified. One is right point five is right in the middle. A final thing that I'd like to do is to go ahead and make these fonts for my two labels a little bit bigger. And so I can come back up to geom text and I will add a size aesthetic. So size equals and let's start with like 12. So it's got real big. Let's go ahead and knock that down to maybe 10. So I think that font size looks pretty good. It's nice and readable. I don't think it obscures the background colors too much. One of the problems with the heat map is that it asks your audience to give a quantitative interpretation of a color. Right. So if I look at this square, what what distance is that? Well, it's is that like point six or is it like point five? It's really hard to get really good precision. We all perhaps remember that meme from the dresses at blue and black or gold and blue or whatever the colors were right. So the colors around that color you're trying to interpret will affect your interpretation of that color. It's that's ultimately a problem with heat maps. One of the things I take away from this heat map, however, is that I see that again the samples that were from later time periods were more similar to each other. So they're darker color than those from the earlier. Right. So these are more consistent shade of red than up here. But the late are all more similar to each other and the earlier are more similar to each other than the late are to the early. Right. And so this square down here indicates the similarity of the early to late. Interestingly, we have this day zero, which is actually the most similar to the late samples. And again, that was the day of weaning mice are coprophagic, which means they eat each other's feces. And so what I suspect was happening was that they were with their mom, they're eating their mom's feces, as well as, you know, their siblings. And so they had a microbiota that looked like an adult. And so they then go off on their own, which is a perturbation in itself. And then the community changes. And again, the story that we told with these data was that kind of by about day 15 or so, the mouse community had shifted to looking more like this later community. I don't know that this heat map is really the way to show that. But again, it is an alternative to an ordination. I feel like, you know, I don't know if this is necessarily better than an ordination, but you might certainly find that it's an attractive way to depict your distances. Also, we talked about here how we could share two different types of distances. Jakard being based on membership of who's there and Bray Curtis based on who's there and in what abundance. Anyway, I really thank the viewer who recommended this idea of looking at distances on a heat map. Hope you found it useful. Encourage you again, as always, try this with your own data. Give it a shot. Let me know if you've seen something like this out in the wild that you find really compelling. And I would love to see that. And maybe we can comment on that in another future episode. We'll keep practicing. Keep sending me your ideas. And we'll see you next time for another episode of Code Club.