Okay, here we go for the second talk, which is on much the same subject. It will describe another way of partitioning beta diversity, using two families of coefficients that we have seen before. The beginning is pretty much the same as what we have already seen, so I will not go through it in detail. Compared with the previous talk, here I will focus only on dissimilarity matrices, and we know that we can compute the total sum of squares and total beta diversity from them. This method does it in a very different way. The idea is to partition the dissimilarities in a dissimilarity matrix like this one into two components, called replacement and richness difference.
During the past five or six years, two different methods have been proposed by opposing groups of scientists. One group is mostly Andrés Baselga, in Spain, with a few people who have joined him in publishing papers. The other group is led by Podani, a very prolific worker in numerical ecology in Hungary, who has been doing that with various co-authors. Each group has proposed ways of partitioning, first, the Jaccard and Sørensen indices for presence-absence data into these replacement and richness difference components; later, the two groups added the quantitative forms of Jaccard and Sørensen. For Jaccard, the quantitative form is a coefficient called Ružička, which I discovered when I started studying these methods in detail, and for Sørensen it is the percentage difference. In this talk I will focus on the Podani approach, because I do not have time to describe everything, but the function that I wrote, which is available in the adespatial package, contains the coefficients from both groups. Now, there was some sort of fight going on in the literature for a couple of years, saying my coefficients are better than your coefficients, this sort of thing.
So I decided to look into these things and figure out whether one type of coefficient was definitely superior to the other. I found that this was not the case, and that there was room for both types of coefficients. I tried to re-establish peace between the two fighting factions with the paper that I published, and by including the coefficients from both groups in one R function that everybody can use and experiment with.
Now, the basic concept is quite simple. This is for species presence-absence data. You may remember that when we compute the coefficients for presence-absence data between two sites, we put site 1 here and site 2 there, each with presence and absence, and the cells of the table are called a, b, c and d. Well, essentially a, b and c, because the d component, the double absences or double zeros, is never considered in the Jaccard and Sørensen indices, and we know why. So here are the components a, b and c for two sites in which we have a number of species. These five species are present at both sites; they are the a component. These are the other species from site 1, which are not present at site 2; they form the b component. And these species are present at site 2 but not at site 1; they form the c component. But they are different species. So when we want to compute the dissimilarity, it will be b + c divided by some denominator. In Jaccard, the denominator is a + b + c. In Sørensen, it is 2a + b + c. These two coefficients differ only in their denominator; they both take b + c as the numerator of the dissimilarity coefficient. Now, replacement is the number of species found at one site that you can consider as having replaced the same number of species at the other site. So you take the minimum of b and c, and you say that this minimum number of species has been replaced at the other site by other species.
So these three, or any three of those species, would do. It does not indicate that these three specific species are the ones that were replaced, only that three species that are not found here have replaced three species there. The replacement component is these three plus those three, so two times the minimum of b and c. For the richness difference, well, this site is richer than that one because it has five more species that have not replaced any of those. They are an extra group of five species that forms the richness difference component. With that, we will construct different indices. I will repeat this picture in a small version at the top of the next few slides.
Okay, so here is the same picture, with the components written in the column for presence-absence data. Later I will have coefficients for abundance data in this column, but I will use the same presentation during the next few slides. Replacement is two times the minimum of b and c. It is this thing here, and it can be computed in this way. Then the richness difference, which will be called abundance difference for abundance data, is the absolute value of b minus c. Here we have eight species minus the three that go into replacement, so the richness difference is five. And the dissimilarity is b + c. So this just repeats what we have seen.
Now we will construct the coefficients for the Jaccard group. The denominator is a + b + c, and we apply it to each of these numerators. For the whole dissimilarity, this is the numerator and this is the denominator. For replacement, this is the numerator and this is the denominator; this is what we have here. And for richness difference, this is the numerator and this is the denominator. So it is a simple construction. At least it looks simple in this synthetic presentation; it looked much more complicated in the original literature on this subject. Now the same slide, but for the Sørensen group.
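The presence-absence construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the adespatial function: the function name, the species labels and the use of Python sets are hypothetical, and the counts a = 5, b = 8, c = 3 are the ones read off the example as described in the talk. The Sørensen variant only changes the denominator to 2a + b + c.

```python
def partition_presence_absence(site1, site2, family="jaccard"):
    """Split the dissimilarity between two species sets into
    replacement and richness difference (Podani-style construction)."""
    a = len(site1 & site2)  # species shared by both sites
    b = len(site1 - site2)  # species found only at site 1
    c = len(site2 - site1)  # species found only at site 2
    if family == "jaccard":
        den = a + b + c
    elif family == "sorensen":
        den = 2 * a + b + c
    else:
        raise ValueError("family must be 'jaccard' or 'sorensen'")
    if den == 0:
        raise ValueError("both sites are empty")
    return {
        "D": (b + c) / den,              # whole dissimilarity
        "Repl": 2 * min(b, c) / den,     # replacement
        "RichDiff": abs(b - c) / den,    # richness difference
    }


# Hypothetical species labels giving a = 5, b = 8, c = 3.
site1 = {f"shared{i}" for i in range(5)} | {f"only1_{i}" for i in range(8)}
site2 = {f"shared{i}" for i in range(5)} | {f"only2_{i}" for i in range(3)}

jac = partition_presence_absence(site1, site2, "jaccard")
sor = partition_presence_absence(site1, site2, "sorensen")
print(jac)  # {'D': 0.6875, 'Repl': 0.375, 'RichDiff': 0.3125}
print(sor)
```

With these counts the Jaccard-family values are 11/16, 6/16 and 5/16, and in both families the replacement and richness difference values add up to the dissimilarity, because 2 min(b, c) + |b - c| = b + c.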
The only difference is that the denominator is now 2a + b + c in every one of these three components. Simple again.
Now what do we do when we have abundance data? The same coefficients now have to involve the abundances. The A component, which is what is common to the two sites, is explained by this example, where I have four species, and in each subgraph we compare site 1 and site 2. For this first species, we have this many individuals at site 1 and this many at site 2. The portion in common is a component of A here, A1. And the difference here is a component of C, from species 1, for site 2, so it goes into this sum for C. For species 2, we have these abundances, with this many in common, which go into the calculation of A. And here we have all these individuals that form the abundance unique to site 1, so they go into this sum for B. Species 3 is only present at site 1, so B3 is all of that, and it goes into B. And for species 4, the two sites have equal abundances, so all of it goes into the sum for A. So we sum all the A's from there, there and there; all the B's from there and there; and for the C's there is only that one. That gives us large A, large B and large C, which we will now use to measure the dissimilarity: the dissimilarity is B + C, while A is the similarity portion. Replacement is two times the minimum of B and C. And the richness difference, this time called abundance difference, is the absolute value of B minus C.
We put that into coefficients in exactly the same way as before. In the Jaccard group, we had the Jaccard dissimilarity. Now we have the Ružička dissimilarity, constructed in the same way as the Jaccard dissimilarity but with the large A, B and C values, so the denominator is large A + B + C. We construct replacement in the same way: two times the minimum of B and C, that is, this component, divided by the denominator.
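As a rough sketch of how the large A, B and C described above could be assembled, the Python below takes, for each species, the smaller of the two abundances as its contribution to A and assigns the excess to B (site 1) or C (site 2). The function names and the four-species abundances are made up for illustration; they are not the values on the slide and this is not the adespatial code.

```python
def abundance_components(y1, y2):
    """Return (A, B, C) for two abundance vectors over the same species."""
    if len(y1) != len(y2):
        raise ValueError("both sites need the same species list")
    A = sum(min(u, v) for u, v in zip(y1, y2))  # abundance common to both sites
    B = sum(y1) - A                             # abundance unique to site 1
    C = sum(y2) - A                             # abundance unique to site 2
    return A, B, C


def partition_abundance(y1, y2, family="ruzicka"):
    """Dissimilarity, replacement and abundance difference for one pair."""
    A, B, C = abundance_components(y1, y2)
    if family == "ruzicka":
        den = A + B + C
    elif family == "percentage_difference":
        den = 2 * A + B + C
    else:
        raise ValueError("unknown family")
    if den == 0:
        raise ValueError("both sites are empty")
    return {
        "D": (B + C) / den,
        "Repl": 2 * min(B, C) / den,
        "AbDiff": abs(B - C) / den,
    }


# Hypothetical abundances: species 1 has an excess at site 2 (goes to C),
# species 2 has an excess at site 1 (B), species 3 occurs only at site 1 (B),
# species 4 is equally abundant at both sites (A only).
y1 = [3, 6, 4, 3]
y2 = [5, 2, 0, 3]

print(abundance_components(y1, y2))  # (8, 8, 2)
ruz = partition_abundance(y1, y2, "ruzicka")
pdiff = partition_abundance(y1, y2, "percentage_difference")
print(ruz, pdiff)
```

For these made-up numbers the percentage difference obtained with 2A + B + C equals the familiar form, the sum of absolute abundance differences divided by the sum of all abundances, here 10/26.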
And the abundance difference is constructed with this numerator and this denominator. Very simple. So this is a new coefficient that I had not introduced yet. Now you will recognize the good old percentage difference, where the denominator is 2A + B + C. You obtain exactly the same value as with the formulation that I showed in my talk on Tuesday morning. On Tuesday morning the letters A, B and C meant something different from what they mean here, but the result of the calculation is the same; it is just that the abundances are assembled in a different way. So the percentage difference dissimilarity is computed like this. The replacement component is this divided by the denominator, and the abundance difference is this divided by the denominator. Some of these abundance coefficients had never been described, so I filled the holes that the original authors, Podani here and also the Baselga group, had left. I said, well, let's complete the list with all of these.
Okay, the fish data; you know about them. What we will do now is compute the dissimilarity and split it into replacement and richness difference. By the way, I don't know if I pointed this out: in each case, replacement plus richness difference add up to the dissimilarity. So this is a true decomposition of each value in the dissimilarity matrix, hence this presentation, where the dissimilarity is split between replacement and richness difference. If you add this matrix to that one, you find the dissimilarity matrix again. So it is a true decomposition into these two components, and we will see what we can do with it. Here, for the fish data, I used the presence-absence data; I could also have done it with the abundance data. Total beta, computed as we did in the previous talk, gives us a value of 0.32, let's say, based on Jaccard, and 0.27 based on Sørensen. Now remember that these are dissimilarities between 0 and 1.
So the maximum value that total beta can attain with these two coefficients is 0.5. This divided by that is 65% of the maximum, and this divided by that is 53% of the maximum. So the communities are well diversified. And 53% is very close to what we had after chord transformation of the abundance data; I think it was 53% or 54%. Now if we look at the replacement component, if we add up all these replacement values and divide the sum by BD total, we find 28%. And it is the same value that we find there. In the same way, for the richness differences, if we add them up, and yes, we can add them up, and divide by BD total, we find 72%. So this plus that is equal to 1, showing that these two matrices have really split total beta diversity into two component matrices.
This already shows an interesting result: beta diversity in the river is dominated by richness difference, not by replacement. That is, richness increases as we go down the river, except at these three sites that are polluted. So we have more and more species as we go, and we lose few species. We lose a few, because it is not purely richness difference; there is a bit of replacement. For instance, the brown trout disappears after the first three sites. But the main phenomenon is the addition of species along the course of the river. This, I think, is an interesting result. Instead of focusing on what happens between pairs of sites, we can make a statement about the whole course of the river.
Now, how do we represent that? I scratched my head for a while to determine how to do it. Here I decided to compare each of the sites to the last one, number 30, because it has nearly all the species. Of course, you see that site number 8 has been removed from the data, because no fish were caught there by Dr. Verneaux. Site 30 is nearly the richest.
The richest is this one, site 29. But anyway, I compared all the other sites to site number 30 and looked at the pairwise dissimilarities. Black is the Jaccard dissimilarity, and then there are the pairwise values of replacement and richness difference. So here we see the dissimilarity curve; each time, I compare a site to that last one. The dissimilarity is high at first, because there is no species in common between site 1 and site 30. Here there is only the brown trout, and there, there is no brown trout. The species composition is also totally different for site 2. Then the dissimilarity decreases a little, because there must be one species in common between site 3 and site 30, and so on. Then it goes up again, the sites become more different, and then the dissimilarity falls. At the three polluted sites, the dissimilarity increases a lot. And then it becomes low here, because these sites are pretty much similar to site 30. We see that the overall dissimilarity is followed very well by the richness difference curve, because richness difference is the main component of the Jaccard dissimilarity, while the replacement portion does its thing on its own there and makes no great contribution to the total. And here, for reference, I plotted species richness, which is pretty much the mirror image of the overall dissimilarity curve. So that is one way of using these decomposed coefficients, the elements of the replacement, richness difference and dissimilarity matrices, in a graph that tells us what happens along the course of the river without focusing on individual species, just in terms of overall dissimilarity.
Now we can of course use ordination, here for the Sørensen dissimilarity. It gives us a story like this, with site 1 here at the top of the river. Then for a while we have about the same species composition. Then we go into a long loop here. Then we go to the polluted sites, 23, 24 and 25, which are very different from all the others.
Then we go back to some normality in species composition. If we compute the LCBDs from the Sørensen matrix, we obtain this, with again these three sites having high values and these three sites having high values. This is not surprising; it is like what we obtained with the quantitative data after chord transformation in the previous talk.
What is new is that we can do the same thing with the replacement and richness difference components. We can take the whole replacement matrix and do an ordination by principal coordinate analysis. In the appendices of the paper, on page 22 of the appendices, I think, there is a table showing which of these components produce a matrix that is metric or Euclidean, either in its original form or in its square-root form. With the Sørensen coefficient, I think it is the richness difference that can be Euclidean, but replacement is never fully Euclidean, even after square-root transformation. Still, we can tell this little story here for the replacement portion, starting at site number 1. It goes like this, it goes to the polluted sites; this is what happens in the lower course of the river and this is what happens in the upper course. And we can compute LCBDs from it. We have this dissimilarity matrix; we can do the Gower centering and take the diagonal values, and they are our LCBD indices. They are just plotted here; I did not do any test of significance. We see that we have high values at sites 11 to 15 here, and high values, of course, at the three polluted sites that are there. Again, the LCBDs are the squares of the distances between the centroid and each site. The sites that are farther away than, let's say, this circle are the farthest ones: sites 11 to 15, which have big bubbles, and sites 23, 24 and 25, which are also far from the origin. So again, this shows us the sites where high replacement occurs.
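The centering step mentioned above can be illustrated with a minimal Python sketch. It assumes the input matrix holds squared distances; for an index that has to be square-rooted before use, that would mean passing the index values themselves. Dividing the diagonal values by their sum, so that the LCBD values add up to one, is a normalization assumed here and not spelled out in the talk. The function names and the three-point example are hypothetical.

```python
def gower_center_diagonal(sq_dist):
    """Diagonal of the Gower-centered matrix built from squared distances."""
    n = len(sq_dist)
    a = [[-0.5 * sq_dist[i][j] for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in a]
    grand_mean = sum(row_mean) / n
    # For a symmetric matrix, g_ii = a_ii - 2 * row_mean_i + grand_mean.
    return [a[i][i] - 2 * row_mean[i] + grand_mean for i in range(n)]


def lcbd(sq_dist):
    """Diagonal values rescaled to sum to one (assumed normalization)."""
    diag = gower_center_diagonal(sq_dist)
    total = sum(diag)
    if total == 0:
        raise ValueError("all sites are identical")
    return [g / total for g in diag]


# Toy check: three sites placed on a line at 0, 1 and 3 (centroid at 4/3).
positions = [0.0, 1.0, 3.0]
sq = [[(p - q) ** 2 for q in positions] for p in positions]

diag = gower_center_diagonal(sq)
print(diag)      # squared distances of each site to the centroid
print(lcbd(sq))  # relative contributions, summing to one
```

In this toy case the diagonal values are 16/9, 1/9 and 25/9, which are exactly the squared distances of the three points to their centroid, so the site farthest from the centroid gets the largest value.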
A funny thing happens with the richness difference. There is a mistake on the slide: this is richness difference, not replacement; it was a copy and paste on my part. So we can take the richness difference matrix and do an ordination by principal coordinate analysis. When we do it with the Sørensen index or with its quantitative form, the percentage difference, all the points are on an arch like this, a perfect arch. This is an advantage of using Sørensen or the percentage difference compared to Jaccard and Ružička. Everything is perfectly in line here, and then I added lines to show the sequence, from 1 to 2 to 3 to 4. After 4, 5 must be here; from 4 we go to 5 here. From 5 we come back to 6 and 7; 8 is not there, it has been removed; 9 is at the same place. Then 10 to 13, then 14, 15, 16, and so on. And then, after site 22, site 23 goes there: pollution, so we lose most of the species. Where are 24 and 25? Oh, okay, 24 and 25 are here. So there are fewer species than there, but we gain a few species. And then we go back to normal there. It is an interesting story to follow, and we have it in schematic form by taking the diagonal values of the centered matrix. These are our LCBD indices, showing sites 1, 2 and 3 as having big distances, squared distances actually, from the centroid, and 23 as well. So these four have big bubbles. These are the interesting sites, where richness difference is not smooth along the river; the sequence is violated, and we have a very small number of species at these sites.
What else can we do? Oh yes, I tried to see whether we could do something more than what had been done in the previous papers by Podani and Baselga in terms of using these coefficients to test hypotheses, that is, to compare these component matrices to explanatory variables.
The first explanatory variable that I used was obtained by taking all the environmental data in the Numerical Ecology with R book, all the explanatory variables for the river. I computed a hierarchical clustering and simply took two groups, a bit like what Daniel did when he determined three groups. Here I made two groups of sites, the upper course and the lower course, you know, a rough classification into two groups, and I tried to see which of these components were related to it.
I must explain what I did here, because it is like doing an RDA. In an RDA, I would use the Y data against the environmental data X. Here my environmental data have become a classification into two groups, sites 1 to 22, and from 23 on. So I cut it here: group one and group two. This is my explanatory variable. RDA works fine if we can take the dissimilarity matrix, decompose it using principal coordinate analysis and put the principal coordinates into the Y matrix. This is the db-RDA approach that I showed yesterday in one of my concluding slides. So that works fine. But here at least one of the matrices is never Euclidean, even if we take the square root. So there is another approach for doing the equivalent of RDA, but directly from the dissimilarity matrix, which can be D, or richness difference, or replacement. We take the dissimilarity matrix against this or that explanatory table. This method was developed by Brian McArdle and Marti Anderson, who are both in New Zealand. In their 2001 paper, they developed an F test of significance of the RDA computed in this way. It is exactly the same as if we were decomposing the matrix using principal coordinate analysis and doing a normal RDA, but it can be done even if the matrix is not Euclidean. That is why we keep it in that form. The mathematics is interesting, and I think I have included that function in adespatial too. Anyway, now it is in vegan.
I sent it to Jari Oksanen and he included it in vegan. This is to explain that the F test was done in a mathematically different way, but it is the same as the F test of RDA. So I found that the whole dissimilarity matrix, and here I used Jaccard, yes, Jaccard, was significantly explained by the classification. Replacement was also significantly explained by the classification, and more significantly than that, whereas richness difference was not significantly explained by the classification. D contains this plus that, so the component that is really related to the classification of the river into two sections is the replacement component, not the richness difference component, and you see that this p-value is more significant than that one. So this is really the result that should be interpreted.
Then I kept going and used the whole matrix of environmental variables. I must have done forward selection. So yes, here I did that after transforming these data using principal coordinate analysis, for the forward selection, but then I did the tests using the McArdle-Anderson test. I found that the whole dissimilarity was explained by slope, hardness, nitrate and oxygen concentration. Replacement was explained by the variation in oxygen concentration, and richness difference was explained by the three other variables found there. I thought that was a very interesting result, because splitting the dissimilarity into two components showed that the explanatory variables that were significant there were also split into two groups, related to these two different components, which are orthogonal components in the story. So that is extremely interesting. It goes deeper than just analyzing the dissimilarity matrix. Here we really touch the ecology: what happens to the community composition data in terms of these two processes, and how that is related to the environmental variables. Well, at least I like that result.
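To give a feel for a test of a two-group classification computed directly from a dissimilarity matrix, here is a hedged Python sketch. It is not the adespatial or vegan function and it does not use the matrix formulation of the 2001 paper. It assumes a single grouping factor and uses the sum-of-squares form of the pseudo-F statistic, computed from squared distances, with a permutation test; for a single factor this should, to my understanding, be the same statistic. The function names, the toy positions and the number of permutations are all hypothetical.

```python
import random


def ss_from_sq_dist(sq_dist, members):
    """Sum of squares of a set of sites: sum of squared distances / size."""
    total = 0.0
    for pos, i in enumerate(members):
        for j in members[pos + 1:]:
            total += sq_dist[i][j]
    return total / len(members)


def pseudo_f(sq_dist, groups):
    """Pseudo-F for one grouping factor, from squared distances only."""
    n = len(groups)
    labels = sorted(set(groups))
    k = len(labels)
    ss_total = ss_from_sq_dist(sq_dist, list(range(n)))
    ss_within = sum(
        ss_from_sq_dist(sq_dist, [i for i in range(n) if groups[i] == g])
        for g in labels
    )
    if ss_within == 0:
        return float("inf")
    return ((ss_total - ss_within) / (k - 1)) / (ss_within / (n - k))


def permutation_test(sq_dist, groups, n_perm=999, seed=0):
    """Permute group labels; p = (count of F* >= F, incl. observed) / (n_perm + 1)."""
    rng = random.Random(seed)
    f_obs = pseudo_f(sq_dist, groups)
    count = 1  # the observed value counts as one permutation
    perm = list(groups)
    for _ in range(n_perm):
        rng.shuffle(perm)
        if pseudo_f(sq_dist, perm) >= f_obs - 1e-12:
            count += 1
    return f_obs, count / (n_perm + 1)


# Toy data: six sites on a line, split into an upper and a lower group.
positions = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
sq = [[(p - q) ** 2 for q in positions] for p in positions]
groups = ["upper"] * 3 + ["lower"] * 3

f_obs, p_value = permutation_test(sq, groups)
print(f_obs, p_value)
```

The design choice here is to work from the distances alone, without any ordination step, which is what makes this kind of test usable when the matrix is not Euclidean. With only six sites the smallest attainable p-value is about 0.1, since just 2 of the 20 possible splits reproduce the observed grouping.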
In the original papers, other graphical tricks and methods of analysis have been proposed by Podani and Schmera and by Podani and co-authors, and I used some of them, for example these triangular plots, in the analysis of the Doubs River data in the paper that was published in Global Ecology and Biogeography two years ago.
Okay, so the conclusion of that part is that the replacement and richness difference indices can be interpreted and related to ecosystem processes separately, as we saw in this example. The innovation in this decomposition is that the index values can be summed across all pairs of sites to decompose total beta diversity into total replacement and total richness difference components; previous authors had not done that. The mathematics is really simple. I thought that we can sum the dissimilarities to obtain the total variance for these indices; remember that these are indices that need to be square-rooted to be Euclidean. So the total variance is the sum of the dissimilarities divided by n and then by n minus one. Each dissimilarity is the sum of a richness difference component and a replacement component. So these two other matrices can also be summed in the same way and divided by n times n minus one, and the sum for replacement and the sum for richness difference add up to total beta diversity. So that is another way of fully decomposing total beta, instead of only doing it for each pair of sites.
Local contributions, the LCBD indices, can be computed for these two components separately, and they can be mapped, as we have seen. Within a region, differences among the sites measured by these indices can be analyzed and interpreted using explanatory variables, either a classification or a series of environmental variables, as we have seen.
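The summation described above can be written out as a short Python sketch. It assumes that the sums run over each pair of sites once, which is consistent with a maximum of 0.5 for total beta when every dissimilarity is at most 1, and it uses the Jaccard-family denominator. The function names and the four small sites are invented for illustration; this is not the adespatial code.

```python
from itertools import combinations


def pair_components(s1, s2):
    """(D, Repl, RichDiff) for one pair of sites, Jaccard family."""
    a = len(s1 & s2)
    b = len(s1 - s2)
    c = len(s2 - s1)
    den = a + b + c  # assumes the two sites are not both empty
    return (b + c) / den, 2 * min(b, c) / den, abs(b - c) / den


def total_decomposition(sites):
    """Sum each component over all pairs, then divide by n * (n - 1)."""
    n = len(sites)
    sums = [0.0, 0.0, 0.0]
    for s1, s2 in combinations(sites, 2):
        for k, value in enumerate(pair_components(s1, s2)):
            sums[k] += value
    bd_total, repl_total, rich_total = (s / (n * (n - 1)) for s in sums)
    return bd_total, repl_total, rich_total


# Hypothetical presence-absence data for four sites.
sites = [{"sp1", "sp2", "sp3"}, {"sp1", "sp2"}, {"sp1", "sp4"}, {"sp5"}]

bd, repl, rich = total_decomposition(sites)
print(bd)                    # total beta diversity
print(repl / bd, rich / bd)  # the two proportions, which sum to 1
```

For these invented sites the six pairwise dissimilarities sum to 4.75, so total beta is 4.75/12, and the replacement and richness difference totals are 3/12 and 1.75/12, which add up to it.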
Replacement and richness difference matrices can be analyzed by all the methods of multivariate data analysis that are appropriate for dissimilarity matrices. Do I have anything else? Yes, well, the reference.
Okay, questions about that? I realize that this is new to you, unless you had seen the papers and were already into this sort of literature; for most of you this may be new. So if you don't have questions now, maybe you will have some this afternoon, when there are exercises in which you can play with the function. There are examples at the end of the function, and I encourage you to run them and look at what you obtain; maybe you will then understand more fully how useful this method of decomposing total beta diversity can be for your analyses. I hope it will be useful to some of you. And with the new package, adespatial, everything is available in there.