 Okay, today there is a shift in the types of topics. Up to now we have been describing classical tools of data analysis: distances, ordination, canonical ordination, essentially with everything that goes with them. Today and tomorrow we will focus on ecological questions. Of course, statistics will come to our help to better understand how ecosystems work, but the main focus will be on ecological questions. Today in particular, beta diversity will be the dominant theme, and later Daniel is going to talk about the Mantel correlogram. So, without further ado, if I can find my PowerPoint presentation; it is somewhere in there. Well, I can probably click here to find it. I have several versions of that talk, so I must make sure that I start with the right one. Yeah, this is the one. I hope it is the right one, because I have three versions of it. And this is just to show that I can also present a PowerPoint presentation for a change. Actually, this started as a talk that I gave in 2013, and since then I have given different versions of that same talk, adding bits and pieces here and there. It is about beta diversity and how to partition it in different ways. At the end of the talk we will see that this focus on beta diversity is central to everything we do when analyzing multivariate ecological data, provided, of course, that the rows of our Y matrix are sites on a geographic map. So this will give a unifying ecological theme to everything that we are doing. This is what I am going to do in this first talk. I will remind you of the notions of alpha, beta and gamma diversity, which you must have learned in elementary ecology courses. That will allow me to focus on beta diversity and show how it is very different from alpha and gamma. 
Then we will see how to measure it, and we will come to this way of measuring total beta diversity and how it can be decomposed into spatial components and species components. Then there will be an example, the fish data of the Doubs River. Here will be a more mathematical part, where we will go back to dissimilarity matrices. There will be a landscape genetics example, and there is a final example that I hope you will appreciate. This is Robert Whittaker. He was a professor at Cornell University and he made major contributions to both ecology and the quantitative methods of numerical ecology. It is in his lab that Mark Hill developed the alternating algorithm that is at the base of the CANOCO program. Mark Hill also developed, in his lab, the diversity numbers that unify all the alpha diversity measurements. And Hugh Gauch also wrote one of the first textbooks of numerical ecology, all in his lab. So he told us, in two famous papers in 1960 and 1972, that alpha diversity is local diversity. Local diversity is what you compute for, let's say, a vegetation quadrat or a water sample for plankton, and so on. Gamma diversity is regional diversity. Here is the Y matrix with sites and species (we can do the same thing with gene frequencies). If you add up everything in your data table, that is, sum the columns, and compute one of the alpha diversity functions on this sum, we call the result gamma diversity, if the sites really are representative of the variation in the region of interest. And beta diversity, we call it spatial differentiation: it is the variation in species composition among the sites. It varies from site to site. This may be clearer with the same matrix that I was drawing here, where for each site we can measure alpha using any one of Hill's diversity numbers: N0, for instance, is species richness, N1 corresponds to Shannon entropy, and N2 to Simpson's diversity. 
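As a minimal sketch of the alpha and gamma calculations just described, the Python code below computes Hill's numbers N0, N1 and N2 for each site and for the vector of column sums. The table Y is a hypothetical toy example, not data from the talk, and the function name hill_numbers is an illustrative choice. The code takes N1 as the exponential of Shannon entropy and N2 as the inverse of Simpson's concentration, which is the usual definition of Hill's numbers.

```python
import math

def hill_numbers(abundances):
    """Return (N0, N1, N2) for one vector of non-negative abundances."""
    total = sum(abundances)
    # Relative abundances of the species that are present
    p = [a / total for a in abundances if a > 0]
    n0 = len(p)                                    # species richness
    shannon = -sum(pi * math.log(pi) for pi in p)  # Shannon entropy H
    n1 = math.exp(shannon)                         # Hill number of order 1
    n2 = 1.0 / sum(pi * pi for pi in p)            # inverse Simpson concentration
    return n0, n1, n2

# Hypothetical toy table: 3 sites (rows) x 4 species (columns)
Y = [
    [10, 10, 0, 0],
    [0, 5, 5, 0],
    [0, 0, 8, 8],
]

# Alpha diversity: one triple of Hill numbers per site
alpha = [hill_numbers(row) for row in Y]

# Gamma diversity: the same functions applied to the column sums
regional = [sum(col) for col in zip(*Y)]
gamma = hill_numbers(regional)

print(alpha)
print(gamma)
```

In this toy table each site holds two equally abundant species, so all three alpha numbers are close to 2 at every site, while richness computed on the column sums is 4.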
So if you apply that to the sum, the vector here, then you get gamma diversity. And as I was saying, beta diversity is the variation in species composition among sites. To fix ideas, imagine that all the sites have exactly the same species composition, like the flower boxes that you may find along boulevards. At least in my hometown, in the summer, there are flower boxes, and they are all made in exactly the same way by people from the botanical garden. I said in the summer because in the winter we have two meters of snow and the flower boxes have been taken away. If all the flower boxes are exactly the same, we may have alpha diversity in each one of these boxes; there may be, for instance, five species of flowers present. But if they are all the same, there is no beta diversity. This is to fix the idea that beta is of a different nature than alpha and gamma. You may have a large alpha diversity but zero beta diversity. And we will see that it can be measured in different ways. Okay, there has been a lot of discussion in the literature during the past ten years about how beta diversity can be measured. And it turns out, as was pointed out in various papers, including the one published by this working group led by Marti Anderson, that there are two major approaches to beta diversity. The first is used by people who are working along some gradient: a spatial gradient (a transect), a temporal gradient (a time series), or some environmental gradient, like going up the slope of a mountain or choosing sites that represent a gradient in pH. If you have your data in this matrix, with the rows representing sites along the gradient, and you compute one of our ecological distances, like the percentage difference, or Jaccard if you are working with presence-absence data, then computing distances between adjacent rows is a way to conceive beta diversity. And then you could plot the value of the distance for the different pairs along the gradient. 
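The gradient approach described above can be sketched as follows, assuming presence-absence data and the Jaccard dissimilarity (one minus the proportion of shared species). The table Y and the function name are hypothetical illustrations; rows are assumed to be ordered along the gradient.

```python
def jaccard_dissimilarity(row_a, row_b):
    """1 - (shared species) / (species present in at least one of the two sites)."""
    present_a = {j for j, v in enumerate(row_a) if v > 0}
    present_b = {j for j, v in enumerate(row_b) if v > 0}
    union = present_a | present_b
    if not union:
        return 0.0  # convention chosen here for two empty sites
    return 1.0 - len(present_a & present_b) / len(union)

# Hypothetical presence-absence table: 4 sites ordered along a gradient
Y = [
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 1, 1],
]

# Dissimilarity between each pair of adjacent sites: 1-2, 2-3, 3-4
turnover = [jaccard_dissimilarity(Y[i], Y[i + 1]) for i in range(len(Y) - 1)]
print(turnover)
```

Plotting these values against position along the gradient gives the kind of curve mentioned in the talk; here the dissimilarity between neighbours increases from the first pair to the last.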
Okay, this is between one and two, between two and three, three and four, and so on. And you could say, well, my beta diversity has done that, for instance. So you could look at the variation in these differences along the gradient. This is a nice and valid approach. Now there is another approach that does not imply a specific direction. You can apply it to the whole map, where you will be looking for a number that represents the amount of variation there is across the map. They are both valid approaches, despite the fact that some authors have claimed very strongly, with a lot of energy, that only one of them was true beta diversity. I disagree with that, and the consensus now seems to be that both approaches are valid. Now, about Whittaker's famous 1972 paper: by the way, it is interesting that this paper, which is one of the foundations of modern community ecology, was not published in an ecological journal. It was published in Taxon, a journal of taxonomy, including numerical taxonomy. In that paper Whittaker described two different ways of computing a single number for beta diversity, and he did that on two adjoining pages: on the left-facing page he described this one, and on the right-facing page he described that one. Here he said that there were different ways of computing beta, and one of these ways was to use the alpha diversities computed for the different sites and the gamma diversity computed from this vector, using for instance species richness or Shannon diversity, and combining them as follows. If you use species richness, S here is the number of species in this vector, and alpha bar is the mean number of species at the different sites. So this ratio indicates how many times more species are present in the region than at an average site within the region. That is a good measure of beta. And again, some authors have claimed that this was the only true measure of beta, but they have not read the next page of Whittaker. 
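Written out, and assuming species richness is used for both alpha and gamma, the ratio described above is the following (the index i over the n sites is notation added here for illustration):

```latex
\beta = \frac{S}{\bar{\alpha}}, \qquad \bar{\alpha} = \frac{1}{n} \sum_{i=1}^{n} \alpha_i
```

As a hypothetical worked example, a region with S = 20 species whose sites hold 5 species on average gives beta = 20 / 5 = 4: the region holds four times as many species as an average site.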
So yes, indeed, it is a very good measure of beta, but it depends on alpha and gamma. Now, what Whittaker said on the next page, I will show on the next slide. The concept that I am going to use in this talk is that, since beta diversity is the variation among the rows, well, I would be damned if the variance was not a good measure of variation. So I am simply going to use the variance of the community data table as a measure of beta diversity. You can simply use the sum of squares of the Y matrix, as we were doing yesterday to compute the R-square in canonical analysis, where the denominator was the sum of squares of the Y data table. You center the data, square the values and sum them, and that's it. Then if you divide by n minus 1 you obtain the variance, and this will be my measure of total beta diversity. I will show a bit later that this is exactly what Whittaker proposed on the next page of his Taxon paper, except that he did it through a distance matrix. The advantage of this measure is that it is calculated directly from the data table and is independent of the measures that you can make of alpha and gamma. So across many studies you can try to correlate this measure of beta with that measure of beta, for instance, because they are computed in totally different ways. It would be interesting to compare them. Now, there are many other indices of beta diversity available in the literature, but I will focus on this one because, as you will see at the end of this talk, it links everything that we have been doing since the beginning of the week (ordination, canonical ordination, variation partitioning and everything) to the concept of beta diversity: in all these methods we were decomposing the variance of this data table, which I claim today is a good measure of beta. Okay, that's the main line of this talk. Oh yes, I will skip that, the recent mathematical developments. 
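In symbols, the calculation just described (center each column, square, sum, divide by n minus 1) can be written as below. The symbol names are notation chosen here for illustration, with n sites, p species, and the mean of column j written with a bar:

```latex
SS_{\text{Total}} = \sum_{i=1}^{n} \sum_{j=1}^{p} \left( y_{ij} - \bar{y}_j \right)^2,
\qquad
BD_{\text{Total}} = \operatorname{Var}(\mathbf{Y}) = \frac{SS_{\text{Total}}}{n - 1}
```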
There is a slide that I had in one of the other presentations that is not here, but we will see later what Whittaker's idea was in computing this measure of variance from a dissimilarity measure. I must have removed that slide for some good reason. Okay, now that we have this community data table Y, here it is again. If we simply center the data and square them, we obtain a new matrix that I call S, and I show it explicitly because we will do some things directly with the S matrix. It is simply an intermediate step in the calculation of the total variance as we were doing it yesterday: you center the columns on their means, you square the values, put them in there, and then the sum of all these values is the total sum of squares, and dividing by n minus 1 gives you the variance that I call total beta. That is fine. Now I will just make a few remarks on this measure of beta before coming back to that table and doing additional calculations from it. The first remark is that, as we know, the Euclidean distance is inappropriate for measuring the variation in raw species abundance data because of the double zeros, which may create values that are completely inappropriate when we compute distances. It is the same thing for the variance. In practice, in studies of beta diversity, we should not compute these two equations, sum of squares and variance, directly on the raw species abundance data; we should first use the transformations that we discussed on Monday, that is, the chord transformation, where each value is divided by the length, or norm, of its row, or the Hellinger transformation, where each value is divided by the sum of its row and then square-rooted. These are the two most used transformations before computing this total variance. 
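The two transformations and the total variance can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names and the three toy tables are hypothetical, and every site is assumed to contain at least one individual (an empty row would cause a division by zero).

```python
import math

def chord_transform(Y):
    """Divide each value by the Euclidean norm (length) of its row."""
    out = []
    for row in Y:
        norm = math.sqrt(sum(v * v for v in row))
        out.append([v / norm for v in row])
    return out

def hellinger_transform(Y):
    """Divide each value by its row sum, then take the square root."""
    out = []
    for row in Y:
        total = sum(row)
        out.append([math.sqrt(v / total) for v in row])
    return out

def total_beta(Yt):
    """Return (SS_total, SS_total / (n - 1)) for a transformed table."""
    n = len(Yt)
    col_means = [sum(col) / n for col in zip(*Yt)]
    # Matrix S: centered and squared values
    S = [[(v - m) ** 2 for v, m in zip(row, col_means)] for row in Yt]
    ss_total = sum(sum(row) for row in S)
    return ss_total, ss_total / (n - 1)

# Hypothetical abundance tables (sites in rows, species in columns)
Y_mixed = [[4, 9, 0, 0, 0], [2, 6, 7, 0, 0], [0, 1, 3, 2, 5]]     # partial overlap
Y_same = [[4, 9, 1, 0, 0], [4, 9, 1, 0, 0], [4, 9, 1, 0, 0]]      # identical sites
Y_distinct = [[4, 9, 0, 0, 0], [0, 0, 7, 0, 0], [0, 0, 0, 2, 5]]  # no shared species

for name, Y in [("mixed", Y_mixed), ("same", Y_same), ("distinct", Y_distinct)]:
    bd_chord = total_beta(chord_transform(Y))[1]
    bd_hell = total_beta(hellinger_transform(Y))[1]
    print(name, round(bd_chord, 4), round(bd_hell, 4))
```

With identical sites the total variance is 0, and with sites that share no species it equals 1 (up to rounding) under both transformations; the table with partial overlap falls in between.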
I will come back later during this talk to the properties of these two distances, that is, the chord distance, which we obtain by taking the chord-transformed data and applying the Euclidean distance, and the Hellinger distance, obtained by the Hellinger transformation plus the Euclidean distance, together with the properties of 15 other dissimilarity measures. Another note here is that this total variance, the total beta diversity measure, can be used directly right now, without further developments. In particular, if at the same set of sites you measure the diversity of different taxonomic groups, let's say the vascular plants, the insects, the soil invertebrates, whatever groups you want to study, then you can compare the values of beta diversity among taxonomic groups directly, because the measures of beta diversity that we obtain have, as I will show later, a minimum and a maximum. For instance, when we measure beta diversity after the chord or Hellinger transformation, the maximum value of beta, which we obtain if all the sites have completely different species compositions, is one; we have a measure between zero and one. This is very useful, and it is what makes it possible to compare among taxonomic groups, because all the measures of beta have the same minimum and maximum. You can then rightfully say that, I don't know, the coleopterans are more beta diverse than the vegetation, or more beta diverse than the soil fauna, and so on. It is very nice to be able to make this comparison. Also, for a given taxonomic group, you can compare results from different areas for the same reason: the measures are comparable, and they are measures of variance where you divide by n minus 1, so it does not matter that you have a different number of sampling units in area one and in area two. 
This is taken care of by the division by n minus 1, and you can compare these values of beta for two areas as long as the sampling units are of the same size or represent the same sampling intensity. Of course, if you sampled small vegetation quadrats in one area and big vegetation quadrats in the other, you cannot compare them. But if they represent the same sampling effort or the same quadrat size, then you can compare them among study areas. This is already something very interesting. So what I have done up to now is simply take the Y matrix, transform the species composition data into Ytr, and produce this S matrix, where the values are centered and squared. From that we can compute the total sum of squares and total beta. As we go, we will fill in the rest of this picture. The first thing we can do (let me come back here, it will be simpler) is this: since summing all the values in this matrix gives us SS total, we can proceed first by summing each row. The sum of these row sums is equal to SS total. So if we now take these row sums and divide them by SS total, we obtain indices that indicate the contributions of the different sites to total beta. All these indices sum to one, so they are a decomposition of total beta among the sites. This is what I show in this slide. I call them local contributions to beta diversity, LCBD: for each row i, you sum over all the species in the S matrix and divide by SS total. These LCBD values turn out to be very interesting indices, as I will show in some examples later. They represent the degree of uniqueness of the sampling units in terms of community composition. If the community composition at one site is very different from that of all the other sites, then that site will have a large LCBD value. Now I go back again. If we can sum by rows, we can also sum by columns. 
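In symbols, with the element of matrix S for site i and species j written as a lowercase s (notation chosen here for illustration), the local contribution of site i and the property that the contributions sum to one are:

```latex
\mathrm{LCBD}_i = \frac{\sum_{j=1}^{p} s_{ij}}{SS_{\text{Total}}},
\qquad
\sum_{i=1}^{n} \mathrm{LCBD}_i = 1
```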
In the same way, the sums by column produce indices indicating which species contribute the most to beta diversity, and these values divided by SS total are again indices that sum to one over all the species. So we have here the species contributions to beta diversity, SCBD. These are the sums by column. The species with the highest values have high abundance at a few sites, and that creates high variance. Here is a subset of the famous fish data that you have been working with this week. I chose 11 sites and 7 species, with their abbreviations as they were until recently, when they were changed to English abbreviations. These are the data from Verneaux at the University of Besançon, where François Gillet is now working. François dug the data out of the library and rechecked them recently. So here I chose these sites. Site number one is at the head of the river and, as you see, there is only one species at that site: it is the brown trout, TRU for brown trout. The others are mostly minnows, cyprinid fish; are they all? Well, maybe not completely all of them. Anyway, you see that some sites have very few species (these three sites also have very few species, but not the same ones as there) and some sites have pretty much all the species. When we center these columns of data, the sites that have an average species composition will have values close to zero. For the sites that have extreme distributions like that one, the zeros become high negative values after centering, and when these high negative values are squared they become high positive numbers, so the row sum will have a high value: that site is very different from the mean of all the columns. In the same way, these sites here will have high values of the LCBD index. 
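The row and column decompositions can be sketched as follows. The table is a hypothetical toy example and NOT the Doubs fish data: site 0 holds a single species found nowhere else, loosely mimicking a headwater site. The Hellinger transformation is used here as one of the two options mentioned in the talk, and the function names are illustrative.

```python
import math

def hellinger_transform(Y):
    """Divide each value by its row sum, then take the square root."""
    return [[math.sqrt(v / sum(row)) for v in row] for row in Y]

def beta_contributions(Yt):
    """Return (LCBD per site, SCBD per species) from a transformed table."""
    n = len(Yt)
    col_means = [sum(col) / n for col in zip(*Yt)]
    # Matrix S: centered and squared values
    S = [[(v - m) ** 2 for v, m in zip(row, col_means)] for row in Yt]
    ss_total = sum(sum(row) for row in S)
    lcbd = [sum(row) / ss_total for row in S]        # row sums / SS_total
    scbd = [sum(col) / ss_total for col in zip(*S)]  # column sums / SS_total
    return lcbd, scbd

# Hypothetical abundances: 5 sites (rows) x 4 species (columns)
Y = [
    [9, 0, 0, 0],
    [0, 5, 4, 3],
    [0, 4, 5, 4],
    [0, 6, 5, 3],
    [0, 5, 5, 5],
]

lcbd, scbd = beta_contributions(hellinger_transform(Y))
print([round(v, 3) for v in lcbd])
print([round(v, 3) for v in scbd])
```

In this toy table the site with the unique species has by far the largest LCBD, and that species has the largest SCBD, which matches the interpretation given above: a species abundant at few sites and absent elsewhere creates high variance.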
So I did the calculations here after chord transformation of the abundance data. I obtained this for the total sum of squares, divided by n minus 1 (n being 11, n minus 1 is 10), and obtained 0.53 as the total beta diversity for this small data table. This index, as I mentioned, is between 0 and 1, so we have about half of the maximum possible value of total beta for this small data set. Here I computed the LCBD, the local contributions, and the SCBD. Instead of showing numbers I show bubbles; larger bubbles are larger numbers. We see that these sites have larger values than these sites here, which are near 0, meaning that after centering their values are very, very close to 0; and these sites also have very high values. As for the species, the brown trout, which is present at only a few sites and has many zeros, contributes a lot to total beta, and this is the bleak (ablette), which also contributes highly to total beta. Here we know why this site is very special: it is at the top of the river in a very steep environment, so only the trout can make it there. But we may wonder why we have high contributions to beta for these three sites; we will discover that a bit later. In the meantime I will focus on the LCBD indices. The nice thing with these indices is that we can subject them to a test of significance. We can test the null hypothesis that the LCBD value at each site is just the result of random variation; the alternative hypothesis is that the value is larger than it should be under the hypothesis of a random distribution of each species across the sites. We do that by permuting the species abundances in each column separately. This is an example of such a permutation. Here I put stupid numbers, but ones that are easy to follow, 0 to 9, 10 to 19 and so on, and I show a permutation column by column. I take the values of the first column, permute them, and here they come in a random order; the values of 
the second column are permuted, then the third, fourth and fifth columns, and as you see there is no mixing between the columns; the values in each column are simply permuted separately. The nice thing with R is that you can obtain the full permutation, from this to there, with a single line of R. Look at that: we apply to matrix Y, by column, the function sample. As you know, the function sample produces a random permutation, but here it is done for each column separately. So this line of code produces Y.perm, that is, this whole matrix. It is very easy to include that in a permutation loop. When you have this, you can recompute the values. If the data had been centered, they are still centered after permutation, so you simply have to square the values, then sum the rows and sum the columns, and you accumulate the LCBD values for each site across all permutations; at the end you can compute the p-value for each row. It is very simple to do thanks to the R language: because of this command, you do not have to write a double loop to produce the permutations. Okay, this is all very nice. Now, what do the LCBDs mean? This is an ordination of all the sites of the Doubs River, except site number 8, which as usual has been removed. What you see here is that sites that have an average species composition come out near the center of the ordination plot, which is the centroid of the multivariate distribution. The sites that are far away are the most interesting ones: they have a species composition very different from this average site. In the full example that I will present just after this, it turns out that site number 1 has a significant LCBD, and sites number 23, 24 and 25 have significant LCBDs; but after correction for multiple testing, these two are not quite significant. The ones that remain significant after correction for 29 simultaneous tests are 1 and 23. So 
the LCBDs, as I will show a bit later, are actually the squares of the distances between the centroid and the positions of the sites in the multivariate principal component space (scaled by SS total). That is what they are, squared distances. Large numbers mean that the sites are far from the multivariate centroid. The centroid itself is not very interesting, it is the average site; the interesting sites are those at the extremes, but they can be extreme for totally different ecological reasons, as we will see in a moment. Okay, so what have I done? We already had the skeleton here. We have the LCBDs, that is, the row sums divided by SS total, and the SCBDs, the column sums divided by SS total. The LCBDs can be tested for significance; we have not found a way of testing the species contributions in that context. Now, the full example is of course the fish data from the Doubs River, available on our site for the Numerical Ecology with R book and in the data sets provided for this workshop. With the full data set, again after chord transformation, I obtained a total beta diversity of 0.54; with the small sample of 11 sites I had 0.53, so I was not far from that. Here are some pictures; François could probably comment on them better than me. On this map, this is the Doubs River. This is the head site here, number 1, and the river flows like this, like that, like that, and finally it flows into the Saône River here, which goes south; in Lyon it meets the Rhône and they flow together to the Mediterranean. The brown line is the border between France and Switzerland, so on part of its course the river is the border. Our sites are distributed along this river, and this is the city of Besançon, where François Gillet... oh yes, this is François Gillet, by the way, the third author of our book. What proof of that can I give you? Is there a picture? Oh, we have a picture of the three of us on the web page as a proof of identity. So François will address you this 
afternoon on topics related to our main topics. So yes, François could comment, because he lives in Besançon and he has visited all these sites. Here I picked some pictures from the web. At the source, the flow of the water looks like that: a steep slope, very rapid waters, and this is where we find the brown trout. In this area the river flows in a nice agricultural valley; it looks like an ideal place to take a vacation. And in the middle of the city of Besançon (you told me this is a canal? No, this is the river itself), the river runs between concrete banks, but outside the city it is not like that. Okay, so this is what it looks like. Here I made a schematic representation of the flow of the river. Besançon is around here, and this is the last site, number 30, just before the river meets the Saône River. Here are the LCBD values, represented by bubbles again. The two that are significant are site 1 and site 23, but we see that sites 2 and 3 also have pretty high values, and sites 24 and 25 also have high values, and we will wonder why. For those of you who know your fish species, this is the common bleak, Alburnus alburnus, which is found there. It is the only species, I think, found at all three sites; anyway, it is the most abundant species at these three sites, while most of the other species have disappeared. And here we find, of course, the brown trout, alone at site number 1, and then Phoxinus phoxinus and the stone loach at sites number 2 and 3. Now, in order to determine the cause of these high values, I took the LCBD values here. We can extract the LCBD values, and as you know we have a matrix X of environmental variables for the sites in the river. So I simply regressed the LCBD values on all these variables in order to determine which variables were most important in explaining the variation of the LCBD. So LCBD becomes a new variable, a new synthetic variable describing the sites, that we can now study with 
respect to the environmental variables, and we know that two sites are significant. With this regression I found that LCBD was positively related to the slope of the river bed, and of course that is why this site is significant: only the brown trout can reach it because the water is too rapid. Biological oxygen demand is also a significant indicator of high LCBD values, and this is what happens here. Going back to the data, we see that these three sites have high BOD, which is a commonly used indicator of water pollution. Indeed, at these sites there is agricultural pollution: the fertilizers used by the farmers flow into the river and create pollution. They favor the growth of algae, which increases the biological oxygen demand, lowers the oxygen available in the water, and makes the habitat difficult for most species except this one and maybe one or two others. So here we have the explanation of these high LCBD values. As for the interpretation of the LCBD, we see that a high LCBD may indicate a pristine site that could be included in a preservation policy, I don't know, creating a natural reserve or something like that. But a high LCBD can also indicate sites that have been polluted and are in need of restoration. So a significant LCBD is not synonymous with a good site in ecological terms; it can, on the contrary, indicate a bad site in ecological terms. It is up to you, then, to go back and see why these sites have high LCBD values, and act accordingly. But it is interesting to use these indicators, because they tell us that if we have to look at only a few sites in detail, it will be this one and these three, instead of having to look in detail at all the sites. LCBD is a good indicator that a site is interesting to look at in detail. Okay, now I go back to more statistical concerns. Ah yes, this is the slide that I was looking for earlier. On the right-facing page of his 1972 paper, Whittaker told us that beta 
diversity could be computed from a dissimilarity matrix. He pointed out that for presence-absence data we use, for instance, the Jaccard and Sørensen coefficients, two that he mentioned, and now we also use the Ochiai coefficient, computed for pairs of sites. For quantitative community composition data he mentioned the percentage difference, which he called Bray-Curtis then, and of course there are the Hellinger, chord and chi-square distances, and many other distances that we investigated in this paper and that we will see in a table in a moment. So he suggested taking the whole dissimilarity matrix and using the mean of the dissimilarities as a single-number index of beta diversity. He said: we compute this matrix, take the mean, and that is a good measure of beta diversity. Okay, we will see how this connects to what we now know about dissimilarity measures. I think I mentioned this on Monday, but if not you will see it here: if the measure is the Euclidean distance, for instance computed on transformed species abundances, then the total sum of squares, which leads to the total variance, can be computed directly from the data, transformed into matrix S, by summing all the values in matrix S; and it can also be computed from the dissimilarity matrix. So we can compute it from here, and we can also compute it from the distances, if the distances are Euclidean distances. This is done by taking the lower triangle of the dissimilarity matrix, the portion that R shows us when we look at a dissimilarity matrix. We take all the distances in there, square them, sum them, and divide by n; n is not the number of distances, it is the number of sites, the number of rows here. This gives exactly the same value as that. It is not an approximation, it is an exact way of computing the total sum of squares. When we have the total sum of squares, division by n minus 1 gives us the total 
variance, which is our measure of beta. Now, I discussed in a lot of detail the fact that most of the distance measures that we use in ecology, including Jaccard, Sørensen, the percentage difference and many others as we will see, are not Euclidean when we take the distances themselves; to make them Euclidean we take the square root before going to principal coordinate analysis. If we do the same thing here and use the matrix D with a subscript of 0.5, in other words the matrix containing the square root of each distance value, we know that for all these coefficients the resulting matrix is Euclidean. So if we put these values into that equation, we obtain this: the total sum of squares is the sum of the distances divided by n, because each of these distances is the square of its square root. Very simple. This is what we do: we use this equation instead of that one for these coefficients. This will have some importance in this presentation and in the next one also: for all our ecologically meaningful dissimilarity measures we will use the square root, and this happens to be what we need to do to make them Euclidean. Oh yes, now, this equation is actually the equation of centering in principal coordinate analysis. So I will temporarily quit my PowerPoint presentation and go back to this document where I showed you principal coordinate analysis; when was it, on Tuesday, I think. I showed you that principal coordinate analysis starts from a dissimilarity matrix. This is actually the small example that I had for PCA, 5 rows and 2 columns. If you compute the Euclidean distances among these rows, you obtain this distance matrix. And here I showed you that we first have to do this transformation, minus one half of D squared, and then we center the resulting matrix A by subtracting the row mean and the column mean and adding the overall mean, to obtain this matrix called delta. In my PowerPoint presentation I call it G because this 
is known as Gower centering, after John Gower again, the inventor of principal coordinate analysis. In this matrix of centered data, the diagonal values are very interesting, and this was already shown by Gower in 1966, before you were born. Here are the values that we obtain in the ordination. Compare these two things. Last night I put these diagonal values in a vector and took their square roots, and that gives us this. I would need two screens to show that properly. The first of these values is the distance of site number 1 from the origin, and it happens to be this value because its coordinate on the ordinate is 0. It is exactly the same for sites number 2 and 3: the distance is 2.6, like that, and you can obtain this value by taking these coordinates and putting them in the Pythagorean formula: square each one, add them up and take the square root. And for these two, this is the distance. In other words, what we obtain after Gower centering are diagonal values that indicate the squared distance of each site from the centroid. These are our LCBD indices, once divided by the total sum of squares. So it is very simple to obtain LCBD indices from a dissimilarity matrix: you simply do this transformation and the Gower centering, get the diagonal values, and there you are. Going back to my PowerPoint presentation, this is what I do. This is the matrix formulation of Gower centering. It does the same thing as subtracting the row mean and the column mean and adding the overall mean, but you can write it in one line of matrix algebra in this way, where this is an identity matrix, this is a vector of ones, and so on. This does exactly the Gower centering. You extract the diagonal values of G, divide by the total sum of squares, and you obtain the LCBD indices. Now, if you want to test the LCBD indices in a function, you have to go back to the data, 
permute the data as we did with the raw data, recompute the distances, and recompute this equation, the Gower centering, under permutation. So the permutation loop takes a bit more work on the part of your computer than working from transformed data, for instance, but it can be done, and the function beta.div that you will be able to work with this afternoon does that for you: a permutation test of these LCBD values for any of the 21 dissimilarity measures that are available in the program. OK, so this is all very nice, and it makes us able to compute these indices, the total beta diversity and the LCBD indices, either from raw or transformed data or from any one of our commonly used dissimilarity coefficients, and that's a nice thing. Unfortunately, when we compute the dissimilarities we lose the identity of the species, so we cannot compute the species contributions there. But I think the most interesting coefficients are the local contributions, and we can compute them. Is that clear enough? Now, the range of values. It's easy to show that if there is no variation, all the distances among rows, when you go from this to a dissimilarity matrix, are zero. There is no variation, so the total beta will be zero. If each site has a different species composition, whatever it is, this site has these species, the next site has those species and so on, if they are all different, then we have the maximum value of beta. And by putting these values, for each type of distance coefficient, into the distance matrix, we see that the Hellinger and chord distances have a maximum value of square root of 2. When we studied these coefficients, we saw that they are not between 0 and 1: these two distances, Hellinger and chord, are between 0 and square root of 2, 1.4142 and so on. So if you put square root of 2 in this equation here for computing the total variance, you put square root of 2 here, it is squared, you put it in there and you obtain 1. So this is why this
coefficient varies between 0 and 1 for chord and Hellinger. On the next page I do the same thing, but more quickly, for all the distances that have a maximum value of 1 instead of square root of 2, and the same exercise shows that the maximum value is 0.5. Now, Whittaker was using the mean of all the dissimilarities, while in my way of computing the total variance, the proper way of reconstructing the total variance is to take the dissimilarities in half of the matrix. Of course, Whittaker did not know all this algebra back in 1972, and this is why he suggested taking the mean of all the dissimilarities. But his instinct was very good, excellent, and as a result he was obtaining twice the total variance instead of the total variance as I do with my way of computing it. So if we want to reconstruct the total variance, we have to work with half of the distance matrix, but since his values were twice that value, the comparisons were just as valid. And this is why I claim that my way of doing the computation, total beta being the total variance of the data, is quite consistent with what Whittaker was suggesting. Okay, so our calculation summary now includes this portion where we go from Y to a dissimilarity matrix, computed from presence-absence data or computed from abundance data with one of the coefficients, and from that we can compute the total sum of squares and we can compute the LCBD indices, and we can test them by permuting this, recomputing that and recomputing the LCBD indices, let's say a thousand times or ten thousand times. Now here is a rather difficult subject: we studied the properties of the dissimilarities. For everything that I have shown, I will show you in a few slides the person with whom I did that, a young researcher from Catalonia. When we did the first part of that, we did it in about two weeks' time, when we were working together in China. We had been invited to go there and do a project. We had an apartment, and in the evening we had not much to do and
we could have listened to the news in Chinese, but instead we decided to work on this, and in two weeks' time we solved all the mathematics that we have seen before. But this part here took us three more months. It was a very difficult and long thing to do, so I will summarize that part here. We studied 14 different properties of the dissimilarity functions. I have already mentioned a few properties when we looked at the metric and Euclidean properties of the dissimilarities. What is necessary for a function to be a metric is that it must have a minimum of zero, and then positiveness, that is, the distances must be larger than or equal to zero, they cannot be negative; they must be symmetric; and I think later down in the list I mention the triangle inequality. OK, now the other properties. Among the other properties, there is only this one that I discussed in the lecture on distances, that is, double-zero asymmetry, and you know about that. The Euclidean distance doesn't have that property, and this is why it causes problems with ecological or genetic data. It means that the distance should not change when adding double zeros, but it should change when adding double anything else. OK, here: sites without species in common should have the largest distance. Yes, we should have that; all distances should have that property. So you see, a distance that doesn't have that property, or that one, or any one of those, actually should not be used for beta diversity studies. Here, the distance does not decrease in series of nested species assemblages. Nested species assemblages means that if you start with a site having three species, then the next site has these three plus another one, and the next has these four plus two other species, and so on. If the structure is nested, then the distance should not decrease when you compute the dissimilarities pairwise. Species replication invariance: this was included as a mathematical property, that is, if you have the whole data table and compute the distance, then if you
double it you should obtain the same dissimilarity. This is just a mathematical property of stability in the calculation. This next one may be very important if you are working with biomass data: we should have invariance to measurement units. That is, if you measure your biomass in grams or in kilograms, or in pounds if you are in the US, or in ancient measures of mass from ancient Egypt, we should always obtain the same distance. Well, some of the distance functions do not have that; for instance, the Euclidean distance will produce a different number if your measurement units are different. So that is a very important property. And the existence of a fixed upper bound: it is important to have a Dmax to ensure that the measures of beta are comparable among taxonomic groups and among sites. Well, the Euclidean distance again is an example of a distance measure that doesn't have an upper bound to ensure comparability between data sets. Now, there are extra properties that are not essential, according to us, for beta diversity studies. These are, for instance, invariance to the number of species in each sampling unit and invariance to the total abundance in each sampling unit. And then there are some corrections proposed by the Taiwanese statistician Anne Chao that correct for undersampling, that is, they try to estimate the value of the dissimilarity while including the species that have not been observed because of sampling errors. It is a very tricky calculation, something that a good mathematician, Anne Chao, designed, and these coefficients can be very useful. There are three types of these coefficients included in the beta.div function that is available this afternoon. So only these three coefficients of Anne Chao have this property, and this can be useful: it solves a sampling issue, namely that small samples miss rare species. Now, there are ordination-related properties that can be useful. For instance, we want to use a coefficient that is Euclidean, or where the square root of the
distances makes the matrix Euclidean. In that case, ordination by principal coordinate analysis will not produce negative eigenvalues and complex axes, and as we saw yesterday, when we use principal coordinate analysis as a method of transformation of the data before redundancy analysis, we want to use all the axes, so this is an important property in these circumstances. There are dissimilarity functions that are emulated, that is, produced by a transformation of the data followed by a calculation of the Euclidean distance, and we saw these four transformations. They are those available in the function: they are the profile, Hellinger, chord and chi-square transformations. These four distances can be obtained in this way instead of computing them from the distance function. Okay, so we have 14 properties. So which of the coefficients have what properties? There we go. This is the list of coefficients that we investigated. There are more than those that I discussed on Tuesday morning. On Tuesday I discussed the Euclidean distance, of course. I did not talk about these two. When we talked about transformations, Danielle mentioned the species profiles; we have the Hellinger transformation and the chord transformation that lead to these distances, and the chi-square transformation that leads to the chi-square distance. So these four are obtained by transformation plus Euclidean distance. Then we have all of these, and I think I insisted only on the percentage difference, alias Bray-Curtis. And then there are all these others that are commonly used in ecology. Some of them are available in vegan, in the vegan, what's it called, the function of vegan. This one at least is in there, this one also, maybe not Whittaker's index and Wishart; I think he has Kulczynski. And these are the abundance-based functions of Anne Chao for the unseen species. And here are all the properties. I did not list here properties 1, 2 and 3, because all the coefficients had them; they were not discriminant. And then properties 4 to 9 are those that are essential for beta diversity studies
according to us. These are the additional properties, and this is the maximum value that can be taken by these coefficients: either square root of 2, or 1, or, in the case of the chi-square distance, it has a maximum value but it is square root of 2 times the total sum of the values in the data table. Now, if we focus on this portion, we will eliminate any coefficient that has a 0. So we eliminate these four, and unfortunately we have to eliminate the chi-square distance, which misses this property, P5, that was thought important. The others, in the blue-gray boxes, are the coefficients that are admissible for beta diversity studies. Now, at some point after writing this table, I stood back and looked at it and said, this looks like a data table, so why don't we do this? And indeed this graph tells us something interesting. The points are the coefficients and the arrows are the 14 properties that we saw listed. It turns out that all the coefficients on the left here of the ordinate are those that are eliminated, and all those to the right are those that can be used for beta diversity studies. And they are grouped into types: one to three on this side, and then two types on that side. Here we have the Hellinger and chord distances, which can be obtained through transformation plus Euclidean distance. These are regular coefficients, with the percentage difference, Canberra, Wishart, Kulczynski all there. And type four are the coefficients of Anne Chao that correct for the species that we have missed. So this tells us which coefficients are admissible. And since then I discovered another coefficient, called the Ružička dissimilarity, which is the quantitative equivalent of the Jaccard index. I had not heard about it until recently. I checked its properties and it is fully admissible also, in the same group as all these others. In the next presentation I will talk about that one and that one in more detail, but this one is another coefficient that is
admissible for beta diversity studies, and we have programmed it and included it in the beta.div function that you will be able to use this afternoon. OK, now, what we have seen are some of the ways of partitioning total beta diversity, and I mention here some other ways. We can partition total beta among species and among sites, the SCBD and LCBD, and this is new to this talk. Now, we know that we can also partition the total sum of squares using simple ordination or canonical ordination: PCA here, simple ordination, or redundancy analysis, canonical ordination. They are all ways of partitioning the total sum of squares. We know that the total sum of squares in PCA is partitioned into the eigenvalues. In RDA it is partitioned first among the canonical eigenvalues and then the eigenvalues of the residuals, but all together they sum to the total beta diversity, to the total sum of squares. In a multivariate analysis of variance, which Danielle discussed yesterday, using a single factor or two or several crossed factors, we can partition our total sum of squares again into a portion explained by the factor and then the residuals, or between the two factors and the interaction between them plus the residuals. So that's another way of partitioning total beta diversity. You have done variation partitioning yesterday and the day before, so you know that we can again take total beta diversity and partition it among two or three or four matrices of explanatory variables. Again, this is playing with total beta diversity. And tomorrow we will describe the new method of Moran's eigenvector maps and associated methods that allow us to partition total beta diversity among different spatial scales. We will see that a part of the total beta can be explained by fine scale, middle scale and broad scale, plus residual variation, and so this will be another way of partitioning beta diversity. In other words, everything that we have been doing up to now in numerical ecology, when our data represent
sites in a region of interest, everything consists of methods of partitioning total beta diversity. So it links numerical ecology with this fundamental and central concept of beta diversity in ecology, and I think that's also a nice result of this research. This work has been done with Miquel De Cáceres, who spent two years as a postdoc in my lab, and I worked with him in China on this paper, which was published in Ecology Letters, and the PDF is available of course on my web page. Now, I think I will go through this next example immediately: a landscape ecology, landscape genetics example. The data are from Thomas Lamy here, who did his PhD in Montpellier; then he came to my lab as a postdoc. As I was working on this presentation, I asked him, would you have some data that I could use to show that this same sort of calculation can be done on genetic data? And he said, oh yes, I have my thesis data on this small snail that lives in ponds in Guadeloupe. You know, Guadeloupe, in the Caribbean, is actually two islands side by side. This one on your left side is a volcano, and the other one is a coral bed that has been raised; actually, when the water from the ocean went into the glaciers that are now melting, this coral reef emerged, and this is where most of the activity in Guadeloupe is. These two islands are very close to one another and there is only a small portion of sea between the two, and people there call that the salted river. It looks like a river, but it is actually the sea between the two islands, and there are two bridges between these two islands. So these ponds are found in the ancient coral bed, where ponds have been dug by agriculture, by peasants, to water the cattle and so on. So they try to keep the rain water, but some of these ponds dry out during the dry season also. So what happens to the genetics of these snails was the question that Thomas Lamy studied during his PhD, and he wrote very interesting articles, this one in Molecular Ecology and another one in the American
Naturalist, about his data, which are microsatellite data. Here he studied 25 populations in ponds, rivulets and swampy grass, for which 749 individual snails were genotyped. He looked at 10 microsatellite loci, and the mean number of alleles per locus was 34. That is the mean, so some loci have 3 or 4 alleles and others have 70 different alleles, so it is very high genetic diversity actually. I transformed the data using the genetic chord distance and computed total beta diversity, which was about 0.20 for a maximum possible value of 1, and computed the LCBD indices. I'll make the story short, and here's the map of this eastern island of Guadeloupe. Yes, the salted river is here, just adjacent to the capital city of Pointe-à-Pitre. And so in these ponds here we have bubbles whose size is proportional to the LCBD value, and the four bubbles that are in brownish red are significant LCBD indices, so they are the most genetically unique populations. But then, why? Again, I had access to environmental data that had been collected by Thomas Lamy, and they were pond size, vegetation cover, connectivity, that is, is that pond connected to other ponds, and temporal stability, which means essentially: does the pond retain water during the whole year, or does it dry out completely during the dry season? And that turned out to be a very important variable. Here I used a regression tree, as we saw during the talk on Wednesday by our colleague. So it was regression tree analysis, but a regression tree for a single response variable, the LCBD values, which I was trying to analyze with the matrix of environmental variables, not using a linear regression equation; instead I used a regression tree. It turned out that the high LCBD values of these four sites were in ponds where temporal stability was the lowest, that is, these sites regularly dry out, and connectivity with neighboring ponds was low: no connection or nearly no connection, preventing migration of snails from adjacent areas. So how could snails remain in these ponds from year to
year? Snails, some of them, can survive in the desiccated pond by a process known as aestivating, that is, they just sit there in the mud during the dry season, waiting for a drop of water to come at the end of the dry season, and some of them survive, but not all of them. So every year there is high mortality, so strong elimination of genes from the gene pool. So in the end we have a much reduced gene pool in these ponds, and they cannot be replenished by snails coming from adjacent ponds because connectivity is low: no connection. Some snails can move from pond to pond, carried by birds for instance, and that replenishes the gene pool, but only a little. And this is why these four sites have very reduced gene pools, and hence they have these high LCBD values. So again, the LCBDs pointed to the sites where the genetic composition was very special. And here I compare the LCBD values to the allelic richness, and we see that indeed at these four sites the allelic richness was very reduced, that is, 13 alleles per locus, 19 here, 9 here, 7 here, much reduced compared to all the other sites. OK, so we can conclude that beta diversity is the spatial variation in community or genetic composition among sites. It can be estimated in various ways, but here I focus on the variance as a general, flexible method to compute beta diversity, and it can be computed either from the raw or transformed data or from any dissimilarity matrix that is admissible for this calculation. At least, now, 12 dissimilarity coefficients are appropriate; I should correct that in my slide presentation. It can be decomposed in various ways, and this concept links beta diversity to all the methods of beta diversity analysis. Yeah, the first time I gave that talk was when I received a prize, the president's prize from the Canadian Society for Ecology and Evolution, and sometimes students wonder, what do you receive as a prize? Nobel Prize winners receive a million kronor. Well, this is what I received: a nice sculpture
of a salmon made by Indians from the west coast of Canada. OK.
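The recipe described in the talk (transform the distances to minus one half of D squared, apply Gower centering, read the diagonal, divide by the total sum of squares) can be sketched in a few lines of Python. This is only a minimal illustration and not the beta.div function mentioned above: the function names and the small five-site table are assumptions made for the example, and the code expects a distance matrix that is already Euclidean (for coefficients such as the percentage difference, that would mean passing the square roots of the distances).

```python
import math

def euclidean_distances(Y):
    """Pairwise Euclidean distances among the rows (sites) of Y."""
    return [[math.sqrt(sum((a - b) ** 2 for a, b in zip(r, s))) for s in Y]
            for r in Y]

def gower_center(D):
    """Gower centering of A = -0.5 * D^2: subtract the row mean and the
    column mean of each element and add back the overall mean."""
    n = len(D)
    A = [[-0.5 * d * d for d in row] for row in D]
    row_mean = [sum(row) / n for row in A]
    col_mean = [sum(A[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_mean) / n
    return [[A[i][j] - row_mean[i] - col_mean[j] + grand_mean
             for j in range(n)] for i in range(n)]

def beta_diversity(D):
    """Return (SS_total, BD_total, LCBD, diag(G)) from a distance matrix."""
    n = len(D)
    G = gower_center(D)
    diag = [G[i][i] for i in range(n)]   # squared distances to the centroid
    ss_total = sum(diag)                 # trace of G
    bd_total = ss_total / (n - 1)        # total variance = total beta
    lcbd = [g / ss_total for g in diag]  # local contributions, sum to 1
    return ss_total, bd_total, lcbd, diag

def ss_from_half_matrix(D):
    """Same SS_total, from the squared distances in half of the matrix."""
    n = len(D)
    return sum(D[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n

def constant_distances(n, d):
    """n sites that are all at the same distance d from one another."""
    return [[0.0 if i == j else d for j in range(n)] for i in range(n)]

# Hypothetical table: 5 sites (rows) by 2 variables (columns).
Y = [[2, 1], [3, 4], [5, 0], [7, 6], [9, 2]]
D = euclidean_distances(Y)
ss_total, bd_total, lcbd, diag = beta_diversity(D)
dist_to_centroid = [math.sqrt(g) for g in diag]  # about 2.61 for sites 2 and 3

# Upper bounds: all sites maximally different from one another.
bd_max_sqrt2 = beta_diversity(constant_distances(6, math.sqrt(2)))[1]  # Dmax = sqrt(2)
bd_max_one = beta_diversity(constant_distances(6, 1.0))[1]             # Dmax = 1

print(ss_total, bd_total, [round(v, 3) for v in lcbd])
print(round(bd_max_sqrt2, 6), round(bd_max_one, 6))
```

In this sketch the trace of the centered matrix agrees with the sum of squared distances in half of the matrix divided by n, and the two upper bounds come out at 1 and 0.5, as stated in the talk. A permutation test would wrap these steps in a loop that permutes the raw data and recomputes the distances each time; that part is left out here.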