Okay, those who have used Mantel tests in the past, please raise your hand. Usually many people do; here only two or three, that's all. In some labs the Mantel test is the main statistical test for all sorts of things, especially in genetics labs, because the research director learned to use Mantel tests when he or she was a student back in the 1980s or so, when Robert Sokal made the Mantel test popular among geneticists. So I'm surprised that so few of you have used it. Now, those who have used it: have you used it as a form of spatial analysis, to test for variation in community composition or genetic data against space? No, you have not. Well, maybe I should not give this talk. Because this talk aims at showing... oh, this is the PDF, this is not what I want. What did I do with the main talk? Sorry about that, I cannot do a proper presentation with this. Mantel, this is the PDF, oh yes. This is supposedly the short version of it. Now I can do the projection, but there is still a lot of material to be seen. So let's say that I'm talking to people who are not using the Mantel test, but who might use it when they go back to their lab with community composition or genetic data and want to analyze spatial structure. As I said, in many labs they use Mantel tests. This talk is the result of a paper that we recently published with Daniel and Marie-Josée Fortin from the University of Toronto. Its purpose is to show that the Mantel test should not be used for spatial analysis in ecology and genetics. The title contains a question mark. The subtitle is another formulation, proposed by Pedro Peres-Neto during a workshop on spatial analysis. He became fairly furious at some point and said, "What can we do to stop them from using the Mantel test?" So I use this as the subtitle of this talk.
I will briefly review the Mantel test, which Daniel already presented yesterday. Then I will examine some of the statistical basics of the Mantel test: what is its null hypothesis, and is this the null hypothesis that we want to test? We will look at different types of R-square statistics and sums of squares, comparing the Mantel test to multiple regression, for instance, and we will see that the two methods rely on two different sums of squares. I will offer a simple example. Then we will go a bit deeper, look at some basic assumptions of the Mantel test, and see whether these assumptions are met by data across space, data coming from a geographic map. Then I will show you two series of simulation results, and I will finish with a real case study in population genetics. As you have already seen in Daniel's talk, in the Mantel test we have two distance matrices. In the case that concerns us here, one is computed from community composition data or genetic data, and the other one is a geographic distance matrix. In many labs people just use the Mantel test: we string out the upper or lower triangle of one distance matrix into a long vector, do the same for the other one, and compute a cross product or a correlation coefficient between the two vectors. The end result is the same. This is the Mantel statistic. Then we test it by permuting the rows and columns of one of the two matrices, which is the same thing as permuting the rows of the raw data and recomputing the distance matrix. This is called matrix permutation, that is, permuting the order of the rows and columns simultaneously. You do that a large number of times to test your Mantel statistic. It is very simple and easy to do. You simply call a Mantel function (there are several in R), give it two matrices, set the number of permutations, run it, and you obtain a p-value.
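The procedure just described (string out the triangles, correlate, permute rows and columns together) can be sketched in a few lines. This is a minimal illustration in Python with NumPy, not one of the R functions mentioned in the talk; the function name `mantel_test`, the one-tailed test in the upper tail, and the made-up data are all assumptions chosen for the example.

```python
import numpy as np

def mantel_test(d1, d2, n_perm=999, seed=0):
    """Sketch of a permutation Mantel test between two square distance matrices.

    Assumption: one-tailed test in the upper tail, Pearson correlation.
    """
    d1 = np.asarray(d1, dtype=float)
    d2 = np.asarray(d2, dtype=float)
    n = d1.shape[0]
    iu = np.triu_indices(n, k=1)            # upper triangle, diagonal excluded
    r_obs = np.corrcoef(d1[iu], d2[iu])[0, 1]
    rng = np.random.default_rng(seed)
    count = 1                                # the observed value is counted once
    for _ in range(n_perm):
        order = rng.permutation(n)
        d1_perm = d1[np.ix_(order, order)]   # permute rows and columns together
        r_perm = np.corrcoef(d1_perm[iu], d2[iu])[0, 1]
        if r_perm >= r_obs:
            count += 1
    return r_obs, count / (n_perm + 1)

# Usage on made-up data: 12 random points carrying one random variable.
rng = np.random.default_rng(1)
xy = rng.uniform(0, 10, size=(12, 2))
y = rng.normal(size=12)
geo = np.sqrt(((xy[:, None, :] - xy[None, :, :]) ** 2).sum(axis=2))
resp = np.abs(y[:, None] - y[None, :])
r, p = mantel_test(resp, geo, n_perm=999)
print(f"Mantel r = {r:.3f}, p = {p:.3f}")
```

Permuting the index vector and applying it to both rows and columns is equivalent to permuting the rows of the raw data and recomputing the distances, which is why the raw data are not needed inside the function.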
This simplicity of use is one aspect that makes the test so popular. But in this talk I will try to show that it does not give correct results if you apply it to spatial data. Daniel already mentioned yesterday that this test was developed by Nathan Mantel to study epidemics. He was interested in the relationship between the geographic and temporal distances among the events that occurred during an epidemic. In fact he was particularly interested in leukemia, and this is the reference. It is a very highly cited paper because many people use the Mantel test, perhaps correctly in some cases but mostly incorrectly, because the Mantel test was not designed to analyze community data against space. It was designed to analyze geographic and temporal distances in epidemics. It is Robert Sokal who discovered the Mantel test and started applying it to genetic data in this context, and I understand why. At that time it was the only way that had been found to include geographic relationships in a test of significance on data. Geographic relationships naturally came in the form of distances between points. So he said: since we have distances between the points, let's turn our data, genetic in his case or community data in my case, into a distance matrix and use a Mantel test. At least we can compare the community variation to the geographic structure. Except that nobody had done the investigation that we just published, which shows that it doesn't work. Now let's look at the null hypothesis. The null hypothesis of the Mantel test is not the same as that of the correlation between two variables. The Mantel test tests H0: the absence of a relationship between the dissimilarity values in the two distance matrices. That is all we have: two distance matrices that you string out, and you test that there is no relationship between the distances.
Whereas if you have two vectors, two simple variables, and you test their correlation, H0 is the absence of correlation between the two variables. It can also be done with two multivariate data tables. François Gillet showed yesterday how you can do that with the RV coefficient: you compute an RV coefficient between the two data tables. Again, the null hypothesis in the case of the RV coefficient is the absence of a relationship between the two data tables. I'm just saying that the null hypothesis of the Mantel test is not the same, because it concerns dissimilarities. Maybe it doesn't matter; we will see a little bit further on. But at least you can see that right from the base there is this difference, which may be a small difference. Now, R-square. The Mantel test computes an r between the two dissimilarity matrices, and if you square it, it is an R-square. When we compute the R-square of multiple regression, we have seen this equation a number of times, when I talked about regression and when I talked about canonical analysis: it is the sum of squares of the fitted values divided by the sum of squares of the original data. That's the R-square. I will focus on the denominator here. It is simply the sum of squares of the centered data, summed over all the species and all the sites. I have shown that it can also be computed from a distance matrix derived from Y: if you take all the distances in the lower triangle of this matrix, square them, sum them, and divide by n, you obtain the exact same thing. So it is true that we can obtain this value from the distance matrix. I have no question about that. However, when we do a Mantel test, the square of the Mantel correlation is also an R-square: if you regressed one of the strung-out distance matrices on the other, you would obtain an R-square that is the square of the Mantel correlation.
It is also constructed like that if you do this regression, but here you have the sum of squares of the distances, not the sum of squares of the y's. Does that make a difference? It does make a difference. Let us look at the equation more closely. The sum of squares of the distances: you string out the distances, take their mean, subtract the mean from each distance, square, and sum. Now, you can rewrite this equation in a simple way that we teach in basic biostatistics courses. It becomes the sum of the squared distances, minus the square of the sum of the distances divided by n(n − 1)/2, which is the number of distances. This is exactly equal to that. This first portion is the same as that one, no problem, apart from the divisor, but then you have this second portion here that is not there, and that changes the value completely. There is no way of working on this equation to produce that one. They are two different statistical quantities. So the R-square of regression is not the R-square of the Mantel test. They are two different things. Here I worked with the denominator, and of course, if we worked with the numerator, we would also obtain this sort of difference. Just a small example to make this concrete. If I take the numbers 1 to 10, subtract the mean, square the values, and sum them, I obtain 82.5. No problem. Now, if I take the numbers 1 to 10, compute the distance matrix between them, take the values in the lower or the upper triangle (either one, it is the same thing), string them out in a long vector of distances, subtract the mean, square, and sum, I obtain 220. So you see the number is not the same. Not only is the equation different, but in the end it makes a big difference. This is just to convince you that we are not computing the same quantity.
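The 1-to-10 example can be checked directly. The short Python sketch below (NumPy assumed; all variable names are chosen for this illustration) computes the sum of squares of the data, the same quantity recovered from the squared distances divided by n, and the sum of squares of the strung-out distances, both from the definition and from the shortcut formula described above.

```python
import numpy as np

y = np.arange(1, 11, dtype=float)        # the numbers 1 to 10
n = y.size

# Sum of squares of the centered data: 82.5
ss_y = np.sum((y - y.mean()) ** 2)

# Distances between all pairs (one triangle only), strung out in a vector
i, j = np.triu_indices(n, k=1)
d = np.abs(y[i] - y[j])                  # n(n - 1)/2 = 45 distances

# Same quantity from the distances: sum of squared distances divided by n
ss_y_from_d = np.sum(d ** 2) / n         # also 82.5

# Sum of squares of the distances themselves, i.e. the denominator
# behind the Mantel R-square: about 220
ss_d = np.sum((d - d.mean()) ** 2)

# Shortcut form: sum(d^2) - (sum d)^2 / (n(n - 1)/2)
ss_d_shortcut = np.sum(d ** 2) - d.sum() ** 2 / (n * (n - 1) / 2)

print(ss_y, ss_y_from_d, ss_d, ss_d_shortcut)
```

The first two values agree and the last two agree, but the two pairs differ, which is the point of the example.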
The R-square of the Mantel test is not the R-square of multiple regression. Okay? Here's another example that I made up for the occasion. This looks very much like data that we might want to analyze. Let's say that this is the abundance of a species, and these are three environmental variables. Given this sort of data, you would immediately run a multiple regression to see if these variables explain the variation there. When I fabricated these data, I created three columns of random normal deviates with the rnorm function, added them up, and added another vector of random data to make noise. Then I shifted everything so that it became positive, to look like species abundances. So this response is, by construction, related to these three variables, and it is not surprising that there is a relationship in the results that I will show here. I built it in. The data on the map might look like this: I took these values, represented them by bubbles, and chose random coordinates to plot them on the map, just to show that this is the sort of data that we are all handling. So here are the analyses. If I use multiple regression with the lm function, I obtain an R-square of 0.57. Of course, I built the relationship in, so there is a relationship. The adjusted R-square is pretty high, and the relationship is significant even though the number of observations is small. Now, if I run a Mantel test, that is, I compute one distance matrix from the response and another distance matrix from these three variables using the Euclidean distance, and I run the Mantel test, I obtain this value for the Mantel R-square statistic, the square of the Mantel r: very small. There is no adjusted R-square in Mantel tests. The Mantel test receives a distance matrix; it doesn't know how many explanatory variables there were.
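A comparison of this kind can be reproduced on freshly generated data. The sketch below is in Python with NumPy rather than R, and it does not use the data shown on the slide: the sample size of 20, the random seed, and the variable names are assumptions for illustration, so the numbers it prints will differ from those in the talk.

```python
import numpy as np

rng = np.random.default_rng(42)
n, m = 20, 3                              # hypothetical: 20 sites, 3 variables

# Three columns of random normal deviates, summed, plus noise,
# then shifted to be positive so it looks like an abundance.
X = rng.normal(size=(n, m))
y = X.sum(axis=1) + rng.normal(size=n)
y = y - y.min() + 1.0

# Multiple regression R-square and adjusted R-square (least squares)
A = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
fitted = A @ coef
r2 = np.sum((fitted - y.mean()) ** 2) / np.sum((y - y.mean()) ** 2)
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - m - 1)

# Mantel R-square: squared correlation between the strung-out
# Euclidean distance matrices of the response and of the predictors
iu = np.triu_indices(n, k=1)
d_y = np.abs(y[:, None] - y[None, :])[iu]
d_x = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))[iu]
mantel_r2 = np.corrcoef(d_y, d_x)[0, 1] ** 2

print(f"regression R2 = {r2:.3f}, adjusted R2 = {adj_r2:.3f}")
print(f"Mantel R2 = {mantel_r2:.3f}")
```

Only the statistics are computed here; a permutation test would be needed to attach a p-value to the Mantel statistic.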
So there is no adjusted R-square. And here, this result happens to be not significant. Given these two results, and given the way the data were constructed, we know that the regression gives the correct answer. The Mantel result is incorrect, or it may just be that the test lacks power. In any case, if we go back to what we saw about the null hypotheses: are we interested in analyzing the variation of the data, or the variation of the distances? The null hypothesis about the variation of the distances is that of the Mantel test, while the null hypothesis about the variation of the raw data is that of the regression. I think what we want to analyze is the variation of the data, not the variation of the distances, which is a derived quantity, right? Okay. So which R-square corresponds best to the question stated here? I think it is this R-square here, not that one. It's the same thing when we are analyzing multivariate data: if we have data about sites and species, or gene frequencies, with environmental data and spatial data, what do we want to know? Do we want to explain the community variation among the sites, or do we want to explain the variation of the distances derived from the community data? I think our interest is to explain the community variation, how it varies among sites, using these explanatory variables. In this case, instead of multiple regression, it is RDA or partial RDA that brings the correct answer to our question. Here is a small example that I published in a paper in 2005. This is the sort of data that we may very well have. There are four sites and five species. One species is common to all sites, and at each site we have one different species. There could be many different species; it's just the idea of having different things at the different sites, with one species in common.
Okay, so if we look at the sum of squares of these data (we could do it after a Hellinger transformation if you like), we will obtain some value, meaning that there is some variation among the rows. Obviously, there is variation among the rows. Nobody would say that there is no variation, right? Now, this is the Jaccard distance or dissimilarity, that is, one minus the Jaccard similarity. Between every pair of rows, the difference is the same, because there is always one species in common, and then the difference is one extra species at site one and one extra species at site two, for all combinations. So we obtain this sort of distance matrix where all the values are the same. If we compute the sum of squares of the distances in the upper or lower triangle, it is zero. Does that reflect what we see here? So there is a problem with this approach. Of course, I could also compute the sum of squares from this dissimilarity matrix, in which case it would be the sum of the squared distances divided by four. I would obtain this value, which differs from that one simply because here I used the sum of squares of the raw data and there the sum of squares computed from the Jaccard distances, but they are both different from zero. So there is a big problem: we definitely obtain the wrong answer from the sum of squares of the distances, right? Now I will look at the basic assumptions of the Mantel test. In spatial analysis, do these assumptions hold? The Mantel test compares two dissimilarity matrices, as Daniel has shown and as I have shown in one of my first slides. So we have the two dissimilarity vectors coming from the two dissimilarity matrices, d1 and d2; ignore these numbers, which are not what we are looking at. We could use these two vectors to plot a graph of d1 against d2, okay? If this is the response data, we would plot d1 there, and if this is geography, we would plot the geographic distances there.
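Going back to the four-site example for a moment, the three sums of squares discussed there can be verified numerically. The table below is reconstructed from the verbal description only (one species present everywhere, plus one species unique to each site); it is an assumption that the slide used presence-absence values exactly like these, and the helper name `jaccard_dissimilarity` is made up for this sketch.

```python
import numpy as np

# Four sites (rows) by five species (columns): species 1 is everywhere,
# each site has one species of its own. Reconstructed, not the slide's table.
Y = np.array([[1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 0, 0, 1, 0],
              [1, 0, 0, 0, 1]], dtype=float)
n = Y.shape[0]

# Sum of squares of the raw (column-centered) data: clearly not zero
ss_raw = np.sum((Y - Y.mean(axis=0)) ** 2)

def jaccard_dissimilarity(a, b):
    """One minus the Jaccard similarity of two presence-absence rows."""
    shared = np.sum((a > 0) & (b > 0))
    union = np.sum((a > 0) | (b > 0))
    return 1.0 - shared / union

i, j = np.triu_indices(n, k=1)
d = np.array([jaccard_dissimilarity(Y[p], Y[q]) for p, q in zip(i, j)])

# Every pair shares 1 species out of 3, so every dissimilarity is 2/3
# and the sum of squares of the distances collapses to zero.
ss_d = np.sum((d - d.mean()) ** 2)

# Sum of squares computed from the dissimilarities: sum(d^2) / n
ss_from_d = np.sum(d ** 2) / n

print(ss_raw, d, ss_d, ss_from_d)
```

With this table the raw-data sum of squares is 3 and the one derived from the Jaccard dissimilarities is 2/3; both are positive, whereas the sum of squares of the distances is zero.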
The Mantel test is simply a linear correlation, or we could use a Spearman correlation, which is for monotonic relationships between the data. These two types of correlation will work best if the distribution of the distances is something like this: fairly linear and fairly homoscedastic, that is, if the residuals around a regression line here had about the same variance all along the line. This is what we are hoping to have. Is this what we have with spatially distributed data? Well, I used simulations in which I created spatially autocorrelated data, and the advantage of simulation is that we know exactly what we have put in. When we receive a pile of data from the field, we don't know if the data are autocorrelated or not. So here I will use simulations with spatial autocorrelation of increasing ranges, and we will see if these two assumptions, linearity and homoscedasticity, are verified in simulated data that look like the data that ecologists and geneticists are handling. These simulations use a function designed by geostatisticians that is based on a variogram. A variogram is like the correlogram that Daniel described yesterday, with geographic distance here divided into distance classes, and a function that is not Moran's I but that is often called gamma. It is the semi-variance of the data. A variogram with a spherical model looks like this: the semi-variance increases up to a distance that we call the range, and the maximum value reached is called the sill in geostatistics. There is also an intercept that can be different from zero, called the nugget effect, but I did not put any nugget effect in these simulations. I will show you simulations.
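For readers who have not met it, the spherical variogram model has a simple closed form. The function below is a sketch in Python with NumPy under stated assumptions: the name `spherical_variogram` and its parameter names are made up here, the sill is taken to be the total sill (nugget included), the range must be greater than zero, and the talk does not say which software parameterization was used.

```python
import numpy as np

def spherical_variogram(h, range_, sill=1.0, nugget=0.0):
    """Semi-variance gamma(h) under a spherical model (sketch).

    Assumptions: range_ > 0; `sill` is the total sill, nugget included;
    gamma(0) is defined as 0.
    """
    h = np.asarray(h, dtype=float)
    ratio = np.clip(h / range_, 0.0, 1.0)   # beyond the range, gamma stays at the sill
    gamma = nugget + (sill - nugget) * (1.5 * ratio - 0.5 * ratio ** 3)
    return np.where(h > 0, gamma, 0.0)

# Semi-variance at a few distances for a range of 10 and no nugget,
# the situation described for the simulations.
distances = np.array([0.0, 2.5, 5.0, 10.0, 20.0])
print(spherical_variogram(distances, range_=10.0))
```

The curve rises from zero, flattens as it approaches the range, and stays at the sill for all larger distances.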
We can use the variogram to analyze real data, but in the function that I will use, we can set the parameters of the variogram and the function will generate data that obey this variogram, that is, data such that if we analyzed them, we would find this variogram. It is a tricky thing, but geostatisticians know how to do that, and these functions are now available in R. Here is a small example of using such a function on a map of size 20 by 20, in which I have 400 small cells, small points. Here I use a variogram with a range of 10, that is, autocorrelation goes up to distance 10. After that, there is no spatial correlation anymore. This actually creates patches of high values, if we follow the usual interpretation of the pale values, and patches of low values. It could be the reverse. We see that the patches are about the size of the range given there. The diameter of the patches is about 10, and here too. In the next slides, I will vary that amount and see what happens. This is the graph that we obtain when we plot one set of distances against the other. Here we have the geographic distances, and they go beyond 20 because there are diagonals also: the maximum geographic distance is about 26 in this graph. This is the difference in the values that were generated. There was only one variable, but you can still compute the difference between this cell and the neighboring cell in distance class 1, or between this one and that one, which will give a bigger difference for some larger distance class, like 10 here. With 400 cells, we already have a lot of comparisons: 400 times 399 divided by 2, that is, 200 times 399, which is 79,800, so about 80,000 points in this graph. There are so many points that it is difficult to see the relationship clearly.
So I computed a smoother, with the lowess function here, to show the central tendency. But this is rather difficult to see and very long to plot. Even in R, it has to plot tens of thousands of points. It is not very useful. So I reorganized the distances into distance classes, one distance class for each integer, I think. Then for each distance class I computed the mean, in blue, and the median, the black square here. Actually, the mean is a good approximation of the central tendency, just a little bit too optimistic compared to the median. So in the next slides I will show you this and that; I will not show this. OK, and we see here already... Oh yes, this is for a small map of 20 by 20. In the next slides I will use maps of 56 by 56, because this is what I will need in the simulations that I will show later. I will tell you later why I have these funny values. Here, in the larger map of 56 by 56, if I have a range of zero, then my variogram is flat and the data are simply random normal deviates. There is no patch of any size. Any point can be the neighbor of a point with a similar value or of points with completely different values. There is no spatial structure. In my little graphs here, the distances fall anywhere and the mean is flat. So we don't have a linear relationship, but that's fine. There is nothing to show. Next slide. Here I have a range of 10 in this map, so my patches have a diameter of about 10. The pale patches or the reddish patches: we have small patches all over the place, but they are located at random. This function produces a map of random values that are autocorrelated. If you produce another one, you also have patches of the same size, but they will be somewhere else. They're completely random. So here we have increasing values at the beginning of the distance classes, but certainly it is not linear, because it flattens here, and certainly it is not homoscedastic.
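The summary by distance classes described above can be sketched as follows, in Python with NumPy. This illustration uses a 20 by 20 grid filled with plain random normal values (the range-zero situation), because generating autocorrelated fields requires a geostatistical simulator that is outside the scope of a short example; rounding each geographic distance to the nearest integer is an assumption about how the classes were formed, and all names are made up.

```python
import numpy as np

rng = np.random.default_rng(1)
side = 20
xx, yy = np.meshgrid(np.arange(side), np.arange(side))
coords = np.column_stack([xx.ravel(), yy.ravel()]).astype(float)
values = rng.normal(size=side * side)      # no spatial structure (range zero)

# All pairs of cells: 400 * 399 / 2 = 79,800 comparisons
i, j = np.triu_indices(side * side, k=1)
geo = np.sqrt(((coords[i] - coords[j]) ** 2).sum(axis=1))
diff = np.abs(values[i] - values[j])
n_pairs = geo.size

# One distance class per integer (assumption: nearest-integer rounding),
# then the mean and the median of the differences within each class.
classes = np.rint(geo).astype(int)
summary = {
    int(c): (diff[classes == c].mean(), float(np.median(diff[classes == c])))
    for c in np.unique(classes)
}

print(n_pairs, round(float(geo.max()), 2))
for c in sorted(summary)[:5]:
    print(c, summary[c])
```

With unstructured data like these, the class means and medians should stay roughly flat across distance classes, which is the pattern described for a range of zero.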
The variance here is very different from the variance there. So we violate the two assumptions that would perhaps make the Mantel test usable. This is for a larger range of 30: larger patches of red, larger patches of pale colors, and we see that it is not linear and it is not homoscedastic. I like this one very much. This is with a range of 60, larger patches of red and pale yellow, and this shape actually reminds me of the story of the little prince, when the little prince asked the pilot to draw a sheep, and here the sheep has been eaten by the snake, so it looks something like that. It has the shape maybe of a mammoth. Certainly the relationship is not linear and it is not homoscedastic. If we go to larger values, where the range is larger than the largest distance that can be accommodated on this map, then it becomes more linear, but it is still not homoscedastic. Actually, I generated tons of those and it is never linear, except when the range is larger than the size of the map, and it is certainly never homoscedastic. I think that this is one of the main reasons why the Mantel test has low power, as I will show in the simulations that will come: it is the violation of these two basic assumptions. So now I will show some results of simulations that we did for this paper. They are univariate simulations, simulations of data with the variogram as we have done here, but I did all the intermediate values of the range, and I repeated that, I think, a thousand times for each situation, generating new maps and recomputing the Mantel test, but also a multiple regression against MEM eigenfunctions, dbMEM eigenfunctions. So I had to wait until dbMEM eigenfunctions had been presented before I could present these results. Sometimes I give this talk to people who don't know about MEMs, and I only tell them a few words about them and say that they can be presented in another seminar, but here you have had that.
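Two bookkeeping details of these simulations are easy to make concrete: the geometry of the sampling grid behind the 56 by 56 map size, and the rejection rate used to summarize a thousand tests. The Python sketch below (NumPy assumed) uses made-up names and a short list of made-up p-values purely to show the calculation; it does not reproduce any result from the paper.

```python
import numpy as np

# Sampling design: a regular 10 by 10 grid of points with a spacing of 5,
# surrounded by a border 5 cells wide.
n_side, spacing, border = 10, 5, 5
span = (n_side - 1) * spacing + 1          # 46 cells from first to last point
map_side = span + 2 * border               # 56 cells per side

pos = border + spacing * np.arange(n_side) # 0-based positions: 5, 10, ..., 50
gx, gy = np.meshgrid(pos, pos)
sample_xy = np.column_stack([gx.ravel(), gy.ravel()])   # 100 sample points

def rejection_rate(p_values, alpha=0.05):
    """Proportion of simulations in which H0 is rejected at level alpha."""
    p_values = np.asarray(p_values, dtype=float)
    return float(np.mean(p_values <= alpha))

print(span, map_side, sample_xy.shape)
# Made-up p-values, only to show the calculation:
print(rejection_rate([0.01, 0.20, 0.04, 0.50]))
```

When the null hypothesis is true, a test with correct type I error should give a rejection rate close to alpha; when it is false, the rejection rate estimates the power of the test.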
So we will simulate spatially autocorrelated data on this 56 by 56 grid, and I will sample 100 points on the grid, forming a regular 10 by 10 grid with a spacing of 5. In the x direction I need 10 points with a spacing of 5, which makes 46: the number of cells that you need to have 10 points with a spacing of 5 in between. Around the points that I will use, I simulated an extra band 5 pixels wide, and this brings the size of the map to 56 in each direction; that is the reason, I wanted to accommodate 100 sample points with a spacing of 5. I generated maps from variograms with a range of spatial autocorrelation of 0, 5, 10, and so on up to 40 units, and I did a thousand independent simulations for each of these values. For each simulation, I computed a multiple regression and extracted the R-square, the adjusted R-square, and the p-value. I also computed the Mantel test of the distances computed from the simulated response data against the geographic distance, and also against the square root of the geographic distance, because people sometimes use the square root in order to linearize the d-by-d graph a little bit. In these two cases I retrieved the Mantel r, which I squared, and the p-value; there is no adjusted R-square there. We will look at the rejection rate of the null hypothesis in each case, that is, how many times we rejected the null hypothesis divided by the number of simulations. If the test is honest, it should reject at a rate of 0.05 when the null hypothesis is true. But here the null hypothesis is not true, because I generated data that have a spatial structure, so I am expecting the rejection rate to be higher than 0.05 with the data that I will show you. Then we will look at the R-square for each method and what it tells us. These are the two graphs that summarize all these
simulations, which took several weeks to plan, develop, try, and correct before we finally got the results. Here we have the results for the regression against MEM, for the Mantel test against the square root of the geographic distance, and for the Mantel test against the geographic distance. These are the ranges of spatial autocorrelation that I injected in the data. Because my data points have a spacing of 5, when the range is 0 or 5 I do not expect any effect of spatial autocorrelation on neighboring points: they are too far apart, distant by 5. So a range of 5 is the distance at which no spatial autocorrelation exists among the sampled points, and this is what I obtain indeed. If we look at the result of the regression against MEM, there is nothing here: we have exactly the nominal rejection rate of the test. Each test is done at a significance level of 0.05, so the rejection rate is equal to 0.05. Fine. Then, as soon as we have more autocorrelation than that, the regression jumps up in power, that is, we reject the null hypothesis in almost all cases, and here in all cases, as we should. Now, with the Mantel test against geographic distances, we have this. We never have the correct result, although when the spatial autocorrelation is very high we approach the correct rejection rate, but it is always smaller than that of regression. Even with this correction of the geographic distances, we have the same story. It is dramatic, you know. With data that have, let's say, a spatial autocorrelation range of 15 on this map, which is pretty big spatial autocorrelation, massive spatial structure, we only find it in 40% of the cases with the Mantel test. We should always find it. Now, if we look at the R-square: the R-square is depicted here in green for regression, but of course we know that R-square is an overestimate of the relationship, so the correct statistic to use is the
adjusted R-square, which is here. Again, it is zero up to this value of spatial autocorrelation, but as soon as spatial autocorrelation has a range larger than the distance between the points, we have pretty good R-squares. Now, for the Mantel test, it never moves much higher than zero. So the R-square of the Mantel test cannot be interpreted as the amount of variation in the data explained by the spatial structure. It is nearly zero all the way. This is summarized here. Oh yes, in the paper, if you're interested, go and see the paper: we did other simulations with other methods that are used in landscape ecology. This was a suggestion of Marie-Josée Fortin, who is a landscape ecologist. We used Delaunay triangulation, truncated distance matrices, and this sort of thing, and the results were nearly identical to what I showed you, so you will find them in the paper. The result is that the power of the Mantel test was always lower than that of spatial analysis using Moran eigenvector maps, and the simulations also showed that the Mantel R-square was much smaller than the R-square produced by MEM analysis and was not really interpretable. This is a sad story, but somebody has to ring the bell and say, hey, wake up, don't use the Mantel test for spatial analysis. How are we doing with time? Yes, maybe I can show you simulations that we also published together in another paper, in 2005, a paper written with Daniel and with Pedro Peres-Neto, who was in our lab at the time. Again we simulated spatially autocorrelated data, but in that case they were multivariate. In the previous results we had a single variable; here we generated multivariate data that were like species abundances. Again we compared the results of RDA, this time, because the data were multivariate, to the Mantel test. For canonical analysis we used everything in the book: we used the X-Y coordinates of the sample points, we used the cubic polynomial that Daniel described when he talked about polynomial regression and that
we had used for variation partitioning in our 1992 paper, and then we used MEM spatial eigenfunctions, formerly called PCNM; in the graphs that I will show you, from 2005, we were still calling them PCNM. For the Mantel test we used the geographic distance matrix based on the coordinates, the distance based on the polynomial, the same polynomial as there, and then the log of the geographic distance, which is another method often seen in the literature. This is the way the data were generated. Essentially it was also on a grid, after generating spatially autocorrelated data, so I skip the details. This is the model. And yes, I also generated environmental variables, and I generated 10 species. Five of the species could be linked to the environmental variables, as a linear function of the environmental variables, and the last five species were never linked to the environmental variables; they were only spatially autocorrelated. But in some simulations that I will show you, even the first five species were not linked to the environmental variables. So this is the main table of results that I wanted to show you. We will look first at this one, where the first five species were not related to the environmental variables (the last five were never related), but the species were autocorrelated. You recognize this Venn diagram. This is the total variation of my 10 species, tested against the environmental variables. Let's see... I think the environmental variables were spatially autocorrelated, yes. And then this is what is explained by the spatial portion. The first three lines are the results of the RDA. The last three lines are the results of the Mantel test, where the calculations are done by partial regression on the distance matrices. Many people have used this sort of partitioning on distance matrices, although we never designed it like that, but there are functions in the literature and people have been doing that. Is that correct?
We'll see here what happens. First I'll look at the A plus B portion, that is, this portion here. In these simulations the species were not related to the environmental variables, so we should find a relationship only by chance. All the tests were done at the 5% significance level, and indeed in all cases we find about 5% rejection. Each of these values is the result of a thousand simulations. So this is fine: it means that both the RDA and the Mantel test have correct type I error. Well, we already knew that, but it is confirmed here. It means that we did not make any big mistakes in our simulations. Now, on to the next slide. I have highlighted another column, which is the B plus C portion. The species were not related to the environmental variables, but they were autocorrelated, so do we detect that? Here, in the RDA against the X and Y coordinates, we detect the significant spatial structure in 20% of the cases; with our 1992 polynomial analysis, in 40% of the cases; but with the MEM eigenfunctions, in nearly all cases. What about the Mantel tests?
Against the distance matrix of the XY coordinates, in 11% of the cases; against the polynomial, 7%; against the log of the geographic distance, 17% of the cases. So we know that there is autocorrelation in the data, and the Mantel test nearly never detects it. That's tragic. Now, in the case where the species are related to the environmental variables, in this other set of simulations with the big regression coefficient, well, here variation partitioning using the environmental variables detects the A plus B portion in close to 100% of the cases, that is, it detects that there is a relationship with the environmental variables. The Mantel test detects it in about 28% of the cases. Again, we know there is such a relationship because we built it in, and the Mantel test does not detect it. This can be tragic, as I will show in a later example. If you have analyzed your data, collected with great pain and a lot of time, using a Mantel test, it means that you are unlikely to detect the signal present in the data. So at the end of the day you will have nothing to publish. So this summarizes what I just said. The new simulation results show that the power of the Mantel test is always much lower than that of canonical analysis. So the spatial variation is at best weakly captured by direct regression of a response distance matrix on geographic distance. None of the transformations of the distances that we tried increased the performance of the Mantel test. And on the RDA side, representation of the spatial relationship by MEM is the best; it is much better than XY coordinates or the polynomial. That's what we are showing here.
OK, now the last part of my presentation, about this real case study in population genetics. This research is about Lyme disease. Lyme disease, as you may know, is a parasite transmitted by a mite that is carried by mammals, and in particular in our region of the world, in the northeastern United States and southern Quebec, it is the deer and the white-footed mouse that are
the main vectors, that is, the main animals that carry the mite, and the mite may transmit Lyme disease. When the mite is ready to reproduce, it leaves the host that carries it, somehow climbs in a tree, and waits for another animal to pass by. If it is a human, it falls on you; you don't feel it, it sucks your blood, it may inject you with the parasite, and you get Lyme disease. So it is a big concern. Lyme disease was limited to the northeastern United States and was not present in Quebec 30 or 40 or 50 years ago. But now, with climate warming, the mice that carry it go further and further north, and the acarians come with them, and they survive our winter now, and they are more of a threat. So there is big money put into the research on the dispersal of Lyme disease carried by the white-footed mouse. This research was done in the lab of Virginie Millien at McGill University. Virginie is doing a lot of research on that, and this part of the research was done by a master's student who was working in her lab. Her task was to find whether or not the natural and artificial barriers in the landscape limited the dispersal of the white-footed mouse. So it is a study of the population genetics of the white-footed mouse, which lives in different patches and moves around the territory, potentially carrying Lyme disease. Her task was to study the effect of two natural barriers, that is, two rivers, and two man-made barriers, that is, two highways across the landscape. I'll show them to you on a map in a moment. Now, important is the fact that Anita Rogic was a student of Virginie Millien with co-direction by François-Joseph Lapointe, a colleague at my university, in my department, a good friend, and we visit each other all the time. That's an important aspect of the story that I'm going to tell you. So this is a map of the area. You have the United States somewhere here. Yeah, this is the head of Lake Memphremagog, so this is probably the border of
the U.S. here. And you have the island of Montreal, and here are the populations of mice that were studied. You see, these are separated from those by this large river, the Richelieu River, that comes from the U.S. Well, it connects with the main river that flows to New York. Actually, before we had planes and highways, there were boats coming from New York to here, going through the locks of this Richelieu River and bringing people to the Montreal area. So yes, this is the main river separating these populations, and this is a somewhat smaller river (which one is it again?) separating these other populations. And then there are also highways, this highway and that highway, which separate for instance these two populations from those, or that population from those. So the idea was to test the influence of these rivers and highways. Yes, spatial barriers, roads, can be represented... oh yes, I will come to that in a moment. Then one day I walked into the lab of my friend François-Joseph Lapointe, and there was Anita Rogic, the student, whom I had never met before. François-Joseph was sitting in front of a computer screen with her, looking at the results that she had obtained using Mantel tests. François-Joseph called me and said: come and see that, can you suggest a solution? Anita has been working in the field for one year, doing the genetic analysis in the lab, everything. Now she computed Mantel tests and there is nothing significant. Oh, that's a problem, because at McGill, like in our university, a student, even a master's student, has to publish a paper, and Anita had nothing to report in her paper, because you will never manage to publish a paper saying you have no significant results. This is the p-value bias in publication: negative results are very difficult to publish. So she was stuck, or she had to do another project altogether, spend another year in the lab and in the field in order to get her master's degree. So I looked at that and said: how did you analyze your data? She
showed me that she had no significant result. She said: I used Mantel tests. I said: well, perhaps you can do something better. Let's try using your raw genetic data, either in the form of raw allele frequencies or in the form of FST distances, and let's represent roads or rivers each time by a binary variable. That is, for instance, for this river you can code the sites that are on this side of the river as one and those that are on the other side as zero, or the opposite, it doesn't matter, and do an ANOVA with the binary coding of the barrier. So you simply do an ANOVA by RDA, as you have been shown in this course, and you can do the same thing for this river, and for that road, and for that road. Fine, she said, let's try that. But she told me: I don't know how to compute an RDA. Sit down, I'll show you. So half an hour later she was acquainted with the RDA function, with the coding and so on, and I said: OK, now you do that and call me when you have the results. The next day she had the following results. So, using FST, the FST genetic distances were decomposed into principal coordinates and put in as the response data in the RDA. We did not have at that time the function to test directly, by the McArdle-Anderson method, as if we had done the decomposition. So here we did the decomposition and found that this river had a significant effect on the genetic variation depicted by the FST. The other river, which is the Yamaska River, yes, also; Highway 117... 116, Highway 112; everything was significant. Then she also did it with the raw allele frequencies, with Hellinger transformation of course, and again all the results were significant, but here it was even more significant than with FST. So now she had something to publish, and it led to the paper that you see here, in which I was kindly added as a co-author. I had contributed a little something, half an hour of work showing her how to do an RDA. OK, so you see that this changed the outcome of that
research, and the message is that it may also change the outcome of your research. That is, if you do the analysis of spatial data in the right way, using dbMEM functions instead of Mantel tests, you may obtain significant results if there is some signal in your data. Even though there may be some signal in your data, the Mantel test is unlikely to find it.
OK, so the conclusion of this last part of my last talk is that researchers like the Mantel test because it is simple to use. One simply has to type something like that; the number of permutations is set by default, and you obtain a p-value. But you don't obtain anything else, actually. But sometimes this is all people want. We have shown here that the Mantel test is not appropriate to test for the presence of spatial structure in your survey data, for different reasons. Looking at the theory of the Mantel test, the null hypothesis of the test of correlation between vectors of raw data is different from that of the test involving dissimilarities, and this one, on the raw data, is the correct null hypothesis to be tested when you want to know about the variation in your original data. The statistics used in these two tests are different and cannot be reduced to one another. The R-square of the Mantel test is definitely not the same as the R-square of regression or canonical analysis. Now, the Mantel correlation assumes that in the distance-distance plot the relation among points is linear and homoscedastic, and that is definitely not the case for spatially structured data, except, for linearity, when the range of the autocorrelation is very large. Now, if one still applies the Mantel test, you could say: well, if I still do it, what do I obtain? Well, applied to spatially structured data, its power is always lower than that of RDA using MEM. It doesn't mean that it will fail in all cases; it only means, as we have shown by the simulations, that the power is very low, so we are very likely to miss the signal. But if the signal is very, very strong, the Mantel test will still be
significant in those few cases. But for ordinary cases you are more likely to miss the signal, that is, to have a non-significant result, than a significant one. OK, that's what I repeat here. Here: the R-square cannot be interpreted. So analysis by MEM produces output that is richer than simply a p-value, because when you obtain your p-value, and Daniel has shown that when he demonstrated analysis of variance by RDA, you can then plot your result in the form of one of these triplots produced by RDA that will tell you the story, that will show the differences between the groups with respect to the factors that you have put in the analysis. And then you can also produce maps of the fitted values at different spatial scales when you do the analysis by MEM. And the Mantel test is inappropriate to test the correlation between raw data vectors or matrices, irrespective of whether or not these are spatially structured. We have shown in the last set of simulations that the relationship of the species to the environmental variables was also weakly detected by the Mantel test. This is what I am reminding you of here. It is not only for spatially structured data; it is also for relationships to environmental variables that it doesn't work. And we have another paper with Marie-Josée Fortin in 2010 emphasizing that point. So, conclusion number 4, the main conclusion, is that the Mantel test should only be used to answer questions that, in the application field, clearly and solely concern the relationship between distances. These questions are rarely found in ecology and genetics, but we don't want to say that there are no such questions. We have actually one example from the literature where the Mantel test was appropriate with ecological data. That's all we have found up to now, but maybe some of you will find other questions. But beware: the question that you want to answer with distances should not be a question derived from a question that originally concerns the raw data. It's easy to say, as I have heard speakers
say: if there is a relationship between the raw data, then it will be found in the distances. Simulations show that this is not the case; it may not be found in the distances. So the Mantel test may still be useful if the question only concerns the distances. And so finally, who wants to use a test that has low power? If you want to use a test that has low power, help yourself. OK, the funny thing, when you assemble the bibliography on my various interventions on the Mantel test, is that I started investigating properties of the partial Mantel test, you see here, compared to partial correlation, in 2000. Then we have this paper in 2005, the one with Marie-Josée Fortin in 2010, and in 2015 with Marie-Josée and Daniel. God knows what I'm going to publish on the Mantel test in 2020.
OK, thank you. If you have questions, I'll be happy to answer them. Thank you to the organizers of this workshop for all the work you have put into bringing all of us together. I have had a lot of fun interacting with you during this week, and we are going to interact again this afternoon in the computer room. Now, time for lunch. Good.
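For readers who want to try the barrier analysis from the case study, here is a minimal sketch in plain Python of an ANOVA by RDA with a single binary barrier variable, tested by permutation on Hellinger-transformed frequencies. With one binary explanatory variable, the RDA fitted values are simply the group means, so the statistic reduces to a multivariate F summed over all response columns. The function names (`hellinger`, `pseudo_f`, `barrier_test`) and the allele counts are hypothetical illustrations, not the data or the R functions used in the study.

```python
import math
import random

def hellinger(table):
    """Hellinger transformation: square root of row-wise relative frequencies."""
    return [[math.sqrt(v / sum(row)) for v in row] for row in table]

def pseudo_f(y, groups):
    """F and R-square for one binary factor, sums of squares pooled over columns."""
    n, p = len(y), len(y[0])
    ss_total = ss_within = 0.0
    for j in range(p):
        col = [row[j] for row in y]
        grand = sum(col) / n
        ss_total += sum((v - grand) ** 2 for v in col)
        for g in (0, 1):  # one side of the barrier, then the other
            vals = [v for v, lab in zip(col, groups) if lab == g]
            mean_g = sum(vals) / len(vals)
            ss_within += sum((v - mean_g) ** 2 for v in vals)
    ss_between = ss_total - ss_within
    f = ss_between / (ss_within / (n - 2))  # 1 and n - 2 degrees of freedom
    return f, ss_between / ss_total

def barrier_test(y, groups, n_perm=999, seed=0):
    """Permutation test of the barrier effect (site labels are shuffled)."""
    rng = random.Random(seed)
    f_obs, r2 = pseudo_f(y, groups)
    labels = list(groups)
    n_ge = 1  # the observed value counts as one of the permutations
    for _ in range(n_perm):
        rng.shuffle(labels)
        if pseudo_f(y, labels)[0] >= f_obs:
            n_ge += 1
    return f_obs, r2, n_ge / (n_perm + 1)

# Hypothetical allele counts at 8 sites (rows) for 3 alleles (columns).
counts = [
    [12, 5, 3], [11, 6, 3], [13, 4, 3], [12, 6, 2],  # sites on one side
    [4, 10, 6], [5, 11, 4], [3, 12, 5], [4, 9, 7],   # sites on the other side
]
side = [0, 0, 0, 0, 1, 1, 1, 1]  # binary coding of the barrier
f_obs, r2, p = barrier_test(hellinger(counts), side)
print(f"F = {f_obs:.2f}, R2 = {r2:.3f}, p = {p:.3f}")
```

Coding the sides as 1 and 0 or as 0 and 1 gives the same F, as noted in the talk, because only the partition of the sites matters.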