Okay, good morning everybody, and welcome to the last day, already the last day, of this course. As you know, this morning we come to the main course of our study of spatial structures in ecology: the methods we have developed in the lab over almost 20 years, since we began thinking about all this. Something extremely interesting, and which proved to be very effective at modelling spatial structures, and this at all scales.

When I spent my postdoc here with Pierre, in 1989, we were already thinking heavily about how to model spatial structures, and at that time our tool was polynomial regression, or rather its ordination counterpart, polynomial CCA. That is what we developed and implemented within the variation partitioning procedure we published later, in 1992; that was the end result of that work. But of course we were aware that it was a very crude way of modelling spatial structures, and we wanted something that could go deeper into the structure, and this at all possible scales. During the following years we tried just about everything, everything reasonable, and it didn't work. So at last we decided to do unreasonable things, and then of course it worked; I guess the reasonable things had already been tested by other people, to no avail. This led to what we today call distance-based Moran's eigenvector maps, but I am getting ahead of myself. Historically, the first version of what I am going to present was called principal coordinates of neighbour matrices, PCNM; that was the 2002 paper, and we developed the method from 1997 to about that date. Then we published a second paper, with applications, in 2004, in Ecology, to popularize the method. In the meanwhile Pierre invited Stéphane Dray and also Pedro Peres-Neto into the lab, and together they worked out the precise mathematical framework of all this; Stéphane in particular could demonstrate that what we had found empirically was in fact a particular case of a broader family of methods, and since then we call the method MEM, for Moran's eigenvector maps. So this is the story that Pierre and I are going to tell you this morning: I begin with the early stages, and I go of course to the outcome that we now use every day.

We want to understand and model spatial (or temporal, since time can also be handled by this technique) community structures, through the analysis of species assemblages, which we now understand to be among the best response variables available to estimate the impact of changes in ecosystems. You already heard yesterday how spatial structures originate and why they are important, so I don't have to go over that topic again. To understand the mechanisms that generate those structures, we have to integrate them explicitly into our statistical models. Scale is important, because some processes act at the global scale, others are regional, others are very local; there is no reason why all species should be structured at the same scales, or in the same way across all scales, and of course we expect that a response variable can be driven by processes acting at different scales. So we have to find out which of those processes are reflected in the spatial structures of the community at those different scales, and we need statistical methods that can model spatial or temporal structures at all scales. The main tool, the one we use most frequently, we now call distance-based Moran's eigenvector maps,
dbMEM for short; this is the tool that we formerly called PCNM, principal coordinates of neighbour matrices. Actually the dbMEM, as we call them now, are a slightly modified version of those first-generation PCNM.

So we have our data: a species data matrix, environmental data, and spatial data; at this point, let us simply consider geographical coordinates, x and y. What we want is: to model the spatial structures of the species data at all scales; to identify the scales at which structures are present and significant in the response data; to decompose the spatial model into sub-models representing the scales that are important; and to interpret the sub-models, meaning to reveal the species-environment relationships at the scales of those significant sub-models.

And now I come to the unreasonable thing we did. This is the general scheme: in the upper left corner you have the response variables; here I have put one single variable, but of course most of the time we apply this to whole data tables, through RDA. The data are georeferenced, meaning you know exactly where the sample units were taken. You compute a Euclidean distance matrix among the geographical locations of your sites. In the simple univariate case of one equispaced transect, it looks like this: in the upper triangular matrix you have a distance of 1 between the sites that are closest to one another, 2 for the second closest, and so on. Now, the idea was to say: what we first want is to identify the spatial structure at the first level, the nearest neighbours. Hence the idea of truncating this Euclidean distance matrix, to retain only the first-neighbour relationships. After that, of course, you have to fill up the matrix with something, and after months of empirical trials we found that we could fill it with values equal to four times the maximum value that was retained, meaning in this case four times 1, equal 4. With irregular sampling, or two-dimensional sampling, or anything you want, it is the same: you truncate the matrix at some limit, and then you fill up your matrix of geographical distances with four times that maximum value.

Before I go ahead, be reassured: you won't have to do this by hand; we now have functions that do it for you automatically. And believe me, when I ran the simulations for each of those trials (at that time it was Fortran code, real simulations, thousands of runs) I don't remember exactly how many steps were involved, something like three to five different programs that I had to run back and forth, just to do what I am showing you now. Today, even without the ready-made functions, you could do it in a couple of lines of R code; but at that time we didn't have R.

Okay. So now we have a matrix that has been completely deformed: just the first-neighbour relationships have been retained, up to a certain distance, and everything else has been filled up with four times that maximum distance.
This new matrix is not at all Euclidean, and since we now had the information in the form we wanted, we could not resort to square-rooting or any other transformation that might have made it Euclidean; no such possibility was available. After that, there is still one step to go through to turn the result into the spatial information needed for canonical analysis, and that step is principal coordinate analysis: whenever you have a distance or similarity matrix and you want its components, you run it through principal coordinate analysis. That is what we did, obtaining, as expected, positive eigenvalues, one zero, and a couple of negative ones.

The first time we tried this, we were hoping to get information about one specific scale of spatial relationships among the sites, the one relating the closest neighbours to one another. To our surprise and pleasure, we discovered that this way of treating the matrix produced a whole range of eigenvectors. We tried it first on a linear transect, because that is easy to visualize, and we had the surprise of finding that it provided the whole range of sine waves, from the broadest to the finest. Then we said: this time we have our spatial variables; they are much more detailed, present at all scales, and orthogonal to one another, all characteristics that our good old polynomials did not have. And then, of course, we applied multiple regression, or RDA (CCA as well, at that time).

What I have just explained are the PCNM, principal coordinates of neighbour matrices; now you see why the method is called that way: you have a matrix of neighbour relationships, the distances among sites up to a certain limit, with everything else replaced by an arbitrary, really large value. Why four times the largest retained value? Because the results level out from there: filling with one times the largest value gave some results, twice the largest gave other results, three times and four times yet slightly different results, but from that point on, five or ten times gave essentially the same results. Since it levelled out, we stopped there. By the way, a couple of years later Stéphane explained mathematically why it behaves this way, but that is not important here.
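In R, as promised, the whole construction does fit in a few lines. Here is a minimal sketch, assuming a matrix xy of site coordinates and a truncation distance thresh (1 for the equispaced transect); note that vegan's ready-made pcnm() function packages the same steps:

```r
# Minimal sketch of the original PCNM construction; 'xy' and 'thresh' are
# assumed objects (coordinates, truncation distance).
D <- as.matrix(dist(xy))              # Euclidean distances among sites
D[D > thresh] <- 4 * thresh           # truncate: keep neighbours, fill the rest with 4t
pco <- cmdscale(as.dist(D), k = nrow(D) - 1, eig = TRUE)  # principal coordinate analysis
pcnm.vars <- pco$points               # axes with positive eigenvalues
                                      # (negative ones are dropped, with a warning)
```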
What we obtained was extremely effective at extracting spatial structures from data, but it also presents some little problems, or rather some features that have to be mastered before the method becomes really effective. One of those features is the large number of variables produced, and this large number is one reason why we almost systematically resort to forward selection after building the dbMEM, to find out which of them are significantly related to our species; otherwise we would have an overly large number of explanatory variables, and model overfitting is not our cup of tea, as you know.

So what is the difference between the original PCNM and what we now call dbMEM? To be quick about it (I won't go very deep into the maths, it is not necessary), it technically concerns the values on the diagonal. I didn't mention the diagonal before because it looks trivial: from our point of view, the diagonal of the matrix holds the distance between each site and itself, which should be zero. Now, during his years in Pierre's lab, Stéphane Dray showed that you could consider it another way. He turned the distances into similarities, and he replaced the diagonal, which was now similarity 1 (for a measure with an upper and a lower bound), by similarity 0, as if each site had no relationship with itself. That seems strange; he also made some slight modifications in the computation itself. I did not look up the detail of the maths again, and the result is about the same, but there is a difference. The main thing is that with those slight tricks he obtained eigenvectors that are essentially the same, but scaled differently across the whole range.

With the PCNM, the total number of eigenvectors with positive, null and negative eigenvalues was of course n-1, which is not manageable: you cannot run a regression with n-1 explanatory variables, or you get an R² of 1, as we explained already. The number with positive eigenvalues is smaller than that: for a transect of 100 points, the original PCNM gave 67 eigenvectors with positive eigenvalues, but we found out later that some of them actually corresponded to the modelling of spatial structures with negative spatial correlation. The way Stéphane modified the computation has one great advantage: it makes the eigenvalues of all those eigenvectors (of those PCNM, now dbMEM) exactly proportional to their respective Moran's I, the index of spatial correlation, meaning that from the eigenvalues alone you can see whether an eigenvector models positive or negative spatial correlation. Because of that property, the number of eigenvectors with positive eigenvalues decreases; furthermore, all the eigenvectors are now real, so we have real eigenvectors modelling positive spatial correlation and real eigenvectors modelling negative spatial correlation. That is very handy, and it is of course easy, in that case, to separate the two bunches of eigenvectors. For a regular, equispaced transect of 100 points, you get about 49 or 50 eigenvectors modelling positive spatial correlation.
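To see those numbers appear for yourself, here is a minimal sketch with the adespatial package; the transect object is made up for the example, and the attribute holding the eigenvalues is named as I believe the package names it:

```r
library(adespatial)

tr.xy <- data.frame(x = 1:100, y = rep(0, 100))   # hypothetical equispaced transect
tr.mem <- dbmem(tr.xy, MEM.autocor = "positive")  # dbMEM modelling positive spatial correlation
ncol(tr.mem)                             # about 49 eigenfunctions
attributes(tr.mem)$values                # eigenvalues, proportional to Moran's I
plot(tr.xy$x, tr.mem[, 1], type = "l")   # the broadest, sine-like wave
```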
To summarize their properties: dbMEM base functions (eigenvectors) represent a spectral decomposition of the spatial relationships among the study sites. They can be computed for regular or irregularly spaced sets of points. They are orthogonal to one another; you know that principal coordinate analysis produces orthogonal axes, so we don't have the problem we had with polynomials, for instance. If the sampling design is irregular, the dbMEM also have irregular shapes, not the beautiful sine waves I'll show you in a couple of seconds, but they can still roughly be sorted into scales, from the broadest to the finest. And since they are orthogonal to one another and represent various scales, they can be split into subgroups that can be put together to represent sub-models of various spatial scales.

You can run a forward selection of your species data against all the positive dbMEM; from now on I will speak of positive or negative dbMEM, but this is short for dbMEM modelling positive or negative spatial correlation. So you take the positive ones, which is what we usually do, and you run a forward selection, because for a regular sampling design you have roughly n/2 of them, which is way too many. Usually, what is retained after the forward selection is a couple of broad-scale ones (sometimes the retained ones are scattered all across the spectrum, but in many situations a couple of really broad-scale ones come out), then another group at intermediate scales, and a couple of them scattered at the fine scales; this is the usual pattern. In that case it is relatively easy to fish out those that are significant at the broadest scale and separate them (where exactly you split them is of course an arbitrary decision), and then do the same at the intermediate and at the fine scales.

dbMEM can also be computed for circular sampling designs; there is an example in the paper showing how to code the sampling design itself before putting it into the grinding machine of dbMEM production. And dbMEM analysis can also be used for temporal analysis: since you can apply it to a spatial transect, you can apply it to time as well, and later this morning I shall present one application where this property is useful to us.

This is how the dbMEM look for a regular, equispaced transect of 100 points; I have plotted just a few of them, numbers 1, 2, 4, 8, 15, 20, 30 and 40. For those 100 points we get 49 orthogonal positive dbMEM, and you see that this really has the potential to capture structures from the broadest scale to the finest one.

[Question from the audience] Yes. You understood how polynomial regression works; it is the same principle, except that instead of polynomials with one bend, two bends, three bends, we have these sine waves, and you take them as explanatory variables. If, for instance, your data contain one important bump or trough (of course it can be reversed, through the sign of the canonical coefficient), something that contrasts two regions of your transect, it will be captured by this first one, which will come out significant. If you have two bumps, or a central trough or a central bump, it is very likely to be captured by this one, and so on and so forth. Since you really have all scales here (of course I have only plotted a few of them), a combination of them can model just about everything. And we did it: if you consult these papers, all of them available on Pierre Legendre's website where you can download them any time, you will see that we really tested this on a variety of shapes, and the dbMEM are able to capture just about everything, to model whatever you want.
But, as I was saying just before your question: you can even model a linear trend with these. It may seem a little strange that you could model a linear trend using sine waves, and indeed it is extremely cumbersome: you need half of them; every second dbMEM has to be mobilized to model a single linear trend. That is an obvious reason why you won't do it that way. What you do instead is first check whether your response data contain a linear trend, which can be done simply by running a regression, or an RDA, of your response data on the x and y coordinates. If this comes out significant, you detrend: you take your response variables, run a linear model (no need to go through the rda() function; lm() works very well for this) of the form response ~ X + Y, and keep the residuals. It's that simple: you get rid of the linear trend, and then you can use the full potential of the dbMEM to find out which spatial structures are present in your data.
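As a minimal sketch of that detrending step, where Y stands for the (transformed) response table and xy for a data frame of coordinates, both assumed objects:

```r
library(vegan)

trend.rda <- rda(Y ~ ., data = xy)               # linear trend on the x-y coordinates
anova(trend.rda)                                 # permutation test: is the trend significant?
Y.det <- resid(lm(as.matrix(Y) ~ ., data = xy))  # if so, keep the residuals: detrended data
```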
For a regular, square two-dimensional sampling design, the MEM, the dbMEM, look like this. I have explained why they are now called dbMEM: distance-based, because the truncation of the geographic distance matrix is based on a distance, not simply on links or other connection schemes; and Moran's eigenvector maps, because their eigenvalues are proportional to Moran's I. Here again you have the possibility of modelling a trend: combining those two allows you to model a vertical or a horizontal trend. And then come the two-dimensional equivalents of the sine waves I showed you on the previous slide; in this case there are 400 points on each of those maps, and I get a little less than 200 orthogonal dbMEM modelling positive spatial correlation.

In general we restrict the analysis to the positive dbMEM, because it is rare that the negative ones allow us to find something interpretable. Of course, if you have positive spatial correlation at short range and a gradient, you are likely to have negative spatial correlation at broad range, which would be the case here; but this is already captured by a dbMEM like this one. If your community is contrasted between that part and this part, the first dbMEM will be highly significant, and you know that you have positive correlation at short range and negative at the broad one, because a site here is likely to be very different from a site far away. So, except for very particular situations where people want to find specific things, we usually restrict ourselves to the positive ones, and we forward-select among them.

Let's go to a real example now, again with my oribatid mites from the lake; as you remember, these are 70 irregularly positioned sites and 35 species. How does it go? At this point I don't speak of environmental variables, as you notice: this is a purely spatial analysis, the next generation after the trend-surface analysis based on polynomials. I construct the dbMEM variables, and in this case I got 22 with positive spatial correlation. You see: 22 for 70 sites, that is not half of them, it is less than a third, because the sampling design is irregular; we actually get more negative ones than positive ones. But this is not important, because those 22 are plenty enough to capture many spatial structures. We ran the global analysis, then we forward-selected the variables, and we got eight significant dbMEM variables. As I told you, where you separate them is an arbitrary decision, but we partitioned those by scales: dbMEM 1, 3 and 4 (the second one was not significant) represented what we call the broad scale; dbMEM 6, 7, 10 and 11 the intermediate, medium scale; and dbMEM 20, alone in its corner, came out significant and was considered the representative of the fine-scale structures. So we separate the dbMEM analysis by scale, and after that we can run separate RDAs using these three groups. All the code for this analysis, for you to replicate it, is in today's practicals, so you will be able to do this by yourself.

This first figure actually represents the dbMEM themselves, not yet the analysis of the oribatid mites: these are simply the spatial variables, the equivalents of those carpet-like patterns I showed you two slides earlier. Strictly our tools, our explanatory variables. As you see, the first one contrasts two groups of sites: up here the black ones, and there, still in the upper half of the plot, the large white ones; the rest of them, down there, have values very close to zero. How do we interpret this? It simply means large positive values in black and negative values in white, or the reverse, I don't remember; it is unimportant, because as you know the signs are arbitrary. As a Swiss, I tend to see the black values as the valleys, the negative ones, and the white values as positive, because I usually see snow on the tops of the mountains and not down in the valley; you get my point. So we have one dbMEM that clearly contrasts two regions. Number 3 seems to contrast something laterally, between those two regions, again with nothing very marked in the lower part of the plot, and the next significant dbMEM models a contrast in this region here. After that, you see why we decided that those three modelled the broad-scale spatial structure: by contrast, in the next ones you already see finer structures, potentially identified at scales a little finer than the previous ones. With dbMEM 6, 7, 10 and 11 you have progressively finer patterns, with the potential to bring out structures at intermediate scales. And dbMEM 20, one of the last ones: as you see, we are at a really local scale, contrasting pairs or triplets of points in regions like this one, something that could bring out a particular feature at that scale, if there is one.
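In code, the chain just described (construction, global test, forward selection, split by scales) looks roughly like the sketch below; the complete script is in today's practicals, and the object names (mite.h.det for the detrended, Hellinger-transformed mites; mite.xy for the coordinates) follow that script:

```r
library(adespatial)
library(vegan)

mite.mem <- as.data.frame(dbmem(mite.xy))     # 22 dbMEM with positive spatial correlation
glob <- rda(mite.h.det ~ ., data = mite.mem)  # global analysis
anova(glob)                                   # the global test must be significant first
R2a <- RsquareAdj(glob)$adj.r.squared
sel <- forward.sel(mite.h.det, as.matrix(mite.mem), adjR2thresh = R2a)
sel$order                                     # the eight retained dbMEM: 1, 3, 4, 6, 7, 10, 11, 20
mem.broad <- mite.mem[, c(1, 3, 4)]           # broad-scale sub-model (the split is a judgment call)
rda.broad <- rda(mite.h.det ~ ., data = mem.broad)
```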
Now, before going into the partitioning into the separate scales, which I shall leave for you to do in the practicals, here is a dbMEM analysis using the whole group of eight significant dbMEM. So this is multi-scale, yes, but with all the features in the same analysis, and I got three significant RDA axes. The first one contrasts those two regions with respect to the other ones: these are the driest, and there is something happening here too concerning water, or the lack of it, while the rest is wetter. The second one, as you see, contrasts the central zone with respect to both ends: these areas are flatter, and we have more hummocks and shrubs here and here, so this may be related to that. And there is a third axis, with alternating patches.

Now of course, if you want to interpret this in terms of environmental variables, you just take the fitted site scores of these axes, the first axis for instance, and regress them on the environmental variables, to see which ones explain the structures seen on that axis; and you do the same with the second, and so on, for all the significant axes. In such cases it is very interesting to run those RDAs and test the axes separately, to see how far you can go, and hence how many axes you can interpret afterwards in terms of environmental variables.

[Question from the audience] What I was saying is that the tools, the dbMEM themselves, the spatial variables built by principal coordinates of neighbour matrices, are numbers 1, 3, 4, 6, 7, 10, 11 and 20. Those dbMEM were constructed without any intervention of the oribatid mites; it is afterwards that I forward-selected them and obtained those eight. So these are not the ordination axes of the mites, but the tools I use as explanatory variables: I have run the RDA using them as my X matrix, with the oribatid mites as my Y matrix, and the first three RDA axes are significant. Yes, Pierre? Oh, this is a mistake: this should read RDA axes 1, 2 and 3, you are fully right; I shall correct this and post a new version. Sorry, yes, my fault: these are RDA axes 1, 2 and 3. I had detrended the mite data beforehand, because there was indeed a linear trend present (I did as I recommended you to do), and this is what I obtained.

On this next slide I have nothing to correct: this is RDA axes 1 and 2 of another RDA, run this time with only the three dbMEM modelling broad-scale spatial structure. We are coarser here, focusing on the broad-scale structure only, and I get these two significant axes, whose fitted site scores, again, I can regress on the environmental variables to see what they explain. There are details about this in the book as well, although in the book it is still done with the first-generation PCNM, which does not make a lot of difference. And this time we have really separated the scales: that was the broad spatial scale, and this is the intermediate spatial scale, meaning that I have run the RDA using only the four dbMEM of intermediate scale (6, 7, 10 and 11, the significant ones of that group), and here again I obtained two significant RDA axes. This is the result.
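That interpretation step is short in code; a minimal sketch, with rda.broad as in the previous sketch and mite.env an assumed table of environmental variables:

```r
library(vegan)

# fitted ("lc") site scores of the first constrained axis
ax1 <- scores(rda.broad, display = "lc", choices = 1, scaling = 1)[, 1]
summary(lm(ax1 ~ ., data = mite.env))  # which environmental variables explain this structure?
```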
Again, I could regress these on the environmental variables to explain those structures, to see what environmental forcing determines them. Maybe I find it, and maybe not; that would be the occasion to think about what could have generated part of these structures, if my environmental variables did not. Maybe I have missed a couple of them, which is very possible in this case, or maybe other processes generated some of the structures. Actually, I don't show you the result of the finest-scale analysis, because when you isolate that 20th variable from the rest of them, you get a non-significant result. This happens sometimes: there is a kind of joint effect from having used several variables in combination, some being retained thanks to the presence of the others, and obviously that is what happened with this one. See the practicals for the interpretation: all the material is in my script of today's practicals, you just have to go through it.

Now, all this may still seem a little complicated for people who, four days ago, had never heard of anything like the stuff we are explaining to you. Fortunately I have a heart, and some consideration for users, so the idea came to me to program a function, first called quickPCNM, which I reprogrammed a couple of weeks ago to become quickMEM (I didn't put the "db" in, it was beginning to make too many letters). At the price of one single R command, quickMEM(mite.h, mite.xy) (mite.h to remind you that the data have been Hellinger-transformed, mite.xy for the x-y coordinates), you do this and you wait, and not for a long time at that. You have the function, and for people who want to see how it is programmed, it is very transparent; you see all the steps. First, it takes the data and tests whether there is a significant trend; if it finds one, it detrends the data, then goes on. Second, it computes the dbMEM eigenfunctions and retains those with positive Moran's I, that is, positive spatial correlation. Third, it runs and tests an RDA of the species with all the dbMEM. If that is not significant, it stops there with a message: it means you have no significant spatial structure in your data, so no need to continue. If it is significant, it continues: it runs a forward selection of the dbMEM with the Blanchet et al. double stopping criterion; it has memorized the adjusted R² of the global RDA obtained before, and the second, usual criterion is the alpha rejection level. When it has selected the variables, it runs a new RDA with the significant dbMEM, tests it, and specifically tests the axes, to see how many of them are significant. And point 7: it draws maps like the ones I showed you, automatically, and presents them to you. The output object contains all the results: the detailed results of the RDAs, the details of the forward selection, everything. The only thing left for you to do is that single call. Isn't that cute?
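Written out, the call is just this; quickMEM() is the course function distributed with the practicals (not a CRAN function), and the comments simply restate the steps described above:

```r
mite.qmem <- quickMEM(mite.h, mite.xy)
# detrends if needed, builds the dbMEM, runs and tests the global RDA,
# forward-selects with the double stopping criterion, tests the RDA axes,
# and draws the maps; the output object stores the complete and the
# selected dbMEM plus the detailed RDA and forward-selection results.
```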
There are just two things it doesn't do. The first one is splitting the significant dbMEM into sub-models, because that implies an arbitrary decision you have to make. It already prints a couple of results on screen, and on top of that all the results are in the output object, but if you want to split your significant dbMEM into sub-models you have to do it by hand, and then rerun the RDAs. The dbMEM themselves are in there, actually in two objects: one containing the complete set of dbMEM, in case you want to use them in another situation with the same set of sites, and one containing the subset of significant dbMEM, so you don't even have to go through the process of sorting them out a second time; it is already done for you. The second thing it doesn't do is get a coffee for you; unfortunately, I didn't find the command.

[Pierre Legendre continues] Okay. I will work with this PowerPoint presentation, for which you have the PDF, and I will start at slide 35, because the first 34 slides present what Danielle just showed in another way, with other examples; if you want to see other examples, there are at least three example data sets treated in my first 34 slides. I will start the story at the point where Stéphane Dray was in our lab. Yes, that's a picture of Stéphane Dray; as you know, he is now the researcher in charge of ade4, adegenet, adegraphics and adespatial, in Lyon. When he was in our lab, as Danielle described, I asked him to look into the method we then called PCNM analysis and to try to put it into a formal mathematical framework. He played with the former PCNM in all sorts of ways, and one day he called us, drew something on the blackboard of the lab, and said: when you take the geographic distance matrix and you truncate it, you are actually dealing with two types of information. This idea of his was central to what I am going to show you. First, you have the connections between the sites: those that you want to keep and those that you discard. You can write this as a site-by-site matrix (it has a diagonal, of course) with 1 for sites that are connected (for instance, when you keep the first distance class, you put a 1 between sites that are at the first distance class) and 0 for sites that you want to disconnect, those at distances larger than the distance you have chosen for truncation. These zeros and ones give you the connection scheme. The other piece of information that we use in distance-based MEM is a weighting, and in the case Danielle presented, the weights are the distances themselves. So he said: we have these two types of information, and maybe we can play the game differently. In some cases we could use only the connections, not the distances; in other cases we can multiply the connection matrix by the one containing the distances; or we could replace the distances by some other weighting of the relationships between sites, which does not have to be geographic distance. And then we sat back and said: wow, he has opened the gate, he has opened the method to all sorts of other inputs, including everything that landscape ecologists and landscape geneticists are doing, that is,
using the difficulty of communication between sites, for different types of organisms, as weights instead of distances. So this was his great contribution, explained in five minutes, but it was a key development.

The next thing, and this is what Danielle explained: in the old PCNM, if you consider distances, we had zeros on the diagonal, indicating that a site was connected to itself. Stéphane found that in the calculation of correlograms, which Danielle explained yesterday, with the Moran's I spatial autocorrelation coefficient, a site is considered not connected to itself. So he said: let's do the same thing here, let us put four times the truncation distance on the diagonal instead of zeros. But this Danielle has already explained; it was the other great development.

Now, in terms of the different possibilities that are open to us thanks to this division into two matrices: by the way, this sign here is the Hadamard product, the cell-by-cell product of one matrix with the other. If you have distances in this matrix, and zeros and ones in that one, then where there is a 1 you keep the distance, and where there is a 0 between two sites the distance becomes zero; it evaporates, it is truncated. That is the effect of a Hadamard product. So: in binary MEM we use only the connection matrix. That method had already been described by a specialist of statistical geography, Daniel Griffith, in the geographic literature, published at the same time as our MEM work; he published it in 2001, when our paper was already in press, so we did not copy Griffith, it was a simultaneous finding. But he was only interested in using binary connections, while we were interested in having the distances intervene in the calculation: to obtain the dbMEM that Danielle described, matrix A contains the distances in addition to the connections. We can also replace matrix A by some function of the distances: in the 2006 paper, Stéphane showed ways of transforming the distances and investigated a series of transformations, exponents of the distances, which can be explored in an automatic way thanks to functions he made available. You can try all sorts of exponents of the distances and find out which one gives the MEM that produce the highest adjusted R² in an RDA against the ecological data. That is the trick: the best transformation of the distances, before multiplication with the connection matrix and production of the MEM, is the one that produces the best model of the data, the one with the best adjusted R². And he said we can also replace matrix A by other types of weights, for instance the resistance of the landscape, as usually done by landscape ecologists.

[Question from the audience] Oh yes, the matrix must be symmetric, because we use only half of it. If it were not symmetric, the principal coordinates obtained from a non-symmetric matrix would not be orthogonal. I am not saying it is not possible to compute them; it is possible, but you will lose the orthogonality. There are other ways of handling non-symmetric matrices. One of them is to separate the non-symmetric matrix into two new matrices: one is the symmetric component, obtained by taking the average of the two halves, and the other is the anti-symmetric component, a new symmetric matrix made with the differences between the two halves of the asymmetric matrix.
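Coming back to the two-matrix construction itself, here is a sketch of a generalized MEM built from a binary connection scheme B and a weighting A, using spdep and adespatial; xy (coordinates) and thresh (connection distance) are assumed objects, and the weighting function, with its exponent of 2, is just one candidate among many:

```r
library(spdep)
library(adespatial)

nb <- dnearneigh(as.matrix(xy), 0, thresh)     # B: binary connections up to 'thresh'
dl <- nbdists(nb, as.matrix(xy))               # distances along the retained links
A  <- lapply(dl, function(v) 1 - (v / max(unlist(dl)))^2)  # one possible weighting f(d)
lw <- nb2listw(nb, glist = A, style = "B")     # B Hadamard A, stored as a listw object
mem.gen <- mem(lw)                             # generalized Moran's eigenvector maps
```

Other connection schemes (any graph spdep can build) and other weighting functions can be plugged into the same two slots of this sketch.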
We have investigated that a little bit, for ordination for instance, and you can try these two components separately, in two separate MEM models; that is the way I would recommend handling non-symmetric matrices. Distances, and indeed resistances in the landscape, can be asymmetric when there is a physical process at work, for instance currents in the sea. But for these physical processes that create asymmetry, there is another method that I believe is more efficient than MEM, and I am going to talk about it right now.

So now we have all these four solutions instead of only dbMEM, and in the new adespatial package there is an mem() function, separate from the dbmem() function, that allows you to do these other combinations, at the price of a bit more complexity; there is no quick function for those. The quickMEM that Danielle produced is for dbMEM only.

Now here came Guillaume Blanchet, and this is the asymmetric method that I was just talking about. He did his master's in our lab, and during that master's he developed the double criterion for forward selection, and he developed the AEM. We told him: we can give you a PhD for that. He said no, no, no, I want to go to Alberta and do my PhD there, with Fangliang He; a master's is fine. But he did more in his master's than most PhDs. So I worked with him on this problem of developing asymmetric eigenvector maps, that is, eigenvectors designed to model the effects of physical processes that create an asymmetry in the data. For instance, the difficulty of going from point A to point B downstream is not the same as going from point B to point A upstream, as you can easily appreciate, and this applies to all kinds of processes. Our applications up to now have involved water currents and things like that, but the method can also be applied to transport in the air: for people who are doing atmospheric science, it can be used to model the transport of pollen, or of anything else of interest to you. Again, with Guillaume we spent two years trying everything in the book, and then we realized that we already knew the solution, from the study of the evolution of genes: we had used that method of coding, the one used for gene trees, to model a river network in a paper from 1996 or so, and we realized that we had almost everything needed to produce eigenvectors for asymmetric processes.

So I will explain this example, and you will see that it generalizes to any kind of asymmetric process. Imagine that you have a river network, and you study how the network explains the differences in the fish composition that you find in the lakes in these boxes (five or six rectangular lakes, drawn that way for convenience of the schematic representation); this is the river network. We were inspired, for this example, by the fact that we live in a recently glaciated area: ten thousand years ago, in Montréal, there were two kilometres of ice on top of us, so the rivers that we have now appeared recently, after the ice melted. Of course, ten thousand years ago, if there was any fish there, it was frozen fish, like in the supermarkets nowadays; there was no swimming fish, and the fish that we find in our lakes went there recently, coming back from glacial refugia,
which were farther south: the fish were in the Mississippi refugium or the Hudson River refugium, and they came back as the ice melted. So we had in mind fish going up rivers rather recently, to recolonize the newly re-formed lakes.

How can we model that? We said: we have two types of information about the network. Instead of geographic coordinates, as in the dbMEM approach, the information here is about nodes and edges. It is a graph, in the sense of graph theory, with points that are the nodes and links between the points that are the edges, and they are directional links, because fish going up the river go from here to there; we don't care if they go back down, we are interested in those that go up. And they can only go up following the rivers that separate the lakes: either the rivers that presently exist or, as in one study, connections between lakes that existed in the past, during the history of postglacial formation of the lakes; geomorphologists can tell us that there was a connection here that is not there anymore, and we can add it to our graph. In any case, the fish must have gone through these rivers, either existing now or historical.

So how do we code that? We used the system of coding that has been used in phylogenetic analysis for 35 or 40 years to represent how a gene evolved into alleles. It is done with this matrix, where the rows are the points, the nodes, and the columns are the arrows, the directional edges, as in phylogenetics. Each node (the lakes are nodes here, as are these junction points) is represented by the series of arrows that connect it to the origin. For instance, this lake here is connected to the origin by edges 2, 5 and 8, so in its row we put a 1 in the columns of edges 2, 5 and 8; in the same way, node 1 is connected to the origin only by edge 1, so there is a single 1 in its row; and so on. We had already used such a matrix, in an RDA, in that paper from the 1990s; the important step that we had not thought about then was to do a principal component analysis of that matrix, or a principal coordinate analysis of the distance matrix obtained from it. But now we knew, from the dbMEM construction, that we have to extract eigenvectors, and that these represent the various scales, so we said: let's try that with this matrix. And it is even simpler here, because we can directly do a PCA of the matrix. You may remember that PCA means first computing a covariance matrix, then its eigenvalues and eigenvectors; but you can obtain the exact same result by doing another type of decomposition that I did not describe in this course, called singular value decomposition, SVD. This is what we currently do in our program: when we produce AEM, we do an SVD of that matrix, which immediately produces all the eigenvectors we need, together with a diagonal matrix of singular values whose squares are the eigenvalues. Or we could compute the Euclidean distance matrix among the rows and do a principal coordinate analysis, and we would again obtain the same thing. So there are three different methods of calculation to obtain the asymmetric eigenvector maps.
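Here is a sketch of that computation on a small made-up network (origin to node 1; node 1 to nodes 2 and 3; node 2 to node 4), not the one on the slide; adespatial's aem() function wraps the same operation:

```r
E <- rbind(c(1, 0, 0, 0),   # node 1: reached from the origin through edge 1
           c(1, 1, 0, 0),   # node 2: edges 1 and 2
           c(1, 0, 1, 0),   # node 3: edges 1 and 3
           c(1, 1, 0, 1))   # node 4: edges 1, 2 and 4
Ec  <- scale(E, center = TRUE, scale = FALSE)  # column-centre E, as in a PCA
sv  <- svd(Ec)                                 # singular value decomposition
AEM <- sv$u[, sv$d > 1e-8]                     # the AEM eigenfunctions
sv$d^2                                         # squared singular values: the eigenvalues,
                                               # up to a constant factor
```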
I will now show you that these eigenvectors represent the different scales. Of course, after we have constructed them, we use them to analyze the response data in exactly the same way as Danielle described: we may take all those that model positive spatial correlation and use them together in an RDA model; we can compute the adjusted R² of the full model and then do forward selection, using that first adjusted R² as the ceiling in the selection of the AEM; we can then obtain the adjusted R² of sub-models, after dividing the retained AEM into sub-models; we can produce maps, and so on. We do exactly all the things that were described before, but this is designed to study the outcome of asymmetric physical processes.

Now, the AEM obtained for this network; this is my exemplary network, and here they are. I drew the AEM as shades of grey on the river system: say that this grey is a value of zero while, following the Swiss mountain paradigm, white is a high positive value, the top of a mountain, and black is deep in the valley, a negative value. You see that AEM 1 divides the network into a left branch with positive values and a right branch with negative values; you see the asymmetry immediately, the process originating from node 0, which has the value zero. AEM 2 works only in the left-hand branch, dividing it again into a sub-branch with positive values and a sub-branch with negative values; everything else is near zero. AEM 3 does the opposite: it forgets about the left-hand branch and divides the right one into positive and negative. (Actually, the axes may very well come out with reversed signs, as we discussed before; it happens by chance, by the chance of the algorithm and the way it is handled by your specific computer. With the same R function I may obtain positive values here and negative values there on a Mac, and on a PC it would be the opposite, with the same code; it doesn't matter.) In the next ones, the differentiation is along a single branch: instead of going from slightly negative through zero to positive, we go from strongly negative to positive within the left-hand branch, then something slightly different here, in the right-hand branch. So we have finer and finer processes modelled by the successive AEM. With this number of objects, eight objects plus the origin, we obtain seven AEM in total; the first ones correspond to positive Moran's I, and the last ones to negative Moran's I, meaning negative spatial correlation, which translates into black circles that are next to white circles, opposite values as neighbours, whereas in the first ones you have a smooth gradient along the two branches. So it seems to work.

I will show you only one example, I think; well, I may show you another one too. This is the example that we put into the original paper describing the AEM method. We are talking about 42 lakes in a reserve in Québec, north of the St. Lawrence River, and the interest here was the diet composition of the brook trout, Salvelinus fontinalis, in these lakes. So we looked at the stomach contents of the brook trout.
I remember explaining this to an assembly of statisticians, and they could not understand why we would look at the diet, and how we would do that. So I told them: we catch a trout, and then we ask the trout, what did you eat today? And the trout refuses to answer, so we have to open the trout, take the stomach, and look at what is inside. The statisticians were aghast, but this is what biologists do.

You will see the reason in a moment. There was actually a hypothesis that the trout that successively reinvaded this river network after the glaciation came in successive waves, which may have belonged to different genetic groups, and that different groups went into different parts of the network; we wanted to test that. The opposite idea was that fishermen may have carried the small trout that they caught in one lake, used them as bait, and then thrown them into another lake at the end of the day, transferring genetic stocks from lake to lake and obliterating that signal. We will see how it comes out.

This is a schematic of the river network; the St. Lawrence River is somewhere at the bottom there. The trout that we find now in the lakes, which are the dots, all came through this route, and at each intersection some group of trout may have decided to go left and another group to go right, depending on water levels, on the smell carried by the water coming from one side or the other that attracted them, and so on; trout have their own ways of deciding whether to go left or right when they move up a river like that, and the same thing in the sea. All the black dots are lakes from which trout were caught, and we coded this network by hand, with the rows being the lakes and the columns being the edges, as before. Oh yes, I did not mention: you can add weights to these zeros and ones. If you think the distances are important and relevant, you can multiply each of these columns by the distance it represents. But in the case of these fish we said: whether a segment is half a kilometre or two kilometres long does not matter to a trout; even I can swim a kilometre, so imagine a trout. So in this example we coded the edges only with zeros and ones, because we thought distances were irrelevant. What was important was that at each intersection the trout had to make a decision, left or right, every time it met an intersection, and so some of them ended up in one group of lakes while others ended up in other groups of lakes.

We studied this in different ways; if you are interested, you can look at the paper for more detail. I am giving you a brief summary here, because we will have coffee pretty soon. The question was: is the diet variation related to the genetics of the trout populations that successively invaded the river network after the last glaciation? The matrix that has the nodes, the lakes, as rows and the edges, the arrows, as columns is called matrix E in our software and in the paper; it was constructed with 42 nodes and 65 edges, and we did not use weights corresponding to distances. Everything is in that paper (Table 2); the results I am about to show you are a summary.
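With today's tools, producing the AEM from such a hand-coded matrix would be one call to adespatial's aem() function; a sketch only, where E stands for the 42 x 65 nodes-by-edges matrix just described (not reproduced here), and where I am going from the package documentation:

```r
library(adespatial)

trout.aem <- aem(binary.mat = E)  # unweighted: the 0/1 links only
str(trout.aem)                    # eigenvalues and AEM eigenfunctions, ready for RDA
# If edge lengths mattered, a vector of edge weights could be passed
# through the 'weight' argument instead of the plain 0/1 coding.
```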
We tried all sorts of ways of doing the calculation; in particular, we compared the AEM method that I just described to distance-based MEM built on the direct geographic distances between the lakes, as the crow flies. This is a good test situation, because with the folding of the river network a lake can be very close to another one geographically while being very far along the network, one lake sitting in this branch and the other in that branch. So we should get a clear signature of whether it is the network that is important, or the geographic distance, whereby fishermen may have carried trout: going to fish in this lake in the morning, taking small trout, and carrying them over there in the afternoon, after their picnic; that is a short distance. This is why we compared the two models, AEM against distance-based MEM on direct geographic distances. The AEM model had an adjusted R² of 0.64, a pretty good model, while the MEM model had an adjusted R² of 0.20. Who wins? We concluded that the trout variation among lakes is better explained by the AEM model than by the direct-distance model. But since the MEM model was also significant, we did a variation partitioning between the two groups of Moran's eigenvectors, and we saw that even the fraction corresponding uniquely to the MEM was significant in a partial RDA: a small portion of the variation was non-directional.

[Question from the audience] Yes, in a sense it is an unfair comparison; no, we did not do that one, but we were comparing two different hypotheses about the processes by which these fish came to be where they are. [Question] Well, it is more complicated: these lakes are pretty similar, but some of these trout are pelagic, others are benthic, and others are intermediate, and this is what changes the diet; they eat either plankton, or invertebrates from the bottom, or a mixture. I did not go in detail through the various elements of the diet, but suppose you look at that: these are the different things trout can eat, and depending on where they feed, they eat different things. That was basically the idea. So you can look at this paper, look at the example, and criticize it as much as you want; it is only an example of the way the method is used, and it was our first published example.

I have other examples here. There was a second paper, published with three examples coming from the work of people who had these data and became co-authors of that paper. One of them is the distribution of a crustacean in a river in Guadeloupe. I told you yesterday that Guadeloupe is actually two islands; this is the one where the volcano is, la Soufrière, and there are steep rivers along the flanks of the volcano. Our colleague Dominique Monti, from the Université des Antilles et de la Guyane, collected these data in a small segment of the river, using electrofishing. You can look at this example: how the network was constructed, how it was analysed, and how the results compare. It was still called PCNM at the time: the PCNM were computed from the geographic distances among the points, the MEM here from the connections only, presence or absence of a link, not the geography, and this is the AEM model. And here we have the variation partitioning showing how these various explanations are redundant or not: this one and that one are very redundant, but the AEM explain a portion that the MEM models do not explain in this case.
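The partitioning used in these comparisons is vegan's usual machinery; a minimal sketch, with Y the response table and aem.sel / mem.sel the forward-selected AEM and dbMEM variables (assumed names):

```r
library(vegan)

vp <- varpart(Y, aem.sel, mem.sel)  # partition the variation between the two spatial models
plot(vp)                            # fractions: unique AEM, shared, unique MEM, residual
anova(rda(Y, mem.sel, aem.sel))     # partial RDA: test the fraction unique to the MEM
```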
There are two other examples: one on bacterial production in a fluvial lake, and a third on the distribution of different larval forms of the zooplankton species Calanus finmarchicus along the coast of Newfoundland, using a hydrodynamic model that had been built by physical oceanographers as the structure of the network. These three examples may be of interest to you, and all of that was published in that second paper, I think in 2010.

This slide simply shows that for a time series, which represents a directional process, we can of course use AEM to model the series, and the structure of the E matrix is very simple there: from the origin, time 1 is represented only by the first edge; time 2 by the first and the second; time 3 by the first, second and third; and so on. As you go along the time series, more and more edges intervene. For ten points, that E matrix produces four AEM modelling positive temporal correlation and five modelling negative temporal correlation, nine altogether. So this gives you the idea that we can apply this to model directional processes in time. And note that in AEM analysis we do not detrend: if we detrended, we would lose everything; we don't want to throw the baby out with the bathwater, because the trend is precisely what we want to model with these directional processes. There is more of that in the paper I described yesterday, where we used the Chesapeake Bay benthic monitoring program data, and where we used MEM and AEM to model the temporal structure in addition to the space-time analysis that we also do. You will be able to play with all that this afternoon.

But a word of warning comes here, about the software. In the new adespatial package we have the dbmem() function, the forward.sel() function, a scalogram function and so on, plus the mem() function for generalized MEM. The function dbmem() replaces the function PCNM() that we had before: I took the PCNM() function and adapted it to the code already existing in adespatial to produce dbmem(), and made sure that it computes the dbMEM correctly. In the Chesapeake Bay practicals, the calculations were still done with the PCNM() function; that paper was published in 2014, the work was done in 2013, and I wrote the dbmem() function only last May, so it did not exist at the time. When you see the script of the practicals call the PCNM() function, in capital letters, please remember that this function is no longer available and is replaced by dbmem(); the way to pass the parameters is, I think, the same, but you will have to make sure that you do it correctly with the new function. Okay, I'll stop there, and we can have coffee and start again at 11.