Now, Pierre Legendre has presented to you the grandmother of ordination methods, PCA. Before the golden times of pre-transformation, there was a concern about species data. All ecologists wanted to compute ordinations because they had already realized the importance and the potential of those methods. At the beginning, of course, they were not aware of the double-zero problem. So they used PCA on raw species data and sometimes ended up with quite absurd results, and some of them decided that PCA, and ordination altogether, was not for species data, so they abandoned it and resorted to other techniques.

On the other hand, in the 60s, some researchers became aware that other ordination methods might be interesting, among them what is called correspondence analysis. Correspondence analysis was first designed to analyze contingency tables, and it is well suited to finding the exceptional values in a table; that is actually one of its features. So they reasoned that a site-by-species table, which has absolute frequencies in it like any contingency table, could be considered a very special kind of contingency table. Be careful, this is not the way we interpret it now. But basically they considered that you could have one variable, "site", with as many levels as you have sites, and another, "species", with as many levels as you have species. So you have your table and you can analyze it by correspondence analysis. As you probably know, when you compute a chi-square statistic on a contingency table, each cell of the table makes its own contribution to that chi-square value.
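To make the idea of cell-by-cell chi-square contributions concrete, here is a minimal sketch in Python with NumPy. The abundance table `Y` and the function name `chi_square_contributions` are hypothetical, chosen only for illustration; they do not come from the talk.

```python
import numpy as np

# Hypothetical site-by-species abundance table (rows = sites, columns = species).
Y = np.array([
    [10, 0, 2],
    [ 4, 6, 0],
    [ 0, 8, 5],
    [ 1, 2, 9],
], dtype=float)

def chi_square_contributions(table):
    """Signed contribution of every cell to the chi-square statistic."""
    total = table.sum()
    row_sums = table.sum(axis=1, keepdims=True)
    col_sums = table.sum(axis=0, keepdims=True)
    # Counts expected if sites and species were independent.
    expected = row_sums @ col_sums / total
    return (table - expected) / np.sqrt(expected)

contrib = chi_square_contributions(Y)
# The squared contributions add up to the usual chi-square statistic.
chi2 = (contrib ** 2).sum()
print(np.round(contrib, 2))
print("chi-square statistic:", round(chi2, 3))
```

Cells with a large positive or negative contribution are the "exceptional" cells: a species much more, or much less, abundant at a site than the row and column totals would suggest.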
So actually, CA, correspondence analysis, is a PCA, but instead of computing the PCA on the raw values, you first transform your table into a table of the contributions of all the cells to the chi-square statistic, and then you submit that table to a PCA. By the way, this also means that all the new values in the cells are weighted by the site totals and by the species totals. So you have a method that already embeds a transformation of the species data, in such a way that the distance preserved among sites or among species (we'll see the details in a moment) is the chi-square distance, and the chi-square distance does not consider double zeros as a resemblance. A double zero adds nothing to the chi-square distance between two sites. As a consequence, CA is a method that, at the outset and without any pre-transformation, is adapted to the analysis of species abundance data. In the document about the algebra of PCA that you saw a moment ago, if you go further down, you will find the algebra for correspondence analysis.

Now I'll take a little step back from the mathematics and take a minute to explain how I intuitively picture what an ordination does: what you have seen in mathematical terms when Pierre showed it to you, leaving aside the mathematical differences among those methods. PCA, CA and principal coordinate analysis, which I will address later, all have this in common: in a cluster of points, which are the sites expressed in the space of their variables, they find the orientation of the axis that maximizes the variance. You may not have noticed it in the small example that Pierre showed you. There, the variances of the original variables were, if memory serves me well, 8.2 and 5.8. You also had the covariances, which are not relevant here. And when you put the data through that magical equation for the extraction of eigenvalues, you ended up with eigenvalues of 9 and 5, with the same total of 14. Now compare: 8.2 and 5.8, then 9 and 5.
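The numbers quoted above can be checked with a few lines of Python and NumPy. This is only a sketch: the variances 8.2 and 5.8 are the ones quoted from memory in the talk, whereas the covariance of 1.6 is an assumption, namely the value (up to its sign) that makes the eigenvalues come out as 9 and 5.

```python
import numpy as np

# Variances 8.2 and 5.8 are quoted in the talk.
# ASSUMPTION: the covariance 1.6 is not given in the talk; it is the value
# (up to sign) for which this matrix has eigenvalues 9 and 5.
S = np.array([[8.2, 1.6],
              [1.6, 5.8]])

# eigvalsh returns eigenvalues in ascending order; reverse to get axis 1 first.
eigenvalues = np.linalg.eigvalsh(S)[::-1]
total_variance = np.trace(S)

print("eigenvalues:", eigenvalues)
print("sum of variances:", total_variance)
print("sum of eigenvalues:", eigenvalues.sum())
```

The total is the same before and after; only its distribution among the axes changes.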
It's as if part of the information had been transferred to the first axis, because the 9 corresponds to the first axis and the 5 to the second. We had only two in that small example. This is actually how all those ordination methods work. We live, at least for the physical x, y, z part, in a three-dimensional world, and in this world we can picture any cluster of points. Imagine your data as a cluster of points, here in this room: your sites expressed for three species, or any three variables you deem appropriate. This cluster is probably not completely spherical. If it were, you could not do anything, because that would be the case where you have no information at all in your data; it would be completely random. But in the overwhelming majority of cases you have some information, meaning that the values of the variables at some sites are different from those at others, and you may end up with sub-clusters or simply with a cluster that is elongated in some direction, which is probably a combination of those dimensions. PCA and the other two ordination methods look for the axis of that elongation, the greatest elongation, because this is the one that maximizes the variance. Maximizing the variance means you seek the direction where you have the most information, because if the cluster is elongated, then there is information along that axis. For us ecologists it will probably represent an ecological gradient, expressed in terms of physical and chemical variables if you run a PCA on those variables, or in terms of community composition if you work with species data. So this is what PCA does. It looks for that first elongation, and when it has found it, it looks for the second one, provided that it is orthogonal to the first one.
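The picture of an elongated cluster and orthogonal axes of decreasing variance can be tried out on simulated data. The following is a minimal sketch in Python with NumPy; the simulated cloud `X` and the helper `pca_axes` are hypothetical names, not something shown in the talk.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical data: 200 "sites" described by 3 variables. The cloud is
# stretched along one direction, then rotated so that the elongation is a
# combination of the original variables.
latent = rng.normal(size=(200, 3)) * np.array([3.0, 1.0, 0.3])
theta = np.pi / 4
rotation = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                     [np.sin(theta),  np.cos(theta), 0.0],
                     [0.0,            0.0,           1.0]])
X = latent @ rotation.T

def pca_axes(data):
    """Eigen-decomposition of the covariance matrix, largest eigenvalue first."""
    centered = data - data.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(cov)  # ascending order
    order = np.argsort(eigenvalues)[::-1]            # reorder: axis 1 first
    return eigenvalues[order], eigenvectors[:, order], centered

eigenvalues, axes, Xc = pca_axes(X)
scores = Xc @ axes  # coordinates of the sites on the principal axes

print("variance along each axis:", np.var(scores, axis=0, ddof=1))
print("eigenvalues:             ", eigenvalues)
print("axes orthogonal:", np.allclose(axes.T @ axes, np.eye(3)))
```

In this sketch the variance of the site scores along each axis equals the corresponding eigenvalue, and no original variable has more variance than the first axis.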
So let's imagine you have your first axis in that direction. To look for the second one, in three dimensions, it amounts to rotating a candidate second axis around the first one until you find the second most important elongation. You have the first axis here; you look at the cluster from that point of view, you see how it is deformed in that dimension, and you look for the second axis. This one has to be orthogonal to the first one, and it goes on. In three dimensions you have only three axes: the last one is orthogonal to the two previous ones, and that is your third dimension. Each time, you see that the projection provides a shorter view of your data, and this corresponds to smaller and smaller amounts of variance. The amount of variance that you capture on the different axes is given by the eigenvalues. The sum of those eigenvalues gives you the total amount of variance. And if you want to know the proportion of variance that is explained... oh no, forget that; at this point we don't explain anything, we represent your data. The proportion of variance that is expressed, or represented, on the first axis is given by the eigenvalue of that axis divided by the sum of all eigenvalues. For instance, in one of his last examples Pierre told you (I don't remember the figures) that the first two axes represented maybe 30 or 40% of the variance of the data. This is actually what he did: he took the eigenvalues of the first two axes, summed them, and divided by the sum of all eigenvalues, which is the total variance in the case of PCA, and that gave him that amount. This is how he could tell you that this represented so and so much variance. Okay. This is for people who are more comfortable with that kind of representation. I for one am one of those: I need an intuitive understanding of what I'm doing,
something that is maybe an analogy, but as close as possible to the real thing; you don't deform it to the point where it has no relationship with the mathematics. It's another way that I suggest for you to picture it. Now, PCA does that in a strictly Euclidean world, except when species data have been pre-transformed. Correspondence analysis does it in the world of the chi-square distance, which is not a distance we are familiar with in our everyday life, but consider it as another type of space, one that is adequate for species abundances. The consequence is that you do not pre-transform the species data when you want to run a correspondence analysis. You use the original raw values, because the transformation is embedded in the method itself; it does it automatically. If you pre-transform your data, maybe with a chord or a Hellinger transformation, and submit them to CA, you end up with something that is not interpretable. So don't superimpose the two, don't mix them: either you use PCA and you pre-transform your species data, or you use them raw and you submit them to correspondence analysis.

Now, does this give exactly the same result? The answer is no, for several reasons. The closest pre-transformation that you could use would be the one that leads to the chi-square distance, which is the one preserved in correspondence analysis. If you pre-transform using the chi-square transformation and run a PCA, you'll end up with something very close to correspondence analysis, but not exactly the same, because CA has an internal way of weighting the procedure, mathematical characteristics that are a little bit different. So the two results will not be exactly the same, but they will be very close. If you use other types of pre-transformation prior to PCA, maybe the chord or Hellinger transformations, then you may end up with a result that is a little bit different from those, but equally valid. Now, what do you need to run
correspondence analysis? I told you I'm a practical person, the guy who uses the method and thinks about how to use it in an optimal way. So, you need dimensionally homogeneous variables, meaning you cannot take your physical and chemical variables of all kinds, with pH and degrees and milligrams per..., and use them in correspondence analysis. The variables must be dimensionally homogeneous, which is of course the case with abundance data, and the smallest value must be zero. You cannot have negative values in CA. If you try, you simply get an error message and it doesn't work, because, I remind you, this is based on the analysis of a table of frequencies, and frequencies by definition cannot be negative.

For technical reasons, CA produces one axis fewer than the smaller of the two dimensions of your data table, meaning that if you have 20 sites and 10 species you will end up with 9 ordination axes. If the table had more species than sites, it would be the number of sites minus 1. So this is a technical point. The fact that it depends equally on the number of sites, or objects, and on the number of descriptors is also a heritage of contingency-table thinking, which is completely symmetrical: as you may remember, you can define the same contingency table this way or transpose it, and you will end up with the same results. That characteristic is inherited here. Okay, understood up to now?

Now I have to introduce another important point. Pierre did not address it in those terms, but he explained to you that there are two different ways of representing the results of a PCA. One preserves the Euclidean distances among sites, to the detriment of the representation of the variables: the angles among the variables do not correspond to their correlations. So we have the best representation of sites, but the species, or, well, if you have
pre-transformed variables, those are not optimally represented. This is called scaling 1. In the analyses you are going to run, for PCA and for correspondence analysis, you will have that choice. The analysis is done the same way, but when it comes to drawing a biplot, you have to choose between scaling 1, which preserves the distances among sites, and scaling 2, which in PCA preserves the correlations among variables, which are represented as arrows. In CA, as we will see, the variables, here specifically the species, are also represented as points. So you have the two. Same as in PCA, you ask for scaling 1 if you are primarily interested in the ordination of objects, that is, of sites on the basis of species. Technically, this means the objects are at the centroids of the species. I will show you graphical examples and you will see what this actually looks like. So the chi-square distance, which is at the core of correspondence analysis, is preserved among objects in scaling 1. How do you interpret it? The distances among objects in the reduced space approximate their chi-square distances. Why only approximate? Simply because you don't see all the dimensions in your biplot. You have two dimensions; if you had 15 axes, you would see just part of the full ordination, so this is an approximation. But thanks to the fact that CA, like PCA, extracts the most important part of the variance on the first few axes, if you check that a good amount of variance is represented on those axes, you can be confident that the main features that you see on the first plane, the 1 by 2 plane, represent the main relationships among your sites quite well. So points that are close to one another in this graph are likely to be similar in their species relative frequencies (contingency-table thinking, if you are familiar with that). Conversely, any object that is found
near the point representing a species is likely to have a high contribution of that species, meaning that the species is probably quite abundant in that object. Or, if you are working with presence-absence data, a site point close to a species point means that the species is probably present at that site. I also have a small example for that.

By the way, I started my PowerPoint presentation at slide 36. Why? Because the whole first part of my presentation concerns what Pierre has shown you before, but of course presented in my way. So we have complementary ways, and if you are interested in seeing PCA presented in the style I'm using for the other methods, you are free to go to that part of my documentation, which I left in the slides for you.

So now we have the smallest possible example. You need at least three items, either three variables or three objects, because this will end up in CA with two dimensions; as I told you, you lose one for mathematical reasons. So here you have those objects and the three species. This is the result that I have obtained for scaling 1. (I go from one side to the other so nobody's jealous of the other part of the room, and maybe to avoid that tomorrow morning everybody piles up there and there's nobody here.) You may see, although it's not obvious, lambda 1 here, 0.23 rounded, and lambda 2, 0.0859. Are these values already reported as proportions of one? Oh no, they are not; in this case they are raw values. So, to assess the proportion of variance... or rather, in the specific case of CA, I can't speak about variance. It's not a true variance, it's a quantity called inertia; I have a slide for that. Anyway, if you sum up these lambda values you get the total inertia, and in the same way as in PCA, you divide each value by the total to get the proportion represented by the first and by the second axis. And what you can see here in
terms of interpretation: for instance, in this situation you have objects 4 and 5 here, which are relatively close to one another because they have all species in common, in relative frequencies that are comparable; not completely the same, but still comparable. And species 3, for instance, is at approximately the same distance from objects 4 and 5, because in terms of relative representation it is quite close to both. But since it is also represented elsewhere, it has been placed accounting for all those different relative frequencies. Still, just by looking at this you can be quite sure that species 3 is present in quite an amount at those sites, and absent or in smaller abundances at the other sites.

A word, maybe, about object number 2. Objects that end up close to the origin of a biplot are always tricky to interpret, and this is especially the case in CA, because at least two different situations could lead there. The most likely one is that the object contains an equal representation of all the variables, or of most of the important variables, which is the case here: you see that the three species are present in comparable abundances. So this object does not distinguish itself, and therefore it is not projected into one of the corners of the biplot; it is close to the center. Another possibility, for instance for X1, would be the unlikely case where a species is present here and here, in opposite corners, which would also end up with the point being placed in the middle. So always be careful. Usually, objects that are close to the center are average objects, meaning that you have a relatively balanced representation of the variables in the object. Of course, you could continue here.

So this is another case, here again with the oribatid mite data that I collected with my wife in '89 during my
postdoc fellowship in Pierre Legendre's lab. I told you that in scaling 1 we have the sites placed at the centroids, or weighted centroids actually, of the species. So you have the species all around here, and each site is projected at the weighted average of all the species. You take the species scores, as we call them, the coordinates of the species on the axes, you weight those coordinates according to the frequencies of the species at the site, and you obtain those points here. So in scaling 1, the chi-square distance among sites is preserved, but not the chi-square distance among species. But species can of course be interpreted in terms of their abundances within sites. So we have the sites here, and this group of sites is likely to contain many of those species here, for instance, and the sites on the opposite side of the biplot probably do not contain any of those species. Okay, fine.

Now you have the other scaling, type 2, where it is the species, so the columns, that are at the centroids of the rows. So it's the opposite. It's the scaling to choose if you are mainly interested in the ordination of species. For instance, if you want to see whether you can delineate species associations or assemblages within your ordination graph, the way to go would be CA with scaling 2. The chi-square distance is preserved among variables, meaning here among species. In terms of interpretation, the distances among species in the reduced space (reduced space simply means that you see only two axes at a time) approximate their chi-square distances, for the same reason as with the sites. Species points that are close to one another are likely to have relatively similar frequencies across sites, and any species lying close to a point representing an object is more likely to be found in that object. So this is the same ordination as before, but in scaling 2. Now you see that (I go back here) instead of having your species at the
periphery and the objects at the center of the species, you have the reverse: the objects are all around and the species are in the middle. This representation optimizes the representation of species. Of course, this is but a small example, and an artificial one, so there's not very much to be said about the relationships. But for instance, species 2 and 3 are at a given distance here, and as you may see, on the first axis there is almost a continuum. You can also try to interpret those ordinations axis by axis. This is also true for CA in scaling 1: you can see the gradients that have been extracted axis by axis. Those axes are sometimes called latent variables, meaning that you have a combination of variables representing a complex ecological gradient, and this ends up being represented on one ordination axis, the first, and then the second one, and so on. So here, for instance, on the first axis you see that continuum between species 1, 2 and 3, according to where they are more abundant: in objects 3 and 6 for species 1, if you look here and here, and in object 2 species 1 is quite well represented as well. And then you go to species 2, which is in the middle on the first axis, the first gradient, but which is more separated on the second axis, the second most important feature present in the data, being closer to object 1, for instance, than to the other ones.

Of course, you may have more complex situations, like here. This is again my oribatid mite data, but in scaling 2. So now you see the opposite situation, where the sites are plotted at the periphery and the species are at the center. Here it's difficult to find or delineate groups of species. In some data sets it's clear; in some other data sets you may find
that one group may be here, another one here and another one here. You may have some of those concentrations here, but they are not extremely obvious. Remember that ordination is made to show you continuums; ecology is about continuity. Of course, in special cases you may have ruptures, clear-cut situations where on one side you have one community and on the other something completely different. But in those cases the separation is generally trivial. So instead of wasting axes on showing what you already know, you would rather run separate analyses on the two groups, because they are obviously separated. If you analyze a real-case situation where you have, for instance, one group of sites with forest vegetation and another on a dry soil without trees and with just a little bit of vegetation, you already know that they are different, so you don't need your ordination to show you that. But the structure within each group would be interesting to show in an ordination. Such an extreme case would end up with groups of sites and species completely separated on the first axis, at opposite ends, and that would be true as well for PCA with pre-transformation. No questions?
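The CA computation and the two scalings discussed up to this point can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the code used for the slides: the abundance table `Y` is hypothetical, the function name `correspondence_analysis` is made up for this example, and the computation goes through a singular value decomposition of the table of chi-square contributions, which is one common way to obtain the CA eigenvalues.

```python
import numpy as np

# Hypothetical abundance table: 5 sites (rows) by 3 species (columns),
# non-negative, with no empty row or column.
Y = np.array([
    [12, 3,  0],
    [ 8, 5,  1],
    [ 4, 6,  4],
    [ 1, 4,  9],
    [ 0, 2, 11],
], dtype=float)

def correspondence_analysis(table):
    """CA through an SVD of the table of chi-square contributions."""
    P = table / table.sum()          # relative frequencies
    r = P.sum(axis=1)                # site (row) weights
    c = P.sum(axis=0)                # species (column) weights
    expected = np.outer(r, c)
    Q = (P - expected) / np.sqrt(expected)
    U, s, Vt = np.linalg.svd(Q, full_matrices=False)
    k = min(table.shape) - 1         # CA loses one axis
    U, s, V = U[:, :k], s[:k], Vt[:k].T
    eig = s ** 2                     # eigenvalues; their sum is the total inertia
    sites_unit = U / np.sqrt(r)[:, None]
    species_unit = V / np.sqrt(c)[:, None]
    return {
        "eigenvalues": eig,
        "proportion": eig / eig.sum(),
        # Scaling 1: sites at the weighted centroids of the species.
        "scaling1": {"sites": sites_unit * s, "species": species_unit},
        # Scaling 2: species at the weighted centroids of the sites.
        "scaling2": {"sites": sites_unit, "species": species_unit * s},
    }

ca = correspondence_analysis(Y)
print("eigenvalues:", ca["eigenvalues"])
print("proportion of inertia per axis:", ca["proportion"])

# Scaling 1: each site is the weighted average of the species scores,
# the weights being the relative frequencies of the species at that site.
row_profiles = Y / Y.sum(axis=1, keepdims=True)
sites_from_species = row_profiles @ ca["scaling1"]["species"]
print(np.allclose(sites_from_species, ca["scaling1"]["sites"]))
```

With 5 sites and 3 species the sketch returns 2 axes, and dividing each eigenvalue by their sum gives the proportion of inertia represented by each axis. In a two-axis biplot of a larger data set, only the first two columns of the score matrices would be drawn, which is why distances there are approximations.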
Fine. Now, attractive as they are, those methods have their shortcomings; nothing is perfect. You have already discovered that you don't have an optimal display of both species and sites: you have to choose between scaling 1 and scaling 2. So this is one of those shortcomings, but there are others. If you look back at the slides, I have some words of caution about PCA as well, about its uses and misuses, but since I'm now in CA, let's look at CA. It is based on contingency-table thinking: you know that when you run a contingency-table or chi-square test, you are looking for special cases, extreme cases, because those are the ones that contribute most to the chi-square statistic and bring it over the critical value of your test. Correspondence analysis reflects this specialty by throwing the very special cases away at large distances. And special cases in ecology mean, for instance, very rare species, rare in the sense that you don't find them at many sites. If they are extremely rare, found at just one or two sites and, on top of that, as just one or two specimens, you may have to think about whether you really need them in your ordination. They won't influence the eigenvalues very much, because rare species with low abundances contribute little to the inertia (the variance, if you want), but graphically they will produce points that are extreme, and by contrast everything else will be bunched at the center of your ordination graph and not very legible. So in some cases I tend to filter out everything that is present at less than a given percentage of the sites, or at only one or two sites; it could be something like that. But be careful about that. It all comes down to asking whether this is meaningful information. When you have a sporadic species that happens to occur at one site or another, it's extremely difficult to extract meaningful ecological information from it. This occurs less
in plants, which are less likely to run away and wander around, except maybe for the Ents in The Lord of the Rings, that you may have seen, where you have trees that wander around. In practically all real situations it is unlikely to occur with plants. But for insects, for instance, or other mobile species, it may happen quite often that you have accidentally captured a species that has nothing to do there. It just happened to go there and was unlucky enough to fall into your trap, but that was not its destination; it just wanted to pass through. So think about that, especially in the case of CA, because PCA is less prone to that kind of behavior: the species will end up with a very short arrow and it will not disturb you anymore.

Another point is important about CA. It also occurs with PCA, and I'll come to the comparison in a moment, but it's generally presented in the case of CA because that is where it generated the most discussion among ecologists. It is called the arch effect. CA, according to the inventor of canonical correspondence analysis (which is the constrained version that you will see later in this course), brings out gradients that can be well interpreted in terms of the species packing model, I mean the model that predicts that species are organized along ecological gradients in such a way that every species has its optimum, with a given tolerance around the optimum. What I have presented here is of course an extreme, artificial case. So those species have so-called unimodal distributions, and CA is well suited to represent this because it projects the species optima along the ecological gradients represented by the ordination axes. So this is all well and beautiful, but then how on earth, with a simple gradient like this, do you end up with a representation like this one? At this point people usually scream and say this is a mathematical artifact and CA is no good because it produces such a thing. It's not. As long as you
consider this an artifact, you cannot understand what's happening. Actually, it's the only way the poor method has to represent what you are asking it to represent. Let's have a look at this packing model. Let's start, say, around the center. Here you have a species with its optimum here, another one with its optimum here, and here, and the same on the other side. Those have sites in common. The first one, for instance, is less abundant on that side, but it's still there, and it shares those sites with some of the species that are around it along the ecological gradient. But if you sample along long gradients, which is very often the case, there comes a point where this species is no longer found together with another one over here. So you have a presence of this species at this site and an absence there, and the opposite for the other species. It's not a double zero here; it is the presence of one species here and of the other one there. So this species and the second and the third and the fourth are progressively more different in terms of their representation among the sites, up to the point where they simply do not share any sites. Okay, so all is well up to now. But you see the same effect on the other side, of course, and you end up with situations where species cease to share common sites here and there, and in all the regions at the extremes. So technically, the distance between this species and this one is maximum; it could not be larger. And this species and this one over there are more extreme, but between them you have the double zeros, and then their distributions among a series of sites on the other side. Once species do not share anything at all, they are completely different in terms of representation in the sites, so you are at the maximum possible distance. But meanwhile, these species continue to become progressively more different from this one and this one. So if you consider it from the point of view of the maximum possible distance, this
tops off at the point where the species are no longer shared by any pair of sites, but locally, here and on the other side, the sites continue to become progressively different. So how would you represent those two contradictory notions, or geometric situations, on one single axis? It's just not possible. So the answer of correspondence analysis is to combine two axes. (Pierre, I hope I did not destroy your mouse; otherwise it doesn't seem to be very serious.) Anyway, you see the two realities are represented by the two axes. The first one: here I have many sites, so you cannot see it very well, but if you take one out of every three or four sites and project them onto the first axis, you will see that the distances between them progressively decrease where the arch goes vertical here. So the distance between this site and, say, this one is very small here, while three sites away, here, it's larger. So this is the first reality: the arch tends toward the vertical here because those sites become progressively different from the ones over here, and they cannot get much more different from one another once they reach the point where they have no species in common. So this is one reality, and the other is represented on the second axis: neighboring sites along the gradient continue to become progressively different, while remaining close to one another. So it's not an artifact.

In PCA the situation is even worse, even if it's more beautiful in terms of graphical representation. Why on earth is it possible that the sites that are most extreme come together here? A clue: this was done with species data without pre-transformation. Now I ask you, how is it possible that those extremely different sites that have no species in common, or almost none, end up being closer to one another than other pairs of sites? The double-zero effect, exactly. When you have fewer and fewer species in common, you have
more and more double zeros in common, and these tend to put those sites together. So a PCA run on species data that have not been pre-transformed ends up in such a representation, which is completely removed from reality, while in CA at least you have a limited effect. As I told you, for quite a long time this was considered an artifact, and people tried to correct it, especially by a method called detrended correspondence analysis, DCA. It works by segments: you cut the first axis into segments, and within each segment you set the mean of the site scores on the second axis to zero, so you straighten the arch. Of course, in the real world you have dispersion around it. But correcting something that is not an artifact is useless. Despite the fact that many people still tend to use it for whatever reason, it has been rejected for many reasons that I don't have time to present here. There's no real reason to use detrended correspondence analysis, because with that geometrical fiddling the second axis is meaningless; you cannot use it. So remember: better to have an honest representation, not a detrended one, and to interpret locally the concentrations of sites and species that I showed you before, than to try to redress something that was not an artifact in the first place and end up with something completely useless in terms of ecological representation.

Oh, energy, the basic resource in ecology. It's coming short for some people; I understand that fully.

Okay, for PCA, sorry: when you work with the transformations you still have an artifact, but not the full horseshoe, which is really the worst situation. In extreme cases you may still have some residual horseshoe effect. I don't remember having seen it in any... it's well corrected. Well, you may have a
little bit a little rest of it in some of the transformation with some data I remember having seen or tried and ended up with something like that but it's by far not as extreme I think in your paper you had something like that about the transformations you should look up the paper and see in the figures but I think we have something like that but it's by far not so extreme other yes well points near zero always mean that for given in the most general case points that are near the origin are average in terms of the variables so they have no extreme abundance on any species or if they are in the case of PCA they have no extreme values in the environment environmental variables all those environmental variables would have quite well values close to their mean to the average with that so that makes them difficult to interpret because in some particular cases it may be also the compromise between the fact that they are in the other variable that our positive one another in the graph in that case they would end up close to the middle as well so always be careful about the interpretation of points in the middle of the ordination of course that's a good point of course it's always possible that a point is close to the origin on the two first axis but if you draw the third one it happens to be very far away yes you have understood that they usually show the two first axis first because obviously they are representing the most of the variation but the third and possibly even the fourth may be interesting in such case it's customary to keep the first one and as a basis and then going to the first by three and first by fourth axis to always have the main trend as a reference but yes you're right you can do that I showed a double presentation where we he showed first by second and first by third axis yes excellent remark okay now I have couple of minutes left to present the principle coordinate analysis this well up to now we have PCA with Euclidean distance and now of course 
those pre-transformations, which open it up to a few other distances, but not many, actually: only those that have a Euclidean component embedded in them, the chord distance, Hellinger distance, chi-square distance and so on, so the five of them. And then you have the choice of CA, with the chi-square distance. It may be that, for different reasons, you are interested in producing an ordination based on another type of distance. As I said, the community ecologists among you may be familiar with the so-called Bray-Curtis distance, which is often used in community ecology. The Bray-Curtis distance is not respected in any of the flavors of the two ordination methods that I have shown you. Other situations may call for more exotic distances, or dissimilarities in general terms: for instance, when you have a set of variables that are not only not expressed in the same units, but some of them are qualitative, some semi-quantitative (ordinal) and others quantitative. So how do you put them all together? There are some specialized dissimilarity measures that can cope with that kind of data; the Gower distance and the Estabrook-Rogers distance are such examples.

So how do you do that? Can you still produce an ordination? The answer is yes, thanks to principal coordinate analysis. This is what it does: it takes not the raw data but the square, symmetric matrix of dissimilarities, and produces an ordination from that. As a consequence, of course, since your dissimilarity matrix is in most cases among sites (what we call Q mode, sites compared in terms of their variables), you have only dissimilarities among sites in this matrix. The variables have disappeared altogether, so the method produces only an ordination of sites. If you use other dissimilarity measures devoted to variables instead of sites, this is called R mode, and then you would have only the variables represented in the ordination. But in the case where you have a dissimilarity matrix among sites, which is the most frequent situation, you can still project the variables into the ordination plot a posteriori.

So PCoA, principal coordinate analysis, not to be confused with the principal component analysis shown before by Pierre, will end up with the same result as PCA if your dissimilarity matrix is made of Euclidean distances. This shows simply that it works correctly: it takes the representation you give it and it runs an ordination in an appropriate way. In the case where you have one of the more exotic distances or dissimilarities, those have some geometrical properties that are difficult for an ordination method to cope with, because they are not fully representable in a Euclidean space. As a consequence, you get something very curious, and this is negative eigenvalues. I told you before that eigenvalues are the result of a partition of the variance among the axes, but here you have negative eigenvalues, simply because not all of the variance can be represented on real axes. For some dissimilarities this can be corrected mathematically by a further transformation, but this goes a little bit beyond what I want to show you today.

The principle is this one. For the six objects and the three species that I have shown you before, you would obtain something like this: a principal coordinate analysis based on what most people know as Bray-Curtis, but which should actually be known as the percentage difference dissimilarity. So again, you can interpret this like a representation in scaling 1; that is what you get in this case, so you can interpret the ordination plot in terms of distances. For the oribatid mite data it would give you this. You have the sites represented here, and you can represent the variables in two different ways. On the one hand, you can think of it in terms of weighted averages, like in CA. So you say: okay, there are so many individuals of this species in these sites, and if you average the scores of the sites where this species is present, multiplying each score by the abundance of the species, you end up with a point here. In a very simple case, if you had for instance a species that was present only in sites 38 and 67, in equal amounts, it would sit in the middle of a line connecting the two points. And in terms of weighted average, if the species had a higher abundance in 67 than in 38, it would end up between the two sites but closer to 67. So this is the principle.

This is one way. The other, to my mind not entirely satisfactory, is to consider the axes as some kind of summary quantitative assessment of the organization of the sites, and then to correlate the variables with these axes and use this information. Here I have presented it on another data set, to be faithful to that other data set as well: it is the Doubs fish data set that you will use; it is used in most of the chapters of the yellow book, or rather the orange book. So for these 30 sites in the Doubs data set, I have projected the species as arrows. This is a kind of PCA thinking. I have run the PCoA on the Doubs fish data, and I should have added here that it has also been run, I think, with the percentage difference, or Bray-Curtis, dissimilarity. For the species, you simply correlate the abundance of each species with the scores of the sites, and you end up with a correlation that is positive or negative depending on the orientation. Yes, Pierre has told you that during the computation of an ordination the orientation of the axes is arbitrary: there is an arbitrary decision about the plus or minus sign at some point in the computation. So this plot, or its mirror image on either or both axes, would be equally valid; there is no problem with that. But okay, in this case you end up with those arrows, which can be interpreted like in a PCA ordination plot: you know that these sites have a higher abundance of these species of fish, and so on and so forth.

Almost finished. Was I supposed to finish at 12:30?
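As a side note to the PCoA principle described above, here is a minimal sketch in Python with NumPy, not the code used in the course. It computes the percentage difference (Bray-Curtis) dissimilarity, runs a PCoA by eigen-decomposition of the centered matrix, and places species at the abundance-weighted average of the site scores. The function names and the small 6 x 3 abundance table are hypothetical; the table is invented for illustration and is not the example shown in the talk.

```python
import numpy as np

def percentage_difference(Y):
    """Percentage difference (Bray-Curtis) dissimilarity among the rows of Y."""
    Y = np.asarray(Y, dtype=float)
    n = Y.shape[0]
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            total = (Y[i] + Y[j]).sum()
            d = np.abs(Y[i] - Y[j]).sum() / total if total > 0 else 0.0
            D[i, j] = D[j, i] = d
    return D

def pcoa(D):
    """PCoA of a square symmetric dissimilarity matrix D.

    Returns all eigenvalues (sorted, possibly including negative ones)
    and the site scores on the axes with positive eigenvalues.
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    G = centering @ (-0.5 * D ** 2) @ centering      # double-centered matrix
    eigval, eigvec = np.linalg.eigh(G)
    order = np.argsort(eigval)[::-1]                  # largest eigenvalue first
    eigval, eigvec = eigval[order], eigvec[:, order]
    keep = eigval > 1e-9 * eigval.max()               # real axes only
    scores = eigvec[:, keep] * np.sqrt(eigval[keep])  # distance-preserving scaling
    return eigval, scores

def weighted_average_scores(Y, site_scores):
    """Species positions as abundance-weighted averages of the site scores."""
    Y = np.asarray(Y, dtype=float)
    return (Y.T @ site_scores) / Y.sum(axis=0)[:, None]

# Hypothetical abundance table: 6 sites (rows) x 3 species (columns).
Y = np.array([[1, 0, 0],
              [3, 1, 0],
              [5, 2, 1],
              [2, 5, 3],
              [0, 3, 6],
              [0, 1, 8]])

eigval, site_scores = pcoa(percentage_difference(Y))
species_scores = weighted_average_scores(Y, site_scores)

# A dissimilarity that violates the triangle inequality cannot be fully
# represented in Euclidean space, so at least one eigenvalue is negative.
odd = np.array([[0.0, 1.0, 3.0],
                [1.0, 0.0, 1.0],
                [3.0, 1.0, 0.0]])
odd_eigval, _ = pcoa(odd)
print("eigenvalues of the non-Euclidean example:", odd_eigval)
```

With a matrix of Euclidean distances, the distances among the rows of `scores` reproduce the input matrix, which is the sense in which PCoA gives the same result as PCA. The arrow representation mentioned in the talk would instead correlate each species column with each column of `site_scores`.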
Yes. So okay, if you allow me a couple more seconds, just to finish. Principal coordinate analysis can be used as it is, as I showed you here, but it is also used as an intermediate technical step in more complex analyses. This will especially be the case on day 5, when we present distance-based Moran's eigenvector maps in the context of spatial analysis, where we will use principal coordinate analysis.

So, to summarize everything that has been shown to you this morning in terms of ordination, I present you here a figure. What we have seen is, first, the classical approach with raw data and PCA. If we speak of species data, in the case of short gradients, if your gradients are really short and you don't have many of those zeros and double zeros, you could use PCA. But be the gradient short or long, you can use CA, and you end up with an ordination plot of the kind I have shown you. You can of course transform the species data, as I showed you, and this gives tb-PCA, the transformation-based approach; that was the first part of my talk, about the transformations. And finally there is the distance-based approach, where you start with a dissimilarity matrix; this is PCoA, which was my last part. As you will see in the course, the equivalent of these different methods will be available for the constrained ordinations, which combine response data and explanatory data, but this is another story.

Okay, thank you. I guess we have all merited our lunch, so let's proceed to that.

Very short? Pierre is laughing; we were expecting that question. Well, okay, we did not present non-metric multidimensional scaling because actually it is not an eigenvalue-based method and it is not linear. It produces, how can I put this, a rough ordination. Its only use is maybe in special cases where you absolutely need to represent some part of the variability in a restricted number of axes that is imposed in advance, I mean you need two, or you need only three and not more. But this comes at the price of a complete deformation of the structure: it tries to respect the rank order of the distances among the sites in the representation, with more or less success, and this is measured by the so-called stress factor. Pierre, maybe you want to... Yes, we have a slide with a comparison, because in all cases where you would need that, you could probably use a PCoA, principal coordinate analysis. And yes, we have a slide with a comparison of both. Okay, fine. So, bon appétit.