I'm Pierre Legendre from Université de Montréal in Canada, and I'll give this course with my colleague Daniel Borcard, as you know. First I would like to thank our hosts, Cosimo and Vinko, who have organized this fantastic meeting. There will eventually be 64 of you on this side of the class, representing 24 different countries; I have never given the course to such a geographically diversified group of people. Actually, the idea of this course started when we met and discussed it during another conference in Toulouse two years ago; that is the sort of time it takes to organize this sort of event, and I'm very glad that, thanks to the efforts of the two of you, the course came true. Now we are going to start it and see where it takes us. You may have been impressed by the amount of material available on the web page; that is because everything is in double: I have prepared material for the full course, Dr. Borcard also has material for the full course, and we will each teach half of it. We have two different approaches. In my presentations I may become a bit more mathematical at times, for those of you who like to know why we are using this or that equation, and you will have a demonstration of that this morning, when I describe principal component analysis while insisting on the one equation that is at the basis of this sort of analysis and on how we use it to represent our data. Later, Daniel Borcard will come back over other aspects of the same subject and will show you more about the applications, the interpretation, and things like that. Some people here are interested in knowing that there is real mathematics behind these methods, and others are more interested in the applications, so there will be a bit for everyone in our alternating presentations. So I started with this slide: what is numerical ecology?
Cosimo was kind enough to show you the picture of the books; I also have pictures of the same books in this presentation, so I may not have to go over them. This whole story started with the development of the field known as numerical ecology, and in this course we will go over some of its methods and quickly move to the analysis of spatial data, that is, to the fact that organisms and communities are spread in space; the second part of the course will insist on how to analyze these data, which are multivariate. So here I say that numerical ecology is the portion of quantitative ecology devoted to the numerical analysis of ecological data, data that are in essence multivariate because we have so many species, with emphasis on community composition data. I could give the same sort of course to geneticists and just change "community composition" into "genetic composition", and the methods would be exactly the same; but since we are among community ecologists here, I use community composition data. For community ecologists, the data are multivariate by nature: we have many species and many environmental variables, so it is impossible to analyze our data using elementary statistical methods that deal with one variable or, in the case of correlation or simple regression, with two. We have one response data table, the species, which may contain hundreds or thousands of species, and another data table with the environmental variables, which may also be numerous. So community ecologists are the primary users of the methods described in our books, but the methods are not limited to community ecology, and I understand that some of you come from fields with other types of biological data, or even other types of data altogether, like the oceanographers; you may find some utility in the methods that will be described in this course.
Numerical ecology is not a bag of methods; it is a bag of ways of answering ecological questions. If we discuss the mathematics of the methods, it is because we have real ecological questions. I am not a mathematician, and Daniel Borcard is not a mathematician; we are both ecologists, so we will speak from the point of view of ecologists, even though we will describe the methods, hopefully correctly. Methods are chosen to answer ecological questions, or in other fields genetic or nowadays genomic questions, and to test ecological hypotheses about the data. Tomorrow, or the day after, I will discuss statistical testing methods that are applicable to multivariate and essentially non-normal data; we will see that such methods exist, the permutation tests, and I will describe them. Now let me go quickly through the slides of this slide show, which I prepared for a completely different purpose. What do we have here? Nothing; we seem to have lost the slideshow. This is typical of Microsoft software, you know: it blows up unexpectedly. Here is where we were. This is the first edition of our book, and what you see here is the fifth edition: we have had two editions in French and three in English, and I will go through all this. This is my co-author Louis Legendre, who is an oceanographer, so for those of you who are oceanographers, there is a strong input of oceanography in our book; this picture was taken a few years back. Then we had the 1998 edition and the current edition, and here, in 2002 at my brother's place, we were discussing the preparation of the new edition; as you see, we do that very seriously. Then came Daniel Borcard, who is the lead author of our book Numerical Ecology with R, and Daniel worked very hard to publish that book in 2011, before the new edition of the green book, Numerical Ecology. If he had not done that, I would have had to include the 300 pages of that book in the green book, which would then have required a wheelbarrow to carry here. So all the R material is in Numerical Ecology with R, the "orange book" as the students in our lab call it, and in the green book, at the end of each chapter, there is simply one page referring to the R functions that can be called upon to implement the methods described in the chapter; that kept the green book to only a thousand pages. Now, who are the authors of this orange book? Here they are: Daniel Borcard, at the back of the room at the other end, and this man, François Gillet, whom you will meet on Thursday and Friday. He will come here, and I will meet him then too; I have never met him. Nowadays you can write books with people without ever meeting them face to face, so it will be a pleasure for me to meet François Gillet, from the Université de Franche-Comté. And of course you may have found, or we will remind you about, the web page associated with the book, where we distribute free of charge all the R material of the book. The updated-material zip folder contains all the R code of the book, all the new functions that we developed for it, and the data sets used in the book; people don't have to buy the book to have all that, and you will see excerpts of that material in the practical sessions prepared by Daniel Borcard for this course. Oh, and if you read Chinese, you can buy the Chinese edition of Numerical Ecology with R for a much lower price than the Springer edition. That's about what I wanted to show you as a starter, so we won't need the PowerPoint anymore. What else do I want to tell you? The course outline, that's the next thing; you may have looked at it. Today we are going to talk about
ordination. On day one I will start by introducing principal component analysis, and after the coffee break Daniel will do sections two and three, on transformations and on other ordination methods. Tomorrow we will talk first about measures of similarity and distance. One of the important messages about the application of numerical-ecology methods to community data is that everything is based on a measure of distance, and that measure is very important: it acts as a filter on the data. Your data are there, but they can be filtered in different ways: by distances that transform them into presence-absence, by distances that normalize the rows of the data so that they are interpreted as if there were an equal number of individuals at all sites, or by distances that take into account the differences in productivity of the sites. The choice of the distance that does this filtering will determine the outcome, and you may want to analyze your data in these three different ways, because they answer different questions about the data. We will talk about that tomorrow. The idea is to get to canonical analysis, which is a combination of three different groups of methods: it is based on multiple linear regression, it is based on ordination, and all the hypothesis testing in canonical analysis is done by permutation tests. So we need the material of today and tomorrow in order to understand canonical analysis. This course is cumulative: the different sections finally build into canonical analysis, which we will then use on days four and five to analyze the spatial structure of our data. We will finish day three with two different applications of canonical analysis, and on day four we will focus on beta diversity; I could have entitled the course "the analysis of beta diversity", because that is really the objective, and we will see how it can be done. On day five we will go into the more sophisticated methods of spatial analysis that are based on canonical analysis and on the construction of fancy variables to model the spatial structures, called Moran's eigenvector maps. So this is the plan for the week. The complexity will increase, but by proceeding step by step, every day you will have only a small step to go through, because we will have covered the more elementary steps in the first days. (Yes, PowerPoint has failed; we noticed, thank you.) What is required from you? Two things; I hope it was said when the course was first offered to you. It requires on your part an understanding of elementary statistical methods: I will assume that you know what a correlation or an analysis of variance is, and that you have some idea of how an ordinary statistical test works, because we will build upon that to present the material of this course. Also, you come from a variety of different fields, and we may not have examples directly suited to yours, so you will have to do the translation yourself: look at the examples that we present and adapt them to your own field to understand the correspondence. With that, I will go on with the first subject: what is ordination? Ordination is a very important and interesting concept. It is the placing of things, objects, fruits in this example, in some sort of reference space, a diagram in which the objects are spread along axes. Here we have only two variables: the difficulty of getting to what we want to eat in the fruit, and the sort of taste, from very
tasty, that is sweet, and so on, down to what the author labelled "untasty", something like lemon. In this representation, if we have these two variables for each object, we can of course plot a point for each of these fruits in the diagram; that is an elementary thing, you have done it in high school. The interesting thing is that the fruits that end up close to one another must have the same sort of characteristics on these two axes, and if we are working in a multivariate space, a space with many different variables, it is the same: things that are close must have close characteristics on the different axes. So the concept of distance comes into this: the distance, as measured on these axes, between objects that are close to one another must be small, and for things that are very far apart, like the peaches compared to the lemon and grapefruit, if they are far in the diagram it is because the distance between these two groups of fruit is large as measured by the variables we are dealing with. So we have the concept of ordination right here. Contrary to other methods like clustering, which may be very efficient at producing groups of objects but do not tell us anything about the relationships between one group and another, in an ordination we have a general model that applies to all the objects in the study: things that are close must be similar, and things that are far must be dissimilar. In clustering you know that things in the same cluster must be similar, but for things in different clusters you do not know which groups are farther from which. So ordination is very interesting, and this is why it has been used by ecologists for at least half a century now. This is a simple ordination with two axes, but what do we do when we have many axes? I will run
an example for you where we have three axes; it is in one of the documents that I gave you on the web page, a script for a graphical rotation. I will show you the data a bit later. It is a data set that has been used by many authors to illustrate methods, a set from the literature, from van der Aart and co-authors in the Netherlands: hunting spiders from the Netherlands, 28 sites and 12 species. But that is not the point at the moment. I took these data, did some transformations and so on, and I have here a three-dimensional summary of them, in an object that has 28 sites (28, not 27, sorry) and three variables; these are artificial variables. I prepared that last night so that it would go more quickly today. Now I open a window, which looks black at the moment; let me make it a bit bigger, like this, and the next thing is to plot the points. I will plot them in green, you see, col = "green". There is my diagram. You see the points, but what is the meaning of these points? Let's add some axes: I had calculated the ranges of the axes, and I will plot lines corresponding to the three axes; we only have three here. Since I am copying from a PowerPoint, it copies end-of-line characters that I have to remove, otherwise R will not like it. Okay, now I have three axes; where are they? Let's put labels on them (I removed the end-of-lines; whoops, not too many): axis one, axis two, axis three. Here we go: axis one, axis two, and, look at that, there is axis three. Now we can play with this, we can rotate the points, and principal component analysis is essentially a rotation of the points. So what is the best rotation for these points? Would it be something like that, where we see a bunch of points here and a separate group? Or something like this, where we see a bit of a strange structure? Or let's see what else we can have: something like that, where all the points are more or less in a big ellipse? It turns out that if we try all possible orientations of these points, you would probably agree that the best representation is this one, better than anything else: there, for instance, many points fell in one row with one group separate, but here we see the best spread of the points. So we may choose this one to represent our points: we see a group here, a group there, another group, a loose group there, and one point in the middle (site number 25, by the way; I know it). So maybe this is the best possible representation of our points (sorry, I cannot point with two pointers; I am pointing to this one), and it turns out to be the representation along axes one and two of a principal component analysis. But why do we say it is the best representation? Why would it be better than the one where all the points were in one row? Yes: what is the statistical term for saying that the points are more spread out? The variance, right. We want the representation in two dimensions to have the largest amount of variance; with the word variance we now go to statistics, and I'll show you how we do that. Before that, I can show you the correct representation of these points: I have here the command for the biplot of the output object of my principal component analysis, and you will see the same spread of the points in the representation it produces. The points are spread in the same way: you have this group here that is there, this group is there, these four sites are there, these four (well, four, five, or six) are there, and site 24, no, 25, is here, you see. In addition, the principal component analysis has plotted the variables.
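The idea just described, try every rotation and keep the one where the points are most spread out, can be sketched in a few lines. The course material is in R, but here is a language-neutral sketch in plain Python; the point coordinates are invented for illustration, and the "variance along an axis" is simply the variance of the points projected onto that axis.

```python
import math

# A small made-up 2-D point cloud (coordinates invented for illustration).
pts = [(2, 1), (3, 4), (5, 0), (7, 6), (9, 2)]

def variance_along_axis(points, theta):
    """Sample variance of the points projected on an axis at angle theta."""
    proj = [x * math.cos(theta) + y * math.sin(theta) for x, y in points]
    m = sum(proj) / len(proj)
    return sum((p - m) ** 2 for p in proj) / (len(proj) - 1)

# Scan a grid of rotation angles between 0 and pi and keep the orientation
# along which the projected points are the most spread out:
angles = [t / 100 for t in range(314)]
best_theta = max(angles, key=lambda t: variance_along_axis(pts, t))
best_var = variance_along_axis(pts, best_theta)
```

With real data nobody scans angles like this, of course: the eigenanalysis of the covariance matrix, to which we come back below, finds the best axis directly.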
The variables are the species of spiders, and with them we can interpret this graph more completely than the previous one. By the way, these arrows are centered at the mean value of the species: this point is the mean of, let's say, this species of spider; the sites on this side have values lower than the mean, and the sites on that side have values larger than the mean. So we may say that these sites are probably characterized by a larger abundance of these two species, and those sites by a larger abundance of that species (Pardosa lugubris is the name, a terrible-looking spider), and so on. You can interpret the positions of the sites according to the abundances of the species, keeping in mind that if these sites have more of a species, then those sites have a smaller abundance of it: extend the arrow to the other side and you are on the negative side of the abundances, below the mean. So there are fewer of this species on that side and more on this side, and so on. This is how we can handle multivariate data in a graph, and the graph is meaningful if and only if the amount of variance it displays is fairly high compared to the total variance in the original data set. Now I'll spend the next few minutes showing you how this is done and what the mathematics behind the exercise is; if you understand the objective here, you understand the objective of ordination. I am going to use a document that is also on the web page, called "Algebra of principal component analysis". It also contains the algebra of the two other ordination methods that Daniel Borcard is going to describe after the coffee break, correspondence analysis and principal coordinate analysis, so you have the algebra of all these methods later in that document; I will not present those two, Daniel will, in his own way, but you can come back to the document tonight (what are the nights for?) to complete your reading of the abundant material we are showing you. I have a very simple example, so simple that usually we would not do a principal component analysis on it: five objects and only two variables. Two variables can be entirely represented in a graph with two axes, so there is no need for a principal component analysis; still, I will do it, to show you in detail what principal component analysis does to your data. It is a classroom example, not a real-life example. The first step is to center the variables on their respective means: you calculate the mean of the first column and subtract it from each value, obtaining this column of centered values; for the second column you do the same, compute the mean and subtract it from each value, and there you go. If you computed the mean of each column after the centering, it would be zero. We will use these two centered variables for the final representation; the spider species data were centered in the same way, meaning that their representation used the values from this first transformed matrix, which is simply centered. Then we compute the covariance matrix, which gives us the relationship between the two variables. Covariance is computed with the function cov in the R language; cov of Y, or cov of the centered data, does the same thing and produces this small matrix. This is the equation for the covariance, using Y with subscript c for the centered data: the cov function does the centering, takes the scalar products, and divides by n - 1. This matrix compares variables one and two; if we had more variables, the covariance matrix would have the size of the number of variables. Here we have the two variances on the diagonal.
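These two steps, centering and the covariance matrix, can be written out explicitly. Here is a plain-Python sketch (in R, cov(Y) does all of this in one call); the raw numbers are illustrative values chosen so that the variances come out to 8.2 and 5.8, as in the example.

```python
from statistics import mean

# Illustrative 5 x 2 data table: values chosen so the variances come out
# to 8.2 and 5.8, matching the example on the slide.
y1 = [2, 3, 5, 7, 9]
y2 = [1, 4, 0, 6, 2]

# Step 1: centre each variable on its mean (column means become 0).
y1c = [v - mean(y1) for v in y1]
y2c = [v - mean(y2) for v in y2]

# Step 2: covariance matrix S = Yc' Yc / (n - 1), which is what cov() does.
n = len(y1)
def cov(a, b):
    return sum(ai * bi for ai, bi in zip(a, b)) / (n - 1)

S = [[cov(y1c, y1c), cov(y1c, y2c)],
     [cov(y2c, y1c), cov(y2c, y2c)]]
# S is [[8.2, 1.6], [1.6, 5.8]]: the two variances on the diagonal,
# the covariance off the diagonal.
```

Note that cov() gives the same matrix whether or not you centre first, since centering does not change covariances.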
The covariance of a variable with itself is its variance, the same for variable number two; and this is the covariance of variable one versus two, or two versus one, which is of course the same value. So we have the basic information on which the algebra of principal component analysis is going to operate. What we do next is put this covariance matrix S into what I will call, for the moment, the magic equation of eigenvalues and eigenvectors, (S - λI)u = 0. I could explain how it works, but that would take another 45 minutes; either you already know, or you don't care, or you may want to learn after this presentation, but today I don't have time to explain why this equation works as written. It is a bit strange for an equation, because the only known quantity is the covariance matrix S, and then there are several strange things: eigenvalues, represented by the Greek letter lambda; I, an identity matrix, a square matrix the same size as S with ones on the diagonal; a vector u, called an eigenvector, associated with the eigenvalue (the two come out as a pair); and, on the right-hand side, a vector of zeros. So with one thing known we find two kinds of things, the eigenvalues and the eigenvectors, and there are as many eigenvalues as there are variables: because there are two variables here, the equation produces two eigenvalues, each with an associated eigenvector, so two eigenvectors. Yes, this equation works even though there are more unknowns than knowns, and we will use it repeatedly in this course, because we will apply principal component analysis repeatedly, and it is also at work in the next two ordination methods that Daniel Borcard is going to present. If we simply call the function eigen in R, eigen(S), we obtain the two eigenvalues, 9 and 5, and the two eigenvectors. Why these two numbers, and are they interesting? Add them up: 9 and 5 make 14. Now go back to the covariance matrix and add 8.2 plus 5.8: that also makes 14, and it will always be like that; the sum of the eigenvalues is always the sum of the variances. Principal component analysis takes all the variances of all the variables (here there are only two), puts them together, shakes them, and decomposes them again into the same number of values, reorganized differently. The eigenvalues give the spread of the observations on the new axes produced by the analysis, and the total spread in the new dimensions equals the total spread along the original variables; that is the relationship, and it is already a very important point. I said that we add the variances and decompose them again into something with the same sum; but isn't there a condition for being able to add two things? As the saying goes in English, we cannot add apples and oranges. Can we add two variables, one in milligrams per litre and the other in degrees Celsius? Adding them would produce a very strange hybrid; it would be totally meaningless, because the milligrams per litre could just as well be expressed in other units, say kilograms per litre, and the Celsius could be transformed into Fahrenheit or into absolute degrees; the scales would be completely changed, and the sum of the two variables, or of the two variances, would be completely different. So, do we want a method that will produce anything we like just by changing the scales?
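To see where the 9 and the 5 come from, the equation (S - λI)u = 0 can be solved by hand for a 2 × 2 matrix: setting the determinant of (S - λI) to zero gives a quadratic in λ. Here is a plain-Python sketch (in R, eigen(S) does this numerically for a matrix of any size).

```python
import math

# Covariance matrix from the example: variances 8.2 and 5.8, covariance 1.6.
a, b, d = 8.2, 1.6, 5.8          # S = [[a, b], [b, d]]

# For a symmetric 2 x 2 matrix, det(S - lambda I) = 0 becomes a quadratic:
#   lambda^2 - (a + d) * lambda + (a * d - b^2) = 0
tr = a + d                        # trace = sum of the variances
det = a * d - b * b
disc = math.sqrt(tr * tr - 4 * det)
lam1, lam2 = (tr + disc) / 2, (tr - disc) / 2   # the two eigenvalues

# The eigenvalues always add back up to the sum of the variances:
assert abs((lam1 + lam2) - tr) < 1e-9

# Eigenvector for lam1: any u with (a - lam1) * u1 + b * u2 = 0,
# then scaled to a length of one, as eigen() does in R.
u = (b, lam1 - a)
length = math.hypot(*u)
u1 = (u[0] / length, u[1] / length)   # about (0.894, 0.447)
```

The two roots are 9 and 5 (up to rounding), and their sum equals 8.2 + 5.8 = 14, the trace of S, exactly as stated above.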
We don't want that; we want a method that produces something repeatable. So the rule is that we can only add two variances if the variables are in the same physical units. Why? If you have a variable y that is in, let's say, degrees Celsius (my wording on the slide is bad; I should have written y rather than "var", because I now want to compute the variance of y), then the variance of y is in degrees Celsius squared: a variance has units, the square of the units of the original variable, which is why the standard deviation, the square root of the variance, is in the same units as the variable itself. So variances have units, and you can add variances only if they have the same units, that is, only if the original variables are in the same units. That leads to a necessary step: if your variables are not in the same physical units, then right at the beginning of a principal component analysis you must remove the physical units; you must standardize your variables. Daniel Borcard is going to talk about that at the beginning of his presentation on transformations; standardization is the most basic transformation we apply to data. We do it only for variables that are not in the same physical units, like the physical or chemical variables of the water, the soil, the air, and so on. We do not do it for species data, because species data are all counts, and if we standardized them, the rare species, with their very small abundances, would end up after standardization with the same variance as the most abundant species, the ones with big numbers, and we do not want that either. So the rule is: if the variables are in different physical units, you standardize them; if they are all in the same units, like species counts, or frequencies in genetics, you do not. This is one of the most frequent mistakes that beginners make: they forget to standardize when needed, or they standardize everything.
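The rule can be made concrete with a minimal plain-Python sketch; the variable names and values below are hypothetical, just to show the mechanics (in R this is the scale() function, and the transformations will be covered later in the course).

```python
from statistics import mean, stdev

def standardize(column):
    """z-score: centre on the mean, divide by the standard deviation;
    the result is dimensionless, with mean 0 and variance 1."""
    m, s = mean(column), stdev(column)
    return [(v - m) / s for v in column]

# Environmental variables in different physical units (hypothetical values):
temperature_C = [4.1, 9.5, 12.0, 7.3]
phosphate_mg_L = [0.02, 0.11, 0.08, 0.05]
env = [standardize(temperature_C), standardize(phosphate_mg_L)]

# Species abundances all share one unit (counts of individuals), so they are
# left as they are: z-scoring them would give a rare species the same
# variance as the most abundant one.
spe = [[0, 3, 1, 0], [12, 40, 25, 31]]   # hypothetical counts, untouched
```

Only the environmental table loses its units; the species counts keep theirs.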
So this has to be done in a parsimonious way, a surgical way: you standardize only when needed. Okay, where are we now? We have obtained the eigenvalues and the eigenvectors, and when they come out of the function eigen, the eigenvectors are scaled to a length of one, where the length of a vector is the square root of the sum of the squares of its values: square this value plus that value, take the square root, and you obtain one; same thing for the other eigenvector, and it will be the same for all eigenvectors that come out of eigen. That is the standard way of presenting them; we can then rescale them as required by the method, but at least you know that in the beginning they are like that. We now have almost everything we need to obtain the representation that I showed you with the spider data. Our five points, described by two variables, will be represented in this graph called a biplot; "bi" because there are two types of information, the points and the variables. The positions of the points are obtained by a final matrix product: we take the centered data (the matrix that was at the top of the screen, now above the ceiling) and multiply them by the matrix of eigenvectors. This matrix of eigenvectors actually produces a rotation of the points: all its values can be interpreted geometrically as rotation factors; they are rotation cosines, the cosines of the angles of the rotation. By doing that we obtain the matrix that I call F in my book and in this presentation, and F is what we use for plotting. If we had more than two variables, this matrix would have more columns, but in this simple example it has only two. The plotting function here uses two different scales. Look first at the black scale, here and there (sorry, it does not come out completely clearly with this projection), where we plot these values: point number one is at minus three point something on this axis and zero on that one; point number two is at about minus 1.34 and 2.236; point number three is at the same position on the first axis but lower; point number four is at 3.13 and 2.236; and point number five is here. So the points are plotted from the values obtained by this matrix multiplication. Now, for the variables: we want to know where the original variables are after the rotation, and this is given directly by the eigenvectors. If this is eigenvector one and this eigenvector two, then the first row corresponds to the first variable and the second row to the second. We could use the rows for plotting: 0.89 and minus 0.44 would give us a very small arrow here, and the second row another very small arrow for variable two. The plotting function blows these values up so that we can see the arrows; otherwise they would be too small to be seen in the graph. That is fine, because the eigenvectors really define directions, axes along which the arrows can be drawn as long as we want; so they are rescaled and plotted according to the red tick marks that you see there and there, so that we can see them with respect to the spread of the points. It is a delicate thing to spread them just enough but not too much, because if you spread the variables too much, it is the points that shrink into the middle of the graph; it is tricky, but this function does it very well. It is one of the functions available in the stats package of R. And there we go: we have the variables plotted.
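Both claims, that F = Yc U gives the plotting coordinates and that the rotation preserves all between-site Euclidean distances, can be verified numerically. Here is a plain-Python sketch using centered values consistent with the coordinates quoted above (in R, Yc %*% U, or prcomp(), gives the same coordinates, possibly with the signs of whole axes flipped).

```python
import math

# Centered data for the five objects (two variables), and the eigenvector
# matrix U with columns scaled to length one, rounded to four decimals;
# the signs of the columns may differ from what eigen() returns in R.
Yc = [(-3.2, -1.6), (-2.2, 1.4), (-0.2, -2.6), (1.8, 3.4), (3.8, -0.6)]
U = [(0.8944, -0.4472),      # row 1: variable 1 on axes 1 and 2
     (0.4472,  0.8944)]      # row 2: variable 2 on axes 1 and 2

# F = Yc U gives the coordinates of the objects on the principal components.
F = [(a * U[0][0] + b * U[1][0], a * U[0][1] + b * U[1][1]) for a, b in Yc]

# The rotation changes no between-object distance: Euclidean distances
# among the rows of Yc equal the distances among the rows of F.
def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

assert all(abs(dist(Yc[i], Yc[j]) - dist(F[i], F[j])) < 1e-3
           for i in range(5) for j in range(5))
```

The first object lands near (-3.58, 0) and the fourth near (3.13, 2.236), matching the positions read off the biplot.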
The points sit with respect to the variables, so we can see that variable one, with its positive values, pulls point number five far in that direction, and we can check that: yes, on variable one, point number five has the highest value, and the lowest value is for point number one, which lies in the opposite direction. Points number one and two actually have values lower than the mean (their centered values are negative), and that is what we see. In fact, if you extend the line of a variable and project points one, two, and three onto it orthogonally, you recover exactly the ordination of the points along that centered variable. So it is a complex geometric representation; complex because the data are multivariate, bivariate here, but our data may be as multivariate as we want. Several of you will have hundreds or even thousands of species, and we can do exactly the same thing with them. That is the mathematics of principal component analysis. So, in summary, what have we done? We took our points, represented by variables one and two; we centered the variables, so the points are now represented with respect to the dashed axes, the centered variables, with the center here; and with the principal component analysis we obtained this representation, the exact same representation as in the biplot that I showed you before. What is the relationship between the two? It is a rotation: the rotation preserves exactly the positions of the points; with respect to these axes the points are exactly as here, and with respect to those axes the points are exactly as there. After this rotation we have not changed the distances between the points; we have just rotated them, that is all we have done. So we say that principal component
So we say that principal component analysis preserves the distances among the points, as given by the Euclidean distance formula; we will discuss distances tomorrow. Now, this is actually only the first way of representing the results of a principal component analysis. There have been very long discussions in the literature about the different ways of representing the results, but the question has been settled, and for ecological or genetic data what we usually do is either this representation or that one. The first is the representation we saw in the previous slides, where the points sit in the rotated space of the principal components with their Euclidean distances preserved, and of course we can recover the value of a point on a centered variable by an orthogonal projection of the point, using axes one and two for instance; it gives us back exactly the original centered data. In this representation the variables are still at right angles in the multidimensional space. They may not look like it, but in a simple case like this one, with only two variables, they are at right angles in the graph as well; if you had ten variables, they would be at right angles in a space with ten dimensions.

In the second representation we sacrifice that orthogonality in order to represent the relationships among the variables: we want two variables that are strongly correlated to be not at right angles but at an angle that becomes smaller and smaller as the correlation increases; if the correlation reaches one, the two variables come together. This is what we want there, and there will be cases with our ecological data where we want exactly that. But by doing so, the points will not keep their Euclidean distances; we change the distances among the points. For those of you who like fancy statistical statements, we say that the points are now at distances called Mahalanobis distances. We have people from India here: Mahalanobis was the great founder of the school of statistics in India. So Mahalanobis space is a space where the variables are not at right angles but at modified angles.

Mathematically it is done this way. For the left representation, the distance biplot, which preserves the distances and is called scaling 1 in computer software, we use for the objects the matrix F that we saw on the first page, and for the variables the matrix U. To obtain scaling 2, called the correlation biplot, we modify the matrix F by multiplying it by the matrix of eigenvalues. We had this matrix of eigenvalues, capital lambda, with the eigenvalues 9 and 5 on the diagonal and zeros elsewhere. First we raise this matrix to the exponent minus one half: one half is the square root, and the minus means one over, so lambda to the minus one half has one over the square root of 9 and one over the square root of 5 on the diagonal. So for the diagonal values we take the square root and then one over that, because of the minus. This matrix is now used to multiply F in the matrix product, and the result, call it G, gives the positions of the objects in the right-hand graph. For the variables we now use lambda to the exponent plus one half, which is simply the square root of 9, that is 3, and the square root of 5, and this is what multiplies U to obtain the positions of the variables in the right-hand graph.

And people have finally agreed, because these two representations follow a rule which says that with each pair, F and U for scaling 1, or G and U lambda to the one half for scaling 2, we must be able to reconstruct the data: if we multiply F by U transpose, we obtain Y (I should have a subscript c there: the centered Y).
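Those two multiplications can be sketched directly, again on the assumed 5 × 2 example (the raw data are my reconstruction, chosen to match the eigenvalues 9 and 5 quoted in the lecture):

```python
import numpy as np

# Same worked example as before; the raw data matrix is an assumed
# reconstruction consistent with the eigenvalues 9 and 5 from the lecture.
Y = np.array([[2., 1.], [3., 4.], [5., 0.], [7., 6.], [9., 2.]])
Yc = Y - Y.mean(axis=0)                  # centred data
lam, U = np.linalg.eigh(np.cov(Yc, rowvar=False))
lam, U = lam[::-1], U[:, ::-1]           # descending order: 9, then 5
F = Yc @ U                               # scaling-1 object scores

# scaling 2: objects G = F Lambda^(-1/2), variables U Lambda^(1/2)
G = F @ np.diag(lam ** -0.5)             # diag(1/3, 1/sqrt(5)) on the right
V2 = U @ np.diag(lam ** 0.5)             # variable (arrow) coordinates

print(np.round(lam ** -0.5, 3))          # [0.333 0.447]
print(np.round(np.var(G, axis=0, ddof=1), 3))   # [1. 1.]
```

A side effect worth noticing: dividing each column of F by the square root of its eigenvalue gives the columns of G unit variance, and the cross products of the rows of U lambda to the one half reproduce the covariance matrix, which is exactly what lets the arrows carry the correlations.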
And if we multiply G by the transpose of U lambda to the one half, we also obtain the centered Y. So we put together in one representation the two elements that allow us to reconstruct the original centered data entirely. This rule was proposed by a statistician called Ruben Gabriel. These, then, are the two basic representations of principal component analysis.

In the handout, which you can look at later, I have another example where I added a third variable, to show you that this works with three variables as well: we have the representations along axes one and two, and then along axes one and three, with the projections of the variables and all that, for scaling 1 and then for scaling 2. The code that produces all four of these biplots is shown here, and you are welcome to try it later; it is a nice small exercise.

Data transformations I will leave to Daniel Borcard after the coffee break. I would like to finish by showing you two applications of this method: one where we need scaling type 1, the distance biplot, and one where we need scaling type 2. Let's see the examples. The first is a picture from the thesis of Marc Dufrêne, an entomologist from Belgium who worked on the carabid beetles of Belgium. He sampled the carabid species all over Belgium, and these are the sites where the sampling was done. These two axes account for, well, actually, sorry, this graph is not from the species data; it is from the chemistry of the soil. That's right: we have pH, and then calcium, magnesium, potassium, sodium and phosphorus, and all these variables have been log-transformed. He decided to plot the variables as a small inset, separate from the main plot, because there are so many points that we would not have seen the variables clearly.
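Gabriel's reconstruction rule is easy to check numerically. A sketch, still using my assumed 5 × 2 example rather than any of the course data sets:

```python
import numpy as np

# Check of Gabriel's biplot rule on the assumed 5x2 example:
# each scaling pairs an object matrix and a variable matrix whose
# product gives back the centred data exactly.
Y = np.array([[2., 1.], [3., 4.], [5., 0.], [7., 6.], [9., 2.]])
Yc = Y - Y.mean(axis=0)
lam, U = np.linalg.eigh(np.cov(Yc, rowvar=False))
lam, U = lam[::-1], U[:, ::-1]
F = Yc @ U                        # scaling-1 objects
G = F @ np.diag(lam ** -0.5)      # scaling-2 objects
V2 = U @ np.diag(lam ** 0.5)      # scaling-2 variables

print(np.allclose(F @ U.T, Yc))   # scaling 1: F U' = Yc            -> True
print(np.allclose(G @ V2.T, Yc))  # scaling 2: G (U Lam^1/2)' = Yc  -> True
```

Both checks succeed here because no eigenvalue is zero; with more variables than informative axes, the reconstruction is exact only when all axes with nonzero eigenvalues are kept.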
So this is always a possible choice: instead of plotting the variables right among the points, you can plot them in a separate panel. For interpretation he also drew, on top of that, ellipses corresponding to the types of environments where the insect traps had been placed: mineral soils, peat bogs, alluvial plains and other types of environment. But drawing the ellipses is not done by the principal component analysis. What I want to show you is the relationship between this and that: in this graph, for the variables pointing this way, sodium, phosphorus and so on, high values of these variables correspond to the points over here, which means that low values of these same variables are found in the points over there. Do not forget that these are infinite axes that extend on the negative as well as the positive side. So: low values this way, high values that way; in this direction we have higher pH and higher calcium, and in the opposite direction lower pH and lower calcium. He used scaling type 1 in this representation because he wanted to show us something about the points and how they relate to the original variables; he did not try to represent the variables according to their correlations. Here it is only the projection of these six variables into two dimensions that puts them at angles that do not look like right angles; they are right angles in a space with six dimensions.

There are other cases where we do want to represent the variables according to their correlations. This is an example from a paper of mine where we looked at species associations. I will not describe the method of analysis of species associations in detail; I just want to show you the final picture of the paper. Let me make that a bit bigger... yes, that's good. This is one of the data sets with which you are going to work; it is one of the two data
sets used in the orange book, and it was actually collected by Daniel Borcard himself. These are oribatid mites from 70 soil cores taken in the peat mat of a peat bog; we will show you the sampling design during the practicals. The story here is that I represented in this graph a principal component analysis of the species, with scaling type 2, because we were looking for species associations and we want species that are strongly correlated to come together. This picture represents 38 percent of the variance; there are 35 species, so capturing 38 percent of the variance in two dimensions is quite good, a high proportion of the variance. You see that the species with small angles between them here are strongly associated, and they are of course opposed to the species on the other side. When I looked at the species associations in this data set, which is used as an example in this paper, the species represented with squares were members of one association, and the species represented with circles were members of another association. So that is a case where we want to use scaling type 2 with community composition data.

I think I may stop about here. You will see more of this, of course, during the practicals this afternoon. We can go to the coffee break and start again in 15 minutes. Good.
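A small numerical footnote on why scaling 2 is the right choice for associations: in scaling 2 the variable arrows are the rows of U lambda to the one half, and in the full space the cosine of the angle between two arrows equals the correlation between those two variables. A sketch, again on my assumed 5 × 2 example rather than the mite data:

```python
import numpy as np

# Angle between variable arrows in scaling 2 vs. their correlation,
# on the assumed 5x2 example (not the oribatid mite data).
Y = np.array([[2., 1.], [3., 4.], [5., 0.], [7., 6.], [9., 2.]])
Yc = Y - Y.mean(axis=0)
lam, U = np.linalg.eigh(np.cov(Yc, rowvar=False))
lam, U = lam[::-1], U[:, ::-1]
V2 = U @ np.diag(lam ** 0.5)      # rows of V2 = variable arrows, scaling 2

a1, a2 = V2[0], V2[1]             # the two arrows
cos_angle = a1 @ a2 / (np.linalg.norm(a1) * np.linalg.norm(a2))
r = np.corrcoef(Y, rowvar=False)[0, 1]
print(round(cos_angle, 3), round(r, 3))   # 0.232 0.232
```

With only two variables the two-dimensional plot is the full space, so the match is exact; with 35 species the angles in a two-axis biplot are only the projection of this relationship, which is why a plane capturing a good share of the variance matters.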