And the first thing that I'll look at will be regression objectives. I'll go quickly through that. These are things that I'm sure you know; it is just to recall them. The reason I talk about multiple regression is that it will be the first element of calculation that we will do tomorrow in canonical analysis: the first thing we'll compute there will be a multiple regression for each of the species in the response matrix.

So, multiple regression. We can use it for description. The aim then is to find the best functional relationship among the variables in the model. In multiple regression we have one response variable and several explanatory variables. In my notation the response variable is called small y and the explanatory variables are called x; they can be, for instance, environmental variables. Tomorrow we will do the same thing but with a whole matrix of y variables like this one; today we stick to a single response variable. So that is the first thing we can do: find the best equation and then use the parameters, the regression coefficients, to describe the effect of the different variables.

We may want to use it for inference, which means generalizing the results of a set of observations to a target population, the reference population from which the data are a sample. If it is a representative sample, then we can generalize our results to the target population and test ecological hypotheses about that population. The hypothesis may simply concern the existence of a relationship: is the R-square, for instance, different from 0, or, for a given coefficient, is the slope different from 0? In other instances we are concerned with the sign of the relationship, and we can also compute confidence intervals instead of doing tests of significance; that is another way of testing, and so on. Sometimes the ecological hypothesis we are entertaining specifies particular values of the parameters, and that's fine, we can test those in this framework as well.

And the last objective is forecasting or prediction, which consists in calculating values of the response variable for points where we don't have an observed value. We may have observed values for 100 points and, at point number 101, no observed value. We can compute a regression model from the 100 points and then apply that model to point number 101 in order to estimate a value. That is also a way of doing imputation in the case of missing values, and so on. Professor Scaldi mentioned this in his talk yesterday. So these are the main objectives. But tomorrow, what we will do is use multiple regression essentially to compute the fitted values.

My next point is to mention that we can compute the regression parameters using matrix algebra. I love algebra, and in discussions with people during the icebreaker yesterday I realized that at least some of you like algebra too. So I'm going to talk a little bit about algebra; you will see it is not as bad as it sounds. This is a typical data set where we have a response variable and a bunch of explanatory variables here. When we compute our regression equation, we want to estimate the b coefficients that, multiplying the x variables, will reproduce y as well as possible within the framework of least-squares estimation. So usually we want to estimate coefficients b1 to bm for the m variables, and also the coefficient b0, which is the intercept.
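[Editor's sketch] Before the matrix algebra, here is what this looks like in practice with lm(), as a minimal sketch; the data frame dat and the variable names x1 and x2 are made up for illustration, and the last two lines show the "point number 101" idea, predicting a value for a point that was not observed.

set.seed(1)
dat <- data.frame(x1 = rnorm(100), x2 = rnorm(100))      # 100 observed points
dat$y <- 2 + 1.5 * dat$x1 - 0.8 * dat$x2 + rnorm(100)    # response with a known structure

fit <- lm(y ~ x1 + x2, data = dat)   # regression model computed from the 100 points
coef(fit)                            # b0 (intercept), b1, b2

new_site <- data.frame(x1 = 0.3, x2 = -1.2)   # "point number 101": x known, y not observed
predict(fit, newdata = new_site)              # forecast / imputed value for that point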
As Professor Scaldi mentioned yesterday, we are often interested in an intercept. We can estimate the intercept by adding a column of 1s to the table, next to the m variables. So it is another variable that we add here, a column of 1s, and that is what allows us to estimate the intercept, except in the rare cases where we want to force the regression to go through the origin; in that case we don't include that column. When you do a multiple regression using the lm function of R you don't have to include this column of 1s yourself; it is included automatically by the lm function.

Now, when we have that, how can we solve this problem? We know y, we know X, but we don't know b, and so we have to isolate b in some way in order to compute it from X and y. If it were simple algebra we would divide y by X, but division does not exist in matrix algebra, for good reasons; it is not some imposition by mathematicians, conceptually it cannot be done. So what will we do instead? We will use mathematics that implements the principle of least squares. Instead of dividing y by X we can multiply y by the inverse of X, just like in ordinary algebra: 5 divided by 2 is the same thing as 5 multiplied by 2 to the exponent minus 1, or 5 multiplied by one half if you like. That's fine, but in order to invert X, that matrix has to be square. Oh, another constraint. So we will fabricate a square matrix, as follows. We will pre-multiply by the transpose of X: we take the X matrix, transpose it, turn it like that, and pre-multiply here and there. So if y = Xb is true, then X'y = X'Xb is still true, because we have pre-multiplied both sides by the same quantity. Now X'X is a square matrix with the dimension of the number of columns: if we have m columns in X, then X'X has size m by m. And this can be inverted because it is a square matrix. So we do that now: we invert it and pre-multiply here and there. You see, this is the same bit as that; we simply add (X'X)^-1 here and there. And like in ordinary algebra, a quantity multiplied by its inverse cancels out and becomes one, or in this case an identity matrix. So it disappears from the equation, and on the left we are left with b isolated: b = (X'X)^-1 X'y. This is how we solve the multiple regression problem, and we can compute all the b coefficients with one line of matrix algebra. Isn't that beautiful? I think this is great. And this is what we will be using tomorrow as the basis for canonical analysis; that's why it's important that I show you that today.

Now of course we could also want to calculate y-hat, that is, the fitted values. They are produced by taking the original X matrix and multiplying it by these b coefficients, which are estimates of the true b. And if we replace b by the expression above, we have y-hat = Xb = X(X'X)^-1 X'y, so we can compute y-hat directly from, again, a single line of matrix algebra. And actually this is the equation we will use tomorrow in canonical analysis, where small y will be replaced by large Y and we will obtain on this side a whole matrix of fitted values with one line of algebra, one line of R code. Okay, so I hope I have convinced you that it is at least possible, and this is something that you can try tonight in your room if you want to see how regression can be computed. It is really that simple.
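[Editor's sketch] If you do want to try it tonight, here is a minimal sketch of the one-line matrix solution, reusing the hypothetical dat and fit objects from the sketch above:

X <- cbind(1, dat$x1, dat$x2)             # column of 1s added for the intercept b0
y <- dat$y

b <- solve(t(X) %*% X) %*% t(X) %*% y     # b = (X'X)^-1 X'y, one line of matrix algebra
cbind(b, coef(fit))                       # same coefficients as lm() gives

y_hat <- X %*% b                          # fitted values: y-hat = X b
all.equal(as.numeric(y_hat), unname(fitted(fit)))   # TRUE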
Now I want to talk about R-square and that sort of thing. This is in my chapter 10; these are a few pages from my chapter 10. Here it is, I'll put it a bit bigger. You have this document among the documents that I put on the web page. A bit bigger again. So the first thing I want to show you is the R-square, and then we will talk about the adjusted R-square. The R-square is the basic statistic that comes out of a regression equation. Actually, I remember a paper showing 12 different ways of calculating the R-square, all leading to the same result.

When you have a regression equation, then from that equation, with lm, you can produce the fitted values y-hat, using for instance the line of algebra that I showed before. So these are the fitted values, and the R-square will be the sum of squares of the y-hat divided by the total sum of squares that you had here: R-square = SS(y-hat) / SS(y). The sum of squares is simple: you centre the values, that is, subtract the mean, square the values, and sum them. That's all. And this is equal to what you would get with the sum of squares of y-hat divided by n-1 at the top and the sum of squares of y divided by n-1 at the bottom, which would be the variance of y-hat divided by the variance of y. So we don't really need these n-1s; they cancel out, and it is simpler to write the equation without them. But if you compute it with R, maybe you want to compute the variance of y-hat at the top and the variance of y at the bottom; it will produce the exact same R-square.

Okay. So there are different ways of calculating it, and essentially, if the y-hats are exactly equal to the y, then your equation is a perfect success and you predict the values perfectly. Otherwise, in all real cases, the variance of the y-hat is smaller than the variance of the y, because in the regression your points are not exactly on the regression line: if you have your regression line, your points will be all around it, and there are residuals left, so that the variance of the original y is larger than the variance of the y-hat. So you obtain a coefficient that is between 0 and 1, 1 meaning total success, 0 meaning total failure.

Now this seems to be a very good coefficient, but, but, but we also know from simple simulations that this is not the case, and here I'm going to show you simulation results from a paper by Pedro Peres-Neto, who made a strong contribution, together with Daniel Borcard, Stéphane Dray and myself; all the simulation work was done by Pedro, who was then a postdoc in my lab. I want to show you one picture of that paper, on page 2617; the paper is available on our web page. This is the picture.
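[Editor's sketch] A minimal check of these equivalent ways of computing the R-square, again using the hypothetical fit and dat objects from the earlier sketches:

y_hat <- fitted(fit)
ss <- function(v) sum((v - mean(v))^2)   # sum of squares: centre, square, sum

ss(y_hat) / ss(dat$y)                    # R-square = SS(y-hat) / SS(y)
var(y_hat) / var(dat$y)                  # same value: the (n-1)'s cancel out
summary(fit)$r.squared                   # what lm() reports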
We will look at the left panel here. For this study, Pedro first generated two variables, in columns with 100,000 lines, and he fiddled with his data so that there was a known correlation between the two: about 0.2 in one case and about 0.6 in the other, 0.201 and 0.608 to be exact. Then he divided his 100,000 data into slices of 100 points, so he had a thousand slices, and for each slice he computed a regression equation and obtained an estimated R-square, estimated by the regression from a sample of the whole data set; he repeated that a thousand times, and in the case of the data with 0.61 as the real R-square he found this value.

Now he started adding columns of random variables, generated in R with the function rnorm, random normal deviates: you ask for 100 rnorm values and you put them here. Then you add two such columns, 3, 4, up to 20 such columns. So 0 on the axis is the real pair of data, and then he adds columns into the X matrix here; he adds essentially no information, they are random numbers. And look at the R-square: it increases. For 100 points, if he had a total of 99 explanatory variables, the real one plus 98 of these phoney random variables, he would reach an R-square of 1, that is, a perfect fit. That's what we should do in ecology: add random variables and get high R-squares, that would be nice to publish in our papers, we would have explained everything with nothing. You see, this is the danger of the R-square: it overestimates the true relationship between y and X, and this is demonstrated with these simulations. Same thing here with the simulations starting with a true value of 0.2, 20%: the R-square increases like that to reach 1 at the end.

Now, back in the year 1930, Ezekiel, not the prophet but the statistician Ezekiel, described something well known as the adjusted R-square. This is where it comes into the picture. The adjusted R-square is produced like this: you take your R-square, the coefficient of determination, and take 1 minus that, the maximum minus that; this is called the coefficient of non-determination in causal modelling. You modify it by multiplying by the ratio of the total degrees of freedom, n - 1, to the residual degrees of freedom, n - 1 - m, where m is the number of explanatory variables. And when you have modified it like that, you turn it back into an R-square by taking 1 minus the result: adjusted R-square = 1 - (1 - R-square)(n - 1)/(n - 1 - m). The adjusted R-square is also shown in Pedro's simulations, as these lines here, and actually this is the mean over a thousand simulations. We see that the adjusted R-square is very stable: it does not respond to the number of non-explanatory variables. That's very nice for us; this is a way of removing the effect of variables that are not pertinent in our analysis. Daniel will describe later ways of eliminating these variables by variable selection, but as a first approximation, especially in canonical analysis, we can simply leave all the variables there and compute the adjusted R-square, and the adjusted R-square will be the basis for variation partitioning that Daniel will describe. Before that, I'm almost done, allow me just a few more minutes.

So that was the story about the adjusted R-square in this slide. Oh yes, and about this simulation: the paper by Pedro was actually about the use of the adjusted R-square as the basis for variation partitioning, which Daniel will explain, and in this paper he showed, after these basic simulations, that the adjusted R-square was the quantity to be used for variation partitioning because it is an unbiased estimate of the explained variation.
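[Editor's sketch] A minimal sketch of the idea behind those simulations (not Pedro's actual code): one genuine explanatory variable, then columns of pure rnorm() noise added one at a time, printing the ordinary and adjusted R-squares at each step, and Ezekiel's formula checked by hand at the end.

set.seed(2)
n <- 100
x_real <- rnorm(n)
y <- x_real + rnorm(n)                     # one genuine relationship

X <- data.frame(x_real)
for (k in 0:20) {
  fit_k <- lm(y ~ ., data = X)             # regression on the real variable + k noise columns
  cat(k, "noise columns: R2 =", round(summary(fit_k)$r.squared, 3),
      " adjR2 =", round(summary(fit_k)$adj.r.squared, 3), "\n")
  X[[paste0("noise", k + 1)]] <- rnorm(n)  # add one more column of random numbers
}

R2 <- summary(fit_k)$r.squared             # Ezekiel's adjustment, by hand, for the last model
m  <- length(coef(fit_k)) - 1              # number of explanatory variables
1 - (1 - R2) * (n - 1) / (n - 1 - m)       # equals summary(fit_k)$adj.r.squared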
That's the story. You also find the AIC, the Akaike information criterion, which comes from information theory; we use it a lot for variable selection, and when you use the ordistep function, I think in the practicals this afternoon, it will be based on AIC — not this form, but this form here at the bottom, which is the corrected form, corrected for small samples. Okay, so the AIC is very useful, and the best predictive model is the one with the lowest value of AIC. So it is not at all like an R-square, where we look for the highest value; here we look for the lowest AIC, and that indicates, when you are comparing different models, the one that has the best predictive power.

Another important thing is the F statistic, but I will come back to the F statistic in multiple regression and we will use it again tomorrow in canonical analysis. The F statistic is constructed from the R-square: in regression, and in canonical analysis, it is simply R-square divided by (1 minus R-square), where the numerator is divided by the number of explanatory variables, m, and the denominator by the residual degrees of freedom, n - m - 1; that is, F = (R-square / m) / ((1 - R-square) / (n - m - 1)). This is the basis for testing the significance of the regression, and it will also be the basis tomorrow for testing the relationship in canonical analysis, in exactly the same way. Okay, I think that does it for this handout.

The last thing I want to mention about multiple regression, and I want you to remember it, you have probably been told this but sometimes we forget, is that in the matrix X we can of course put quantitative variables, but we can also put other things, and sometimes people forget about that; we get the impression that multiple regression is meant only for quantitative variables. No: we can put binary variables, and we can put factors, because if you give factors to the lm function it will recode them as a series of binary variables, as Daniel showed you yesterday; it is done automatically.

Now, when it comes to geographic analysis, we can put in different types of geographic information. We can use the latitude and longitude of the sampling points, using them as X and Y here in the regression equation, but this is not very efficient. A long time ago we tried to develop polynomials of the X and Y geographic coordinates; this is still useful in some cases to model broad-scale spatial structures, but it is not feasible to model fine-scale spatial structures in that way. Sometimes our geographic information comes in the form of regions: we know that these points come from this region and those points from that other region; regions can be represented by a factor, and they represent geographic information. And finally, we will see, here I say in chapter 14, but on Friday we will describe these methods, the spatial eigenfunctions derived from the geographic coordinates of the points, which are much more efficient than the polynomials to model fine-scale geographic information; we will use them as explanatory variables in regression or canonical analysis. So you see how things in this course link to this idea of representing geographic information in the X matrix. So I stop here. Coffee for everyone!
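[Editor's sketch] As a small footnote to the F-statistic part above, a minimal check of that computation, reusing the hypothetical fit and dat objects from the earlier sketches; the value from the formula should match what summary.lm reports:

R2 <- summary(fit)$r.squared
n  <- nrow(dat)
m  <- length(coef(fit)) - 1              # number of explanatory variables

(R2 / m) / ((1 - R2) / (n - m - 1))      # F built from the R-square
summary(fit)$fstatistic                  # same F, with its two degrees of freedom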