Let me take a couple of minutes now to build on what Pierre Legendre told you about regression. The first topic here, polynomial regression, is actually a form of multiple regression: a form that allows us to fit one or more explanatory variables to a response variable in a nonlinear way. So basically we are trying to be nonlinear with linear regression, and this will be extremely helpful in some cases, as you will see.

The principle, in the simple case where you have one response variable y and one explanatory variable x, is that polynomial regression consists of adding higher-degree terms of this explanatory variable: polynomial terms x squared, x cubed, x to the fourth power, and so on, plus an intercept term as in any usual regression. The story could stop here, it is that simple, but there are some points to examine: traps and pitfalls you are not supposed to fall into. We will see them now.

Depending on the degree of the polynomial (second, third, and so on) you obtain what is called a regression of order two, order three, and so on. The idea is to be able to fold the straight line that is the model of a first-degree regression, the simple linear regression with one explanatory variable. Each new degree adds a fold to the model, so it becomes more subtle.

Let's take this example, with artificial data that I built on purpose. Clearly the first-degree regression of the response on x is completely unsuccessful. It barely picks up a slight trend, because the values on one side happen to be a little higher on average than those on the other. And as you see (you now know what the adjusted R-squared is), the adjusted R-squared is zero. One small point about the adjusted R-squared: because of the way it is computed, it can take negative values, but those are simply treated as zero. The exact mathematics behind this are not important; as long as you are below zero, simply consider the value as zero. And of course, if I test the relationship between x and y, it is highly non-significant in this case. You probably know the code in R to run a simple linear regression: lm(), for linear model, with the response depending on x. This is the result when I plot it.

If I add a second-degree polynomial term, x squared, you see that the model begins to capture an appreciable part of the variation in y. The adjusted R-squared has increased to 0.23, the model is significant, and we already have something interesting here. Note how I have coded this in R. I have not simply written x + x^2, because if I did, R would interpret the ^ inside the formula as crossing, the way interaction terms are specified, and the second-degree term would be misinterpreted. So you wrap it in I(), capital i, meaning "take this as it is": it is an x squared and nothing else. If you want to hand-code a polynomial regression in R, don't forget this I() wrapper.

Adding a third-order term brings a new fold into the model, and as you see this further increases the fit, a little bit: the adjusted R-squared is now 0.32, with a very low p-value. So here we have indeed a highly significant model.
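In R, these first fits can be sketched as follows. This uses invented data standing in for the artificial data set on the slides; only the syntax matters here:

```r
# A minimal sketch of the three fits just described, with invented data
# (the lecturer's artificial data set is not reproduced here).
set.seed(42)
x <- runif(100, 0, 10)
y <- sin(x) + 0.05 * x + rnorm(100, sd = 0.4)   # wiggly response

m1 <- lm(y ~ x)                       # first degree: a straight line
m2 <- lm(y ~ x + I(x^2))              # I() protects ^ from formula syntax
m3 <- lm(y ~ x + I(x^2) + I(x^3))     # third degree: one more fold

# Compare the adjusted R-squared of the three models
sapply(list(m1, m2, m3), function(m) summary(m)$adj.r.squared)
```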
As you see, I have added the third polynomial term as before, with I() and x to the third power. Should I continue? I shall, up to the fourth degree of x. And here, obviously, I have still gained something: the adjusted R-squared is now almost 0.7, and the p-value again gives a highly significant model. But should I keep going up? That may become a problem. At some point you have to put some stopping rules on this game, because otherwise your model becomes extremely cumbersome; it quickly gets out of hand because you cannot interpret the various terms. And then there lurks the risk of overfitting that Michele Scardi mentioned yesterday. Overfitting is always a risk, in statistics as well as in machine learning. So let us stop here. It looks like we have captured everything that may be interesting in these data, and perhaps even a little more, but in any case it looks good for now. So much for the principle.

The first problem with this simple way of doing it is that polynomial terms of a variable, for instance x and its square, are highly correlated with one another. Here, for instance, I construct an x variable by asking for 100 values drawn from a uniform distribution between 0 and 10, and I correlate this variable with its squared values: I get 0.96. If you run this same piece of code you will obtain slightly different values, because each run draws different random values, but the general idea stands: those two terms are extremely correlated. You don't like this in regression, because when you have highly correlated explanatory variables, the regression coefficients computed on those variables become unstable. What do we mean by unstable? This is not only true for squared variables; it is a general problem in multiple regression. When you have a bunch of explanatory variables that are highly correlated among themselves, the risk is that from one sample to another drawn from the same statistical population, you can get regression coefficients that are wildly different. That is the instability referred to here. I mention it because we are speaking of polynomial regression, but it is a general risk.
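The correlation just mentioned takes one line to reproduce:

```r
# Raw polynomial terms are highly correlated. Your value will differ a
# little at each run, because the draws are random.
x <- runif(100, min = 0, max = 10)
cor(x, x^2)    # typically around 0.96
```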
Again, this brings us back to Michele Scardi's excellent advice of yesterday, to resort to sound ecological thinking. Before throwing any variable into your models simply because it may be interesting, first think about which ones may be most relevant, and check whether a couple of them do not tell approximately the same story. This may be the case, for instance, in vegetation science, where you have different means of assessing nitrogen in soils, which is a difficult task, or phosphorus: total phosphorus or other forms. Just throwing all the corresponding variables into the equation is not good, because you run into this problem. This is a little broader than the question of polynomial regression, but it is lurking here as well.

You have a couple of possibilities. When you have different variables, like in my phosphorus case, you can of course resort to thinking, or test one and the other, and decide to retain only one. But with polynomial terms we are stuck with the problem: we want to keep both, so we must find a means to avoid this correlation.

A first, easy way of alleviating the problem (without eliminating it completely, except in some limited cases) is simply to center the x variable before squaring it, cubing it, and so on. At least for the second degree, when you center the variable, the second-degree term will be almost uncorrelated with the first-degree term, as long as the variable is reasonably symmetrical in its distribution. When the distribution is perfectly symmetrical, the second-degree term is perfectly uncorrelated with the first. With the usual mild asymmetries you are not completely uncorrelated, but it helps a lot, and you can certainly work with such a case.

The other, more general possibility, which is fortunately offered in a very easy way in R, is to use orthogonal polynomials. The mathematical literature offers several ways to build them; the principle also begins with centering the x variable, but then various manipulations and computations follow, so that you obtain polynomial terms of first, second, third and fourth degree (in this case) that are uncorrelated with one another. Here I have asked for a polynomial of the fourth degree of my variable x, and I have correlated all the resulting terms with one another. What I obtain is a correlation matrix: on the diagonal, of course, the terms correlated with themselves; everything else is zero (values around 10^-17 are machine precision, that is, zero). This is the most elegant way to run a polynomial regression, and in terms of R-squared you lose nothing: it is equivalent to using the non-orthogonal polynomials. It works perfectly. And just to show you that it indeed produces the polynomials you want, I have taken the four resulting terms and simply plotted them against x, my original variable. You see that the first-degree term has been centered: instead of running from zero to ten, it is now centered around zero. Then the second degree looks like this, the third like this, and the fourth like this. So we indeed have all the terms, with all the folds that we need, to adjust our data as well as possible.
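A minimal sketch of this orthogonal-polynomial step, using R's poly():

```r
# Orthogonal polynomial terms with poly(): mutually uncorrelated,
# unlike the raw powers of x.
x  <- runif(100, 0, 10)
px <- poly(x, degree = 4)     # orthogonal terms of degree 1 to 4
round(cor(px), 10)            # ones on the diagonal, zeros elsewhere

# lm(y ~ poly(x, 4)) gives the same R-squared as the raw powers.
# Plotting each term against x shows the centering and the added folds:
matplot(sort(x), px[order(x), ], type = "l", lty = 1,
        xlab = "x", ylab = "orthogonal polynomial terms")
```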
But now, as Pierre suggested earlier, is the aim really to reach the perfect nirvana of an R-squared equal to one? You know, when people come to me, extremely proud: "I ran this and that analysis and I have an R-squared of 0.95, 0.98!" Ooh, this makes me nervous. I am practically certain that there is something wrong somewhere in the process; I have never seen a real situation where something sound had been done and such a value came out. Natural variation is so huge that it is simply not possible. The first place to look is in the direction of too many explanatory variables. That is the reason why, in Pierre's example earlier, with 100 points and 99 random explanatory variables, you obtain an R-squared equal to one.

The simple geometrical reason is that you need 99 dimensions to position 100 points with respect to one another. To position two points you need only one dimension, which gives you the distance between the two points, not with respect to an external reference frame but one as a function of the other. You need one dimension for two points, two dimensions for three points (you can always fit a plane through three points), and so on. Hence the n minus 1 degrees of freedom you always start with in any data set. So if you have 100 points and use 99 degrees of freedom in your model, it just amounts to a transformation of the data in terms of your new variables: you provide as many possibilities to fit your data as there are dimensions to start with. As Pierre put it, you explain everything with nothing. That is the reason, and we have to avoid this. Keep it in mind in general, not only for polynomial regression.
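Pierre's cautionary example is easy to reproduce, sketched here with purely random numbers:

```r
# 100 points, 99 random explanatory variables: the model uses all n - 1
# degrees of freedom and "explains everything with nothing".
set.seed(1)
n   <- 100
dat <- as.data.frame(matrix(rnorm(n * 99), n, 99))  # 99 random predictors
dat$y <- rnorm(n)                                   # response unrelated to them
summary(lm(y ~ ., data = dat))$r.squared            # 1, up to rounding
```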
Now, what is the use of polynomial regression in ecology? A preliminary remark: I present this in the frame of multiple regression, with one single response variable, because, as with Pierre's presentation earlier today, we will need it in the multivariate frame from tomorrow on, in RDA and in CCA.

Basically, we know that species frequently show unimodal responses to an ecological gradient: temperature, calcium content, pH, whatever. The species has an optimum where you find the most individuals per square metre (or grams per square metre, if you measure biomass), and as you get farther from this optimum there are fewer individuals, because the conditions are more difficult. That is the basis of the unimodal response. A second-order polynomial term puts the necessary fold in your model to capture this unimodal feature. Keeping the first-order term along with the second-order term is a way to also capture any linear trend. A linear trend in this context generally means that you have not captured the whole range of your species along the gradient. When you sample communities, you will most likely sample some species over the whole range of their presence (hence the zeros outside this range on both sides) and other species only in the left or right part of their range, so that you capture only part of the unimodal response. If you are just a little at the edge, like in this example, where we have captured most of the response on the left but not quite on the right, the linear term may still be useful to adjust for this slight asymmetry, while the second-degree term captures the parabolic shape of the response curve. That is the basic reason why we are interested in polynomial regression in ecology, and in a multivariate case, like tomorrow or the next day in RDA, we can use the same procedure to improve our modelling.

You can also verify this yourself when working on a single species (since we are in regression today, I stay with the cases where we have a single response variable). It is easy to plot scatter diagrams between your response variable and each explanatory variable in turn. If, for one given explanatory variable, you observe what seems to be a linear response (maybe you are at one end of the gradient and have something not quite linear, but at least reasonably monotonic), then you can dispense with the second order; it would be useless there. In that case I would model that variable with a linear relationship, while for another variable I would put in a second-order term to model the unimodal relationship. It is perfectly possible to combine a linear model for one explanatory variable with a unimodal, second-order model for another.

OK so far? Now, of course, this is feasible when you have the time, or when you care about one particular species and can afford to go in that direction. Another possibility is to resort to some kind of variable selection; I will come back to variable selection at another point in this course. A way of automatically finding which degree best fits each explanatory variable is to provide the model, at the outset, with the second-degree terms. So if I have a response variable and three explanatory variables, I build a regression model with first- and second-degree terms for x1, x2 and x3. That is the complete model. (You could go even further and build interaction terms, where you multiply the different variables; I will not go into that for now and simply consider each explanatory variable separately.) This already gives you a more comprehensive starting point, but as we saw on the previous slide, not every term is always necessary, so there are procedures for selecting the terms that really contribute to the model. If you apply one of those variable selection procedures, you may for instance end up with the elimination of two terms. This is completely fictitious, of course, but it could happen that for one variable you keep only the squared term, because you happened to sample your species over the complete range of that variable, so the unimodal response is symmetrical and you don't need the first degree. For another variable it could be the reverse, something monotonic, and for the third you might keep both, to obtain an optimal fit. At the end you have a thinned-out, reduced model, where you have kept only the terms that are adequate, that are significant (of course there are tests associated with this; we will see all this tomorrow), a model with a much better fit than one built only on first-degree terms. This goes to the point that even in RDA we recommend, especially in the early, exploratory stage of a study, adding at least second-degree polynomial terms to all your environmental variables, to be able to identify those situations where a second degree may be useful. The second degree is easy to interpret: as you saw, it has an obvious role to play when you have a unimodal response.
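As a sketch of that complete starting model, with hypothetical variables x1, x2, x3 and an invented response:

```r
# Complete starting model: first- and second-degree terms for each of three
# explanatory variables, to be thinned out afterwards by variable selection.
set.seed(7)
dat <- data.frame(x1 = runif(50), x2 = runif(50), x3 = runif(50))
dat$y <- -(dat$x1 - 0.5)^2 + dat$x2 + rnorm(50, sd = 0.1)  # unimodal in x1,
                                                           # linear in x2
full <- lm(y ~ x1 + I(x1^2) + x2 + I(x2^2) + x3 + I(x3^2), data = dat)
summary(full)   # the x1 terms and x2 should stand out; x3 contributes nothing
```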
A third degree may still be useful: if your response is slightly asymmetrical, the first degree captures part of it, and a third degree may produce the extra bend. It is an approximation, not fully appropriate of course, and by no means should you extrapolate such a model outside the range where it has been fitted. That is a general rule in regression, and well outside regression too: when you calibrate a model, when you estimate its parameters within a certain range of your variables, the model is valid inside this range, but not outside. Pierre has an extremely funny example of this. You may have noticed in the material for today something about Miss America as an undernourished model. What is that? The idea was precisely to show what happens when you extrapolate a model: the ratio between height and weight of Miss America tended to decrease during the 20th century, and if you fit a linear model and extrapolate to 2016, your Miss America is so undernourished as to have a negative weight, or something like that. That is the danger of extrapolation: you can always interpolate, meaning stay within the range, but don't extrapolate; otherwise it becomes a very risky business.

So, for my example here, the third degree may be marginally useful, but you can hardly go beyond that in ecology. Of course, as I have shown you, you can always add terms and improve the fit a little, but there is no point going beyond a degree you can explain. And if you want to go progressively from first to second to third to fourth degree, it is always possible to test whether the addition of each new term brings a significant gain in R-squared. This is easy to do in R, and in my script for this afternoon there is a procedure showing how (that part is actually in Pierre's material; I took it from him and adapted it a little), with a small example where you add terms until, at some point, the addition stops being significant. But as always, I prefer to resort first to sound ecological thinking, and only if you really have no particular reason to stop at some point, resort to statistical testing to see when you can stop.
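This sequential test is the classical comparison of nested models; a minimal sketch with invented data:

```r
# Does each added degree bring a significant gain in explained variance?
# Compare nested models with an F-test using anova().
set.seed(3)
x <- runif(100, 0, 10)
y <- 5 + 2 * x - 0.3 * x^2 + rnorm(100)   # truly second-degree response

m1 <- lm(y ~ x)
m2 <- lm(y ~ x + I(x^2))
m3 <- lm(y ~ x + I(x^2) + I(x^3))

anova(m1, m2)   # the second-degree term should test as significant here
anova(m2, m3)   # the third degree most likely adds nothing: stop at two
```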
By the way, as always: if you have a question, simply raise your hand, although I am very likely not to see a hand that doesn't move. This is because, physiologically, our sensory system is made to detect changes, not constant values. This is why frogs waiting for a mosquito to pass by do not move at all, not even their eyes: when the mosquito passes, there is a sudden change, and the frog reacts extremely quickly and captures the mosquito or the dragonfly. We are built the same way. Constant signals, like the pressure and friction of our clothes, or the smell in this room, you don't feel anymore, because they are constant. But wave your hand and I will see the movement, because that is a change.

OK. Pierre briefly mentioned trend-surface analysis before. Trend-surface analysis is a particular use of polynomial regression where the basic explanatory variables are spatial coordinates. In the simplest case, along a transect, it is the first example that I showed you, with folds adding up to the fourth degree: you can do this on a one-dimensional transect. But here I am especially speaking of the two-dimensional case, on an area. You have a set of sites on an area and their x, y coordinates, so you have two variables that you can use as a first-order model (not yet a polynomial, but in any case you have your two terms, x and y). Using them is equivalent to fitting a plane across your data. If your sampling area were this room, with the samples distributed, say, randomly across it (they don't need to be systematic), and your response variable happened to have larger values on one side and smaller on the other, then fitting a plane would give something that rises in that direction, as a combination of x and y. This is a simple way, a very crude way, I agree, but a very useful one in some circumstances, as I will mention later.

At the early stages of our thinking about capturing and modelling spatial structures in ecological data, we went along this line: order two, order three. This was actually at the basis of our 1992 paper on variation partitioning in the multivariate case, where we had one explanatory matrix made of environmental variables and the other made of the coordinates and their second- and third-order terms. That was what we had at the time. Things have now drastically changed, and we have other possibilities, as you will see in the last two days of this course, but the possibility remains. Adding the second order adds a fold in both directions, and since the usual polynomial construction also includes the cross-products of x and y, you can model, for instance, a bump in the middle of the room, or somewhere in it, or a saddle shape: up one way in one direction and down the other way in the other. The third order adds one more fold again in both directions. But as you see, it rapidly becomes extremely cumbersome: for a third-degree polynomial you have nine terms, not counting the intercept. Of course you can resort to variable selection, as I explained earlier, and try to keep those terms that are significant, but in any case you quickly have a large number of terms, and even the third degree is still an extremely crude way of modelling very broad-scale spatial structures. We shall not go any further here, because much more powerful methods exist to analyse spatial patterns, and we will come to them.
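A minimal trend-surface sketch in R, with invented coordinates and response; poly() builds the surface terms directly:

```r
# Trend-surface analysis: site coordinates as the explanatory variables.
set.seed(5)
X <- runif(60); Y <- runif(60)
z <- 2 + 3 * X - 2 * Y + rnorm(60, sd = 0.3)   # broad planar trend + noise

ts1 <- lm(z ~ X + Y)                   # order 1: fits a plane
ts3 <- lm(z ~ poly(X, Y, degree = 3))  # order 3: the nine surface terms
summary(ts1)
```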
Yes, Cosimo? (A question about the normality assumption.) Urban myth, urban myth, urban myth! OK, I am extremely happy that you remind me of this, because I thought of it an hour ago but forgot to mention it. The normality requirement is about the residuals of your model. One way to obtain normality of the residuals may be to normalise the response, but there is absolutely no reason to do anything to the explanatory variables. The proof is in the pudding: Pierre mentioned earlier that a multiple regression can be constructed with dummy 0/1 binary variables, which are of course absolutely not normal. Take the extremely simple case where you have one response variable and one dummy variable that simply separates your data set into two groups. Run classically as a simple regression, this is equivalent to the famous t-test for the means of two groups. So what are the residuals? Say I draw the two groups separately: two groups of data with the same response variable, one group with its observed mean here and the other with its observed mean there. The requirement for normality is for the residuals. Why? Because the residuals are the deviations from the respective means. Running this regression amounts to centering the two groups on their respective means, so both means become zero and the two residual distributions are superimposed. The model is the common mean, and the test is whether the difference between the group means is significantly different from zero. But significant or not, the residuals are simply the superimposition of those two centered distributions, and if that is normally distributed, you are in business. My 0/1 variable is absolutely not normal, and it does not have to be. This is an extreme case, but it shows the point. It is a thing we should repeat as a mantra in all courses: variables don't have to be normal, it is the residuals. That holds for ANOVA, that holds for multiple regression, that holds for anything: when you call for normality, the concern actually regards the residuals.

There is another point I can add: this is all for the purpose of a parametric test. If you want to use Student's t distribution to test your difference between means, or your regression model, then this normality is a requirement. If, as we will do in RDA and CCA, you test by permutations, you don't even have to meet this requirement of normality of residuals, because the permutations will generate their own reference distribution. There are some other requirements; I will not go further into this now, but you will see that a permutation test does not mean a free-for-all. When permutation tests were invented, many people thought: "I can do anything I want, since I generate my own reference distribution, who cares?" No. In ANOVA, or in the comparison of two groups, for instance, you still have to verify that the two variances (or the k variances, for ANOVA) are homogeneous. This is the reason why I did not build this example with heterogeneous variances: it could not even be submitted to a permutation test; it wouldn't work. You would run into, how is it called? The Behrens-Fisher problem, yes, thank you: the Behrens-Fisher problem. You would still run into that problem. We are off the subject here, but thank you for asking this question, because it is extremely important that everybody understands it. OK: normality is for the residuals. Paint it on your wall, above your bed.
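A minimal sketch of the dummy-variable argument with invented data; the point is that the regression on the 0/1 variable and the classical t-test are the same test, and only the residuals need to be normal:

```r
# Regression on a 0/1 dummy is a two-group t-test in disguise; what should
# be normal are the residuals (deviations from the two group means),
# certainly not the dummy variable itself.
set.seed(11)
g <- rep(0:1, each = 30)            # dummy variable: absolutely not normal
y <- 10 + 2 * g + rnorm(60)         # normal residuals around two group means

summary(lm(y ~ g))$coefficients         # slope test ...
t.test(y ~ factor(g), var.equal = TRUE) # ... same p-value as this t-test
hist(resid(lm(y ~ g)))                  # these are what should look normal
```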
As I told you before, one use of "polynomial" regression (in quotation marks, because it is only first order) that is still extremely useful in ecology is detrending. We will see in the last days of this course that we have to identify spatial structures, and in most cases the spatial structures that interest us, because they are the most difficult to bring out, are those at intermediate spatial scales, not the ones that cover the whole surface. Those broad ones may be a general trend: a process larger than your sampling area produces a trend within your area. Biogeographical trends, for instance, act at scales beyond your sampling area, but nevertheless produce a trend within it.

Now, most methods for identifying spatial structures, including the ones we developed to identify structures at all scales, work better, or work at all, if a condition called stationarity is satisfied. To put it very simply, stationarity basically means that wherever you are on your surface, if you take pairs of close points and average the differences between their values, that average must be zero, plus or minus, and the variance of those differences must be finite over the area, so that you don't have something wildly different in different parts of the area. By removing the broad-scale trends from the data beforehand, you can get something very close to that and satisfy the conditions of most of the tests for spatial structure that are run, for instance, in geography or geology.

There is also a reason specific to the dbMEMs and MEMs that we have developed and will address later. Yes, those methods are able to capture broad-scale trends, but they waste many variables doing so, because the variables we create are basically sine waves, or something like them, and it is extremely cumbersome, and actually useless, to model something linear with a bunch of sine waves. It works, we tried it, but you use half of the variables you have created just to model a trend, and a linear trend is easily removed from the data beforehand.

So this is another use. It is of course only a first-order polynomial, but I include the notion here because it fits the general pattern. As I told you before, it amounts to this: you fit a first-order model, with z, your response variable, as a function of your x, y coordinates as explanatory variables, nothing else. You test whether it is significant (sometimes we test it; in other cases we simply use it, having decided that we want everything flat), and what you retain are the residuals, which are represented here by the vertical lines above and below the plane fitted to the surface.

All this may already be known to many of you; collectively, I am quite sure that everybody knows a little bit of everything I have shown so far.
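A sketch of this detrending step, with invented coordinates and response:

```r
# Detrending: fit a first-order plane on the x-y coordinates and keep the
# residuals for the finer-scale spatial analyses (e.g. dbMEM).
set.seed(2)
xy <- data.frame(X = runif(80), Y = runif(80))
z  <- 1 + 4 * xy$X + 2 * xy$Y + rnorm(80, sd = 0.5)   # trend + local noise

trend  <- lm(z ~ X + Y, data = xy)
anova(trend)             # test the trend, if that is part of the protocol
z.detr <- resid(trend)   # detrended response: deviations from the plane
```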
Last night I decided to include a couple more slides about another use of polynomial regression, presented by Cajo ter Braak in a book he published with co-authors in 1995, which was one of our reference books in the early days of canonical ordination. It shows how to run a Gaussian regression: a regression fitting a normal curve to species abundance data, the kind of data with a unimodal response. The true model is not, of course, the second-degree polynomial that I showed you, because that would predict negative abundances at both ends of the distribution. The real model, if the distribution is symmetric around the optimum, is a normal curve. And you can fit it easily: first you transform your response data, the abundances, into logs, because fitting a second-degree polynomial to log-transformed data is equivalent to fitting a normal curve to the corresponding raw data, and you can go from one to the other.

Here is how you do it. There are some words in French on the slide, because I borrowed it at the last minute from another course I am giving. The fictitious species, for the French speakers, is called "bidonia", because it is a species that does not exist (bidon is French for "fake"). So you have a fictitious series of sites along a wetness gradient, a soil-water-content gradient, where I have captured just about the whole gradient; these are the true values, the raw abundances. The first step is to transform the data, usually into natural logs of y + 1. This transformation is so often used in ecology that the standard R installation has a shortcut: log1p(y) computes exactly log(y + 1), so you don't have to write it out. Just so you know.

So now you have the logs, and you fit your second-degree polynomial equation to these log-transformed data. We will work with the b0, b1 and b2 coefficients (b0 only comes in at the last stage). With b1 and b2 you can compute the position of the optimum of your species, the value u along the x axis. You can also compute what in ecology we would call the tolerance of the species around this optimum, defined here as one standard deviation on either side (one definition among others, of course; you could multiply it by two if you are very generous). This second important feature of your species distribution is obtained from the b2 coefficient alone. And the third and last feature is c, the abundance at the optimum. It is that easy. I did not take the time last night to write a little R function that would do this automatically for you, but for those of you interested in writing R functions, I suggest it as an exercise. Finally, here I have plotted only the model, not the data; to draw the curve itself, you just insert the parameters we have computed, c, u and t, into the normal probability density function. When I read this for the first time, I thought: how elegant.
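For those who want to try the exercise, here is a minimal sketch with invented data; the formulas for u, t and c are the standard ones that follow from matching the second-degree polynomial on the log scale to a Gaussian curve:

```r
# Gaussian regression via the log-transform trick: fit a second-degree
# polynomial to log1p(abundance), then recover the Gaussian parameters.
set.seed(9)
x <- seq(0, 10, length.out = 40)                            # moisture gradient
y <- rpois(40, lambda = 8 * exp(-(x - 6)^2 / (2 * 1.5^2)))  # unimodal counts

fit <- lm(log1p(y) ~ x + I(x^2))   # log1p(y) is log(y + 1)
b   <- unname(coef(fit))           # b[1] = b0, b[2] = b1, b[3] = b2 (b2 < 0)

u.opt <- -b[2] / (2 * b[3])                      # optimum along the gradient
tol   <- 1 / sqrt(-2 * b[3])                     # tolerance: one std deviation
c.max <- exp(b[1] + b[2] * u.opt + b[3] * u.opt^2) - 1  # abundance at optimum;
                                                        # the -1 undoes log1p's +1
c(optimum = u.opt, tolerance = tol, abund.at.optimum = c.max)
```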
Now, of course, as I told you already, polynomial regression has its limits, so let me briefly summarise what I have told you up to now. Avoid higher-order polynomials if you cannot explain them ecologically: second degree, fine; third degree, to some extent; beyond that it becomes a little strange, because I am not sure you can really give a clear explanation of it. If you prefer an objective criterion, add degrees but test the added explained variance at each step, and stop when nothing significant is added in terms of R-squared. And beware of overfitting, the situation in the example that Pierre mentioned earlier: beware of explaining everything with nothing.