This is a very strange nebula, created by the explosion of a star producing hydrogen, which is in red, and oxygen, which is in blue. It took me about 18 hours of integration with my camera to capture it, over a couple of nights. I don't remember how many nights I had to devote to this, but that is another story altogether.
Okay, so without wasting too much time, I'll take over from here by focusing on one particular application of RDA: the possibility of running the equivalent of a MANOVA, a multivariate ANOVA. As you may know, the classical, parametric MANOVA is plagued by a lot of constraints. Its conditions of application are quite stringent: the data have to be multivariate normal, and you need homogeneity of variances. Well, that one has to be respected for RDA as well. But you also have stringent conditions on the number of observations versus the number of variables, the number of groups, and so on. This makes it very difficult to apply to ecological data, whereas RDA offers an elegant alternative. RDA also offers permutation tests, which are of course a great improvement, and the triplot representation of results, which a classical MANOVA does not offer at all.
Conceptually, to do this you have to code your factors in a special way. During the first day I showed you the recoding of qualitative variables into dummy variables that were binary, zero-one. Then I told you, and today again you saw, that you actually need one fewer of those dummy variables than there are groups. This is not fully satisfactory for our purposes, for one reason: those dummy variables are not orthogonal to one another, and if you want to use them to model the interaction, it does not behave properly.
Anyway, we now have to look for another way of coding those factors. The variables must, of course, represent the experimental design exactly. Just one word: all this will work if you have a balanced design. A balanced design is the nirvana of ANOVA. I already told you that you get into trouble when you don't have a balanced design, one of the problems lying with that damned fraction [b] that appears when it is not the case and that contaminates the estimation of the residuals and the test of the residuals.
This being said, our new variables are supposed to be orthogonal to one another. The interaction, when you have to code for it, should also be properly coded, and be orthogonal to the main factors as well. And the number of variables needed to code each factor and the interaction must be equal to their respective degrees of freedom. Fortunately, this can be done with what are called Helmert contrasts. I'll first show you a little example of those, and then I'll show you how they are computed in R. I thought it might be useful for you to have the R implementation here, so you can go directly to it when you are at the computer.
My first example has a factor A with three levels and a factor B with two levels, with the minimum replication within each cell, k or r equal to 2 in this case. So a total of 12 observations, just the minimum required to test for the interaction: you know that you need replication within the cells to have enough degrees of freedom to test for interaction. So how do we code this small example with Helmert contrasts? It looks like this. Remember, factor A has three groups, three levels.
So we have two variables. And look at this: the first one provides the contrast between the first group, that is, the first four objects, which belong to the first group, and the remaining objects. Helmert contrasts, with a balanced design, have the property that their columns sum to zero. To get this first contrast you do it this way: you put twos here and minus ones here. Four times 2 gives 8, eight times minus 1 gives minus 8, and the sum is zero. The second of those two variables is zero for the first group: we have already dealt with that contrast and don't need any more information about it. It contrasts the two other groups with ones and minus ones in equal numbers, so again we have a zero sum. This is the way they work. As long as you have a manageable number of groups, it's easy to build them by hand, but of course it's easier to do it with R, since we now have this possibility.
For the other factor it's exactly the same, except that we have only two groups and thus need one variable: there is one degree of freedom for this factor. I have put the objects in the same order, so the codes alternate here between group one and group two of factor B. For only two groups you have ones and minus ones, so that globally it again sums to zero.
And now the interaction. As you may know, the interaction is actually the product of the two factors, so the logic here is to take the product of those variables. You have the first variable of factor A multiplied by the only one of factor B: 2 times 1 gives 2, 2 times minus 1 gives minus 2, and so on down here. If you sum this you again get zero. The other one is of course the second column of A multiplied by B: you have the zeros here, the ones here and the minus ones here, and minus 1 times minus 1 gives 1 here. So remember, the number of degrees of freedom for the
interaction is the number of levels minus one of the first factor times the number of levels minus one of the second, that is, the product of the degrees of freedom of the two factors. So here 2 times 1 gives two variables coding for the interaction. Everything is okay, we have our number of variables. If you now put this into an R object and computed a correlation matrix, cor() of your object, you would obtain zero everywhere between different columns. So these variables are orthogonal, uncorrelated to one another. We have fulfilled our needs.
Now this warning. Pierre already told you about it and I mentioned it before: testing by permutation, which is what we do in RDA, does not alleviate the requirement of homogeneity of within-group dispersions in multivariate ANOVA, including when it is run by RDA. There are tests for univariate situations, like the Bartlett test for instance, which by the way requires normality unless you use a permutational form of it; then you can test for homogeneity of variances without having normality. If you are interested, give me a word and I'll hand you a permutational version of Bartlett's test that I wrote just for this purpose. But this is for univariate cases. For multivariate cases, fortunately, we have a function called betadisper in vegan, based upon code written by Marti Anderson, which computes a test of multivariate homogeneity of within-group dispersions, meaning within-group variances, to be short.
Okay, this is for the principle. I have made up an example, or to be more precise I have extracted an example from the orange book, to show you how it works in practice: our Doubs fish data. The Doubs is a river that, for part of its course, runs along the border between Switzerland and France in the northern part of the Jura mountains. It's a very well-known data set, this Doubs fish data, collected in the 70s, I think. Anyway, it's explained in the book. So here I took 27 of those sites.
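For readers following along without the slide, the small 3 x 2 example described above can be sketched in a few lines. The lecture does this in R; the block below is plain Python and only an illustration, not the speaker's code. The codes (2, -1, -1), (0, 1, -1) and (1, -1) are those quoted in the talk; the variable names and the exact alternation of factor B within each level of A are assumptions.

```python
from itertools import combinations

# Factor A: 3 levels, 4 objects each (2 levels of B x 2 replicates).
a_levels = [1] * 4 + [2] * 4 + [3] * 4
# Factor B: 2 levels, ASSUMED here to alternate 1, 2, 1, 2 within each level of A.
b_levels = [1, 2] * 6

# Helmert-type codes as quoted in the talk.
a1_code = {1: 2, 2: -1, 3: -1}   # group 1 versus groups 2 and 3
a2_code = {1: 0, 2: 1, 3: -1}    # group 2 versus group 3
b_code = {1: 1, 2: -1}           # group 1 versus group 2

a1 = [a1_code[a] for a in a_levels]
a2 = [a2_code[a] for a in a_levels]
b = [b_code[v] for v in b_levels]

# Interaction columns are element-wise products of the main-effect columns.
a1b = [x * y for x, y in zip(a1, b)]
a2b = [x * y for x, y in zip(a2, b)]

columns = {"A1": a1, "A2": a2, "B": b, "A1:B": a1b, "A2:B": a2b}


def dot(u, v):
    """Scalar product of two equally long lists."""
    return sum(x * y for x, y in zip(u, v))


# Every column should sum to zero, and every pair of columns should have
# a zero scalar product, which together imply zero correlations.
column_sums = {name: sum(col) for name, col in columns.items()}
cross_products = {
    (m, n): dot(columns[m], columns[n]) for m, n in combinations(columns, 2)
}

print(column_sums)
print(cross_products)
```

Because all columns have zero sums, a zero scalar product between two columns is the same thing as a zero correlation, which is what the cor() check mentioned in the talk would report.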
I made up an artificial case. Actually, I had to make this up because I had no way of extracting a balanced design with all the 29 sites. Twenty-nine cannot be divided by anything: it's a "nombre premier", in English a prime number. So of course it's not very comfortable. So what did I do, for this purpose only, for the example? I took 27 of them, which can be divided comfortably, and I created a fictitious balanced two-way ANOVA design. I took altitude, which is a continuous variable in the data set, and I simply split it into three zones of equal size, three levels of nine sites each. This went well; I did not have to cheat in any way, I just simplified the information into three groups.
Where I got really dirty was with pH. I checked every single other variable in the data set to try to play the same kind of trick without deforming the data too much, and I did not really succeed. So I took the one with which I could make up a three-level factor with the least possible deformation, and that was pH. I thus also created a pH factor that, roughly this time, mimics the pH values, and put them into three categories. Now, how did I do this in practice?
In R, I first created my altitude factor: three levels, nine sites each, in a row of course, from the top to the bottom of the river, or rather from the highest to the lowest point where the sampling was done. This can be done with a little R function called gl. You give gl the number of levels and the number of replicates, and you can supply labels if you want; otherwise it will give you the labels 1, 2, 3, which is a dangerous thing to do because you could confuse the factor with a continuous variable. And then of course there is the dirty trick I had to play with pH.fac: because pH was fluctuating in a different way along the river, I had to put the limits where I could, as best as possible, while keeping the requirement of having nine times the first level, nine times the second and nine times the third level. So this is completely artificial. Of course you wouldn't do this in any real case; it's really for the example.
But if you have to construct a two-factor design and you have your sites in a given order, it's easy to construct the first factor. If you have a second one, then you have to intersperse its levels within the levels of the first one. If you want to verify that you have not made mistakes, you can always call for a table crossing the two factors, which I did here to verify that there were indeed three replicates within each of the cells. As you can see, it is the case. I have thus verified that my design is balanced: we effectively have three observations with high altitude and the lowest pH level, 1, and the same for the rest.
Now to the real part of the ANOVA, or of the process leading to the ANOVA. I now had to create Helmert contrasts for these two factors. Using the construction I showed you before with my simple example, it would be possible to do it by hand.
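The factor construction and the balance check described above can be mimicked outside R as follows. This is only a sketch in plain Python: the helper `gl` is a hypothetical stand-in written here to imitate what R's gl does, the labels "high", "mid", "low" are an assumption, and the pH levels are invented for illustration (they are not the values used in the talk).

```python
from collections import Counter


def gl(n_levels, n_reps, labels=None):
    """Imitate R's gl(): each level repeated n_reps times, in order."""
    if labels is None:
        labels = list(range(1, n_levels + 1))
    return [labels[i] for i in range(n_levels) for _ in range(n_reps)]


# Altitude: three zones of nine consecutive sites, from upstream to downstream.
# Explicit labels avoid mistaking the factor for a numeric variable.
altitude_fac = gl(3, 9, labels=["high", "mid", "low"])

# HYPOTHETICAL pH levels, interspersed so that each level occurs
# three times within each altitude zone.
ph_fac = [1, 2, 3, 2, 3, 1, 3, 2, 1] * 3

# Cross-tabulate the two factors, as a table() call would in R.
cell_counts = Counter(zip(altitude_fac, ph_fac))
is_balanced = len(cell_counts) == 9 and set(cell_counts.values()) == {3}

for cell in sorted(cell_counts):
    print(cell, cell_counts[cell])
print("balanced:", is_balanced)
```

A design is balanced here when all nine altitude-by-pH cells exist and each holds the same number of sites.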
It's still manageable, as I told you, but you are prone to make mistakes, it's long, it's cumbersome, and it's useless, because we have this function called model.matrix, which has a lot of possibilities. In this case we are using it based on the two factors that I have created, altitude.fac and pH.fac. It begins here with the tilde, and, as you see, I put it in red, I have a multiplication sign here. Why? Because of the interaction. It's short for alt.fac plus pH.fac plus alt.fac multiplied by pH.fac, the interaction. R allows us to go short: when you multiply, it does not only compute the terms for the interaction, but those for the main factors as well. Then I ask for contrasts. Since I have two factors, I have to give a list of the types of contrasts; in each case here it will be Helmert. So alt.fac has to be coded as contr.helmert, and pH.fac also uses Helmert contrasts. Then we close our brackets, and there is still this minus one, for the first column, appearing at the end, because the function model.matrix creates a first column of ones in case you need an intercept in some uses. This is fully useless in our case, so I remove it at the outset; we get rid of it immediately.
At this point I had called the factors alt.fac and pH.fac, which made for very cumbersome column names afterwards. Yesterday I corrected this, but I did not change it here. Here I just display the result using the function head.
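To make the structure of that matrix concrete, here is a sketch of what such a call produces for a 3 x 3 design with three replicates per cell: two columns per main effect and four interaction columns obtained as products. It is plain Python, not the speaker's R code. The function `contr_helmert` is written here to reproduce the layout of R's contr.helmert, the interaction column order follows the usual model.matrix convention (first factor varying fastest), and the pH levels are the same hypothetical ones as before.

```python
def contr_helmert(k):
    """Helmert contrasts for k levels: k rows, k - 1 columns.

    Column j (0-based) holds -1 for the first j + 1 levels,
    j + 1 for the next level, and 0 afterwards.
    """
    return [
        [-1 if i <= j else (j + 1 if i == j + 1 else 0) for j in range(k - 1)]
        for i in range(k)
    ]


# Level indices (0, 1, 2) for the 27 sites.
alt_idx = [level for level in range(3) for _ in range(9)]
ph_idx = [0, 1, 2, 1, 2, 0, 2, 1, 0] * 3  # HYPOTHETICAL pH levels

H = contr_helmert(3)

helm = []
for a, p in zip(alt_idx, ph_idx):
    alt_codes, ph_codes = H[a], H[p]
    # Interaction: every altitude column times every pH column.
    interaction = [x * y for y in ph_codes for x in alt_codes]
    helm.append(alt_codes + ph_codes + interaction)

n_cols = len(helm[0])
col_sums = [sum(row[j] for row in helm) for j in range(n_cols)]
cross = [
    sum(row[i] * row[j] for row in helm)
    for i in range(n_cols)
    for j in range(i + 1, n_cols)
]

# First six rows, comparable to a head() display.
for row in helm[:6]:
    print(row)
```

With a balanced design, all eight column sums and all pairwise scalar products come out as zero, so columns 1-2 (altitude), 3-4 (pH) and 5-8 (interaction) can be used as separate, uncorrelated blocks.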
I display all the variable names and the first six values. Remember, you now have three levels for each of the factors, so we expect two variables for each main effect, which is the case here. The coding now begins with the minus ones, so the twos are completely at the bottom in this case, but this is unimportant. These are altitude 1 and 2; we see the first six rows among the 27, so we don't see all the details. Then come the interaction terms. Here again we have two and two variables, so we have to get the multiplications for every possible pair: this one by this one, this one by that one, and so on. Everything has to be done so as to get the interaction completely coded, with the proper number of degrees of freedom, which is 2 by 2 equals 4. So you have the four columns here. You see, for instance, for the first term of the interaction you have minus 1 by minus 1 giving 1. For the second, third and fourth terms it is the same; here, for instance, minus 1 multiplied by 1 is minus 1. And of course this goes down to row number 27.
This example is fully developed in today's practicals, so you can go further there. I can already tell you, and here again you have the details in the practicals, that the within-group dispersions are homogeneous for this example. I was fortunate; nothing was granted at the outset.
So what do you test first in an ANOVA? In a replicated ANOVA, I should say. After the dispersions, of course: you are now in the ANOVA. When you compute an ANOVA, you get all the results at the same time. What is the single first one you have to look at?
Thanks for your courage, but it's the interaction, because if you have an interaction you cannot interpret the tests for the main factors. This goes back to that famous thing that I promised you about the interpretation of interactions, which we still have not had enough time to show. I have the slides ready for you at any time, but of course time is running short again this morning. Anyway, I don't give up the hope of showing you this before the course ends.
So the first thing you have to test is the interaction. This is tested by permutation. You cannot put any constraint on the permutations when testing the interaction. For different reasons, which I don't have time to explain, you would have to put constraints for the first and for the second main factor, so you would permute within the cells, and of course this is not possible. So you test the interaction with unconstrained permutations. But this is a partial test. Pierre told you about partial RDA; this is exactly the case. You have our species, Hellinger-transformed by the way. And remember that in our matrix of Helmert contrasts, contrasts 1 and 2 are for the first factor, 3 and 4 for the second, and 5 to 8 are the interaction. So my constraining matrix here, the matrix X, is that of the interaction, columns 5 to 8, and I put the other ones as covariables.
That matrix is called W here, and Z in vegan. So the covariables are the Helmert contrasts corresponding to the two factors, the main effects. And I have this probability here of 0.975. Fortunately, the interaction is not significant in this case. We are always happy about that if we aim to interpret the main factors. If the interaction is significant, you have to run a whole lot more separate analyses: you have to analyse factor A for the first level of factor B only, then for the second and for the third separately, and then the reverse for the other factor. So here we are happy when the interaction is not significant. In some other cases you are looking for an interaction, because it would be the signature of some particular process. We'll see this in the examples of the space-time interaction test that we will show you later. But for now we are happy.
So let's test the main factor altitude. Here again, let's see the design of the RDA itself. It's an RDA where you have, of course, our response data here. We have our first factor, so pH in this case. No, no, sorry, it's altitude, not pH.
This is the whole matrix, and columns 1 and 2 are altitude, so we test for altitude here. What we put as covariables, because we want to remove their effect, is the second factor, pH, and the interaction. You don't want to have them mixed up. But here we have one supplementary constraint, and it is a constraint upon the permutations themselves: the permutations are constrained within the levels of factor pH. This means that, to get the appropriate probability, you don't want any possible effect of pH. If pH is significant, that means that for each level of pH you will have different values in the response variables, and you don't want this mixed up with your test of altitude. So what you do is permute your data across altitude: you mix up the data corresponding to the different altitudes, because your H0, your null hypothesis, is that of no effect of altitude. But you do this separately within the three levels of pH, so pH does not get mixed up during the permutations. You mix up altitude within the first level of pH (actually, what we mix up is the corresponding response values, but within those pieces), then within the second, and the same for the third level. In that way you don't contaminate the permutation test with a possible effect of pH. And here, as you see, altitude is highly significant.
I'll go faster now that I have explained the principle: for pH it is the same, but of course in reverse order.
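The idea of permuting within the levels of the other factor can be sketched as follows. This is a plain-Python illustration of restricted permutation only, not of the full partial RDA test, and not the mechanism vegan uses internally; the function name `permute_within_blocks`, the random seed and the pH levels are all assumptions made for the example.

```python
import random


def permute_within_blocks(blocks, rng):
    """Return a row permutation that only exchanges rows sharing a block label."""
    perm = list(range(len(blocks)))
    for label in set(blocks):
        members = [i for i, b in enumerate(blocks) if b == label]
        shuffled = members[:]
        rng.shuffle(shuffled)
        # Row `target` receives the data of row `source`, same block.
        for target, source in zip(members, shuffled):
            perm[target] = source
    return perm


altitude_fac = ["high"] * 9 + ["mid"] * 9 + ["low"] * 9
ph_fac = [1, 2, 3, 2, 3, 1, 3, 2, 1] * 3  # HYPOTHETICAL pH levels

rng = random.Random(42)  # arbitrary seed, for reproducibility only
perm = permute_within_blocks(ph_fac, rng)

# Applying the permutation to the rows of the response data mixes sites
# across altitude zones, but each row keeps its original pH level.
permuted_ph = [ph_fac[i] for i in perm]
permuted_altitude = [altitude_fac[i] for i in perm]

print(permuted_ph == ph_fac)
print(permuted_altitude)
```

For the test of pH, the roles are simply swapped: the block labels passed to the function would be the altitude levels.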
So we have variables 3 and 4, which code for the pH main effect, as the constraining variables, and all the rest, so 1, 2 and 5 to 8, the main factor altitude and the interaction, are in the covariable part of the analysis. And here, on the same principle as before, we have to permute our values so that pH gets mixed up, but within each slice of altitude, because we don't want the permutation test to be contaminated by the levels of that factor. And here nothing is significant, as you can see. So only the factor altitude is significant.
It's very comfortable in this case because it allows me to show you also the other one of those bonuses that I mentioned, the first one being the permutation test and the possibility of running the permutations in restricted ways. There is a huge world within these possibilities. Is somebody pushing against the trigger for the light? I don't understand what's happening here. Well, maybe I triggered the light myself, I don't know.
So we can draw a triplot. I use a trick here. Pierre showed you that you can produce two different outputs in the RDA for the site scores. One is the fitted values, so the exact output of the RDA itself: the linear combinations of explanatory variables that generate a prediction for the particular sites. If you have an ANOVA design like this one for altitude, with three levels, this type of scores will pile up all the scores from altitude 1 at the same point on the ordination, because this is the prediction: this site belongs to altitude number 1. Same for 2, same for 3. The other type of scores is based on the original response data.
Instead of using the fitted values of the regressions, you use the original values and you multiply them by matrix U. This preserves the variability among the sites that is actually not explained by the RDA. This is why we don't recommend, in general terms, using these scores, which are simply called site scores, the first ones that appear in vegan. But in this particular case they are useful, because they show you the dispersion of the sites around the predicted positions. So you have here the low altitude with the corresponding sites, so those downstream, then mid altitude and high altitude. Those are the fitted site scores, the site scores which are linear combinations of environmental variables, for those who have used Canoco. And these ones around them are the site scores based upon the original response data. So you see, it gives you an interesting view, because you have both the location of the centroids, the predicted positions, and the dispersion around these centroids.
And of course, on top of that, since we are dealing with an RDA triplot, you have as well the arrows corresponding to the species. So you obviously have a couple of them that are to be found in the mid ranges of the river, this one especially, others in the lower parts, and these in the upper part of the river. So we have a rich interpretation, something richer than you could have with a traditional multivariate ANOVA, which you could simply not apply to that kind of data anyway.
So okay, I'll finish there. At 2 o'clock, and to be clear I'll start at 2 o'clock sharp, we'll be in this room for a change, because the second part of my theory talk will be about selection of explanatory variables and it will be here. So see you during lunch and later.