I have had a request from one of your colleagues: apparently a couple of you have never used ANOVA and have only a dim idea of what an analysis of variance is. Of course it takes more than five minutes to explain, but this is not the first time I've been asked to do the impossible in five minutes, so I'll take those five minutes to try and explain the basics of ANOVA. At least I hope I can assume that everybody has already run a t-test to compare the means of two groups. One way to think about ANOVA is as a generalization of this to more than two groups. But if you do this naively, say you have three groups and you test for differences among means by running a t-test for groups one and two, one and three, and two and three, you get a problem, because you have run three tests for the price of one, each with a rejection level of, say, alpha = 0.05. It's like throwing a die three times and still thinking that overall you have a probability of one sixth of getting, say, a five. This is simply not true: by repeating the throws, you increase the probability of getting a given value at least once. And repeating pairwise tests when you have more than two means to compare is like repeating this dice game: globally, you strongly increase the risk of a type I error, that is, of obtaining at least once a significant difference where in fact there is none. I won't go into the details, but this is the point. So ANOVA was invented to circumvent this by actually comparing variances, hence the name analysis of variance. But how on earth can you compare variances if your aim is to determine whether means are different? Like this. Say you have three groups. Within each group you have several observations with given values of your variable.
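The course material is in R, but the inflation of the type I error rate with repeated pairwise tests is easy to check with a small simulation. Here is a minimal sketch in plain Python (the function names and the hard-coded critical value are mine, not part of the course material): draw three groups from the same normal population, run the three pairwise t-tests, and count how often at least one of them comes out "significant" at the nominal 5% level.

```python
import math
import random

T_CRIT = 2.101  # two-sided 5% critical t value for df = 18 (two groups of n = 10)

def t_stat(a, b):
    """Pooled-variance two-sample t statistic."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    sp2 = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    return (ma - mb) / math.sqrt(sp2 * (1 / na + 1 / nb))

def familywise_error(n_rep=2000, n=10):
    """Proportion of experiments in which at least one of the three
    pairwise tests is 'significant', although H0 is true everywhere."""
    random.seed(42)
    hits = 0
    for _ in range(n_rep):
        g = [[random.gauss(0, 1) for _ in range(n)] for _ in range(3)]
        if any(abs(t_stat(g[i], g[j])) > T_CRIT
               for i, j in [(0, 1), (0, 2), (1, 2)]):
            hits += 1
    return hits / n_rep

print(familywise_error())  # well above the nominal 0.05
```

The observed rate lands near 1 - 0.95^3 ≈ 0.14 (slightly lower, because the three tests share groups and are positively correlated), which is exactly the inflation described above.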
And you are asking yourself whether the mean of the statistical population from which this group was sampled differs from the means of the others. The reasoning in ANOVA is this: within each group you have variance, but that variance cannot be explained by the fact that the observations come from different groups, because they come from the same one. So this variance is residual; it is noise, the equivalent of the residuals in a regression. The same holds in every group. That within-group variance, as it is called in ANOVA, is the basis of comparison. Now the hypothesis. You have a mean in each group, and those may differ (maybe one differs from the two others, maybe all three differ; it's not important here), or they may not. If the three groups come from the same statistical population, the observed means are slightly different, but not too much: they reflect the same kind of noise that you have within the groups. Through the calculations, which I won't detail now, weighting the means by the appropriate numbers of observations, you get a measure of the among-group variance, that is, the variance among the group means. The ANOVA F statistic is the among-group variance divided by the within-group variance. So if your null hypothesis of equality of means is true, meaning the three groups come from the same statistical population, those two quantities are approximately equal, and the sampling distribution of their ratio is an F distribution, which looks like this, with its mode approximately at the value one, and zero here.
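The within-group and among-group variances just described can be written down in a few lines. This is a plain-Python sketch (illustrative only; the function name is mine): the within-group mean square pools the squared deviations of each observation from its own group mean, while the among-group mean square compares the group means with the grand mean, weighted by group size.

```python
def anova_decomposition(groups):
    """Return (among-group variance, within-group variance) for a list of groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # among-group SS: group means vs the grand mean, weighted by group sizes
    ss_among = sum(len(g) * (m - grand_mean) ** 2
                   for g, m in zip(groups, means))
    # within-group SS: each observation vs its own group mean (the "noise")
    ss_within = sum(sum((x - m) ** 2 for x in g)
                    for g, m in zip(groups, means))
    ms_among = ss_among / (k - 1)           # df among = k - 1
    ms_within = ss_within / (n_total - k)   # df within = n - k
    return ms_among, ms_within

# three small groups with means 2, 3 and 4
print(anova_decomposition([[1, 2, 3], [2, 3, 4], [3, 4, 5]]))  # (3.0, 1.0)
```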
So if those two terms are equivalent, which is what the null hypothesis says, then from one experiment to another this ratio floats around the value one, because the two variances are approximately equal in the sample (in the statistical population they are strictly equal). Now, and of course you can have many more than three groups, if even one group mean differs from the others, the numerator increases, because there is more variance among the group means. But since the within-group variance is estimated within each of those bubbles, even if I shift one group upwards, for instance, the within-group variance stays the same. The consequence of any difference among means becoming important enough to be detected is therefore to increase the numerator, so your F statistic moves to the right. If the difference is large enough, the among-group variance becomes large enough for F to cross the critical value, and you can reject your null hypothesis. Does it answer your question?
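The behaviour of the F ratio just described, floating around one under the null hypothesis and moving to the right when one mean is shifted, can also be checked by simulation. A minimal plain-Python sketch, with names of my own choosing:

```python
import random

def f_stat(groups):
    """One-way ANOVA F: among-group variance / within-group variance."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_among = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    return (ss_among / (k - 1)) / (ss_within / (n - k))

def mean_f(shift, n_rep=1000, n=10):
    """Average F over many experiments; `shift` displaces the mean of group 1."""
    random.seed(0)
    total = 0.0
    for _ in range(n_rep):
        groups = [[random.gauss(0, 1) for _ in range(n)] for _ in range(3)]
        groups[0] = [x + shift for x in groups[0]]
        total += f_stat(groups)
    return total / n_rep

print(mean_f(0.0))  # null hypothesis true: F floats around 1
print(mean_f(1.5))  # one mean shifted: F is pushed well to the right
```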
Seven minutes, sorry, two minutes over my time. Okay, now let's go to the main topic of this second part of today's talk. Let's go back to the general frame of multiple regression, or more specifically redundancy analysis, now that you know it. In some circumstances you may have to reduce the number of explanatory variables that you will put into your RDA. There can be various reasons for that. One of them, going back to Michael Skardy's expression, is not enough sound ecological thinking at the outset, so that you end up with too many candidate explanatory variables. In some circumstances, measuring rather trivial but potentially interesting variables doesn't cost a lot, so it became a bad habit to simply measure as many variables as possible, throw them all into the analysis, and let the statistics tell us which ones are interesting. As long as you are in an exploratory phase this may be correct, but please go back to what is already known, and don't redo the science that has already been done. In any case, it is perfectly legitimate in some circumstances to have too many variables, simply because you are in an exploratory phase of research; you go fishing a little bit. It's a dangerous game to play, but it may be legitimate, as long as it is not simply a substitute for a lack of sound scientific or ecological thinking. In other cases, like the ones we will explore when we get to spatial analysis, especially dbMEM, MEM and so on, we have specific procedures, like those we use to build our spatial variables, that produce many explanatory variables, and there the name of the game will be to sort out, to identify those that
best explain the spatial structures of our data. So here again, instead of using all of them, we'll have to select those that provide the best fit. This is another, and a very frequent, use of variable selection. There are several procedures for selecting explanatory variables. First of all, let's say that no single perfect method exists to reduce the number of variables, to select the best or even an adequate subset of explanatory variables. Some people go as far as to say that no such method should ever be used, but that is extremism and I won't go into it, because we are practitioners, we have to deal with our data in some way, and in some cases the procedure itself requires that we select in a clever way the variables that we really need. So this is what we will do now. In multiple regression, you may know that the three usually available methods are: forward selection, meaning you start with nothing and build your model up with a first variable, then a second, and so on; backward selection, the reverse, where you start with the complete model containing all candidate variables and throw out, one by one, the ones that are least significant or that explain the least, which is more or less equivalent; and then, in multiple regression, there is a trick called stepwise selection, which combines both approaches: it starts with forward selection, and as soon as there are two variables it checks whether the second has made the first useless and, if so, throws it out, going back and forth until everything stabilizes. So this is what you can do in univariate multiple regression. You can also apply two of the three in multivariate regression, meaning RDA, but we will see the details in a moment, because the one that is generally preferred is forward selection, for different reasons. I'll explain the principle first, then its shortcomings, and the reason why, despite those shortcomings, it's still the
forward selection that is now preferred. The principle goes as follows. First, you measure by RDA, in turn, the independent contribution of each of your m candidate explanatory variables. So if you have five, you have five RDAs to compute; of those you need only the first part, meaning the R-square and the associated test, you don't have to project anything at this step. Each time you have only one explanatory variable in your X matrix. Then you look at the results. There are probably already a couple of variables that are not even significant, but among those that are, you select the one that explains most of the variation, so the highest R-square. Adjusted or not is equivalent here: in all cases you have only one variable in the set, so the adjustment is the same for all the R-squares. So, basically, you fish out the one that explains most of the variance and is significant, and you enter it into your model. That's the first step. Then you go back to the remaining variables and, with the first one already in the model, you test one at a time the partial contribution, meaning the added R-square, provided by each of the remaining variables in turn. Say variable 3 was admitted as the first block in your model; then you test 3 plus 1, 3 plus 2, 3 plus 4, and so on, again looking for the second-best variable given that the first is already in the model. You may find several of them significant, but in any case you look at the one with the highest new, added contribution to the R-square, check whether that added contribution (this one alone, not the two variables together) is significant, and if so you add it to the model. You continue, so on and so forth, until nothing significant emerges any more; at that point you stop your forward selection and you have got your model. This is a beautiful, nice and clean story, but of course it has its dark corners, like every
story, or at least every interesting story. First of all, forward selection is too liberal. Nothing to do with politics here; in statistics, being too liberal means tending to deem something significant too easily. Another case of being too liberal is, for instance, the example exposed by Pierre this morning when he spoke about problems in regression: if you apply some tests to non-normal data, the true rejection level of your null hypothesis increases. This is being too liberal, and this you cannot accept in statistics, because you think you have a test at a 5% rejection level, meaning you are ready to be wrong in one out of 20 tests, when in reality you are at 10% or 15% or maybe even more. Take Bartlett's test of homogeneity of variances, which I quickly mentioned this morning: I have run simulations with non-normal, skewed data, and you may get rejection rates up to 80% when you think you are at 5%. So never use Bartlett's test, the parametric one, with non-normal data; it skyrockets your rate of type I error. It is way, way too liberal; in no way can you use such a test, and you don't want to. So forward selection is too liberal; this is the bad news. The semi-good news is that it is the least too liberal of the three: backward selection is much more liberal than forward selection, and stepwise is maybe a mix of the two. In any case, all those procedures tend to admit too many variables into a model. This can be checked by simulations with random variables: more variables enter than expected under the null hypothesis, so the true rejection rate is too high and you admit too many variables. This problem has been addressed, in the case of the forward selection we run in RDA, in the frame of a postdoctoral study by a former postdoc of Pierre Legendre called Guillaume Blanchet, a good
friend of ours who works avidly in the field of numerical ecology. He ran that kind of simulation with RDA, and we came up with what some would call a patch and others a solution, but in any case a way of preventing at least part of the problem. Actually, the problem is double. There is an overall problem of the forward selection procedure finding a significant model, simply finding one, too often, maybe even starting with the first variable. And then, even when it is legitimate to have at least one significant variable, the second problem is that you tend to add too many variables, useless ones, those that explain only noise, into the models. Those two issues were tackled separately in Guillaume Blanchet's work; he set up a couple of guardrails to avoid those problems, or at least keep them under control. The first of those guardrails is the recommendation, before forward selection, to always perform a global test including all explanatory variables; if, and only if, this global test is significant, then you proceed to the forward selection. It may seem strange to do that, given that afterwards we start with nothing and try to add the variables one after the other. But Guillaume observed that even in cases where nothing is significant, say a random response variable and a couple of random candidate explanatory variables in a selection procedure, if you skip the global test (which in almost every such case, given the nominal rejection rate, will say that nothing is significant, so nothing happens) and directly start with a forward selection, you increase the probability of having at least one variable entering the model. This is why you have this first guardrail. Before even trying a forward selection, you take your response matrix, you take your full, complete explanatory matrix, and you run a global RDA test; the first thing that can be done with the name of the object
where you have done the RDA; you'll see this in the practical this afternoon. Only if this global test rejects your null hypothesis, if it is significant, do you proceed to the next step. Now the second part: even if the global test is significant, forward selection remains too liberal, meaning this time that it tends to admit too many variables into the model. For that, the idea that Guillaume tested and developed, and that was eventually published in 2008, was to add a second stopping criterion to the selection. In ordinary selection, like the one I exposed before, the single stopping criterion is alpha, the p-value: you keep adding variables as long as the p-value is below or equal to, say, 0.05, and you stop the procedure when the next candidate variable cannot enter with a p-value at or below that alpha level. Apparently, this is not enough. So what Guillaume proposed, and what we tested and published, is that you go back to the complete RDA, the one you made with all candidate variables, and you look at its adjusted R-square. You remember that, irrespective of the number of variables, the adjusted R-square is supposed to give you a correct, unbiased estimation of the variance explained by your model. Well, if you begin to have almost as many explanatory variables as you have objects, even the adjusted R-square is no longer reliable, which is yet another reason to clean out your variables; but in any case, you take this adjusted R-square, and the second stopping criterion considers it as the maximum value that can be approached. Because, strangely enough, in some cases you may add up variables in such a way that the adjusted R-square built up by your growing model exceeds the one you have with all the variables. This is because the adjustment takes into account the
number of variables that you already have in your model; the adjustment is less stringent when you have fewer variables in your model, which explains in turn why this adjusted R-square can go above the one of the complete model. This we find not so reasonable; others may argue. But in the simulations that Guillaume ran, it showed that on average this second criterion provided regression or RDA models that did not have those problems of liberality, models that on average admit the appropriate number of explanatory variables. There may be cases where you get one or two too many, that happens all the time, but on average it goes a good way towards correcting the situation. Explained that way it may sound a little tricky, but of course all this has been automated in the procedures that we will run in R. Going further into those notions: in RDA, each of those tests, stepwise tests, forward selection tests, is made by random permutations; now you know what that is and how it works in RDA. Another of those dark corners of variable selection procedures is this. Say you started with about 12 or 15 candidate explanatory variables and you find a superb model containing about four of them, with an adjusted R-square very close to the one with all the others included. This is a good deal: you have thrown out variables that are not important in explaining the variance in your data, and you have a model that is easier to explain, easier to handle, and more parsimonious, which, despite what has been said here, is still a good aim when it comes to real explanation of phenomena. If fewer variables can explain as much of the variance, I don't see the use of adding more variables; I really don't. But even in this beautiful case, where you have a small subset of explanatory variables that explains almost as much variance as the whole set, you don't have the
guarantee that this subset is the best possible one, and no procedure will ever guarantee this. This is because we are using a sequential procedure. We first select the variable that explains most of the variance, but as you know, in real cases your candidate explanatory variables are not independent; they are correlated with one another to some extent. Here, that shared fraction [b] that you are acquainted with by now comes into play again. Since you have put this variable first, and not the one that came immediately second, the variance in the response data that remains to be explained is completely conditioned by this first variable. If you had chosen another one, the remaining variance would have been different, and maybe the selection of the second, third, fourth explanatory variable would not have been the same. As soon as you take one variable, it goes and eats a part of the variance, so the structure of the remaining variance is conditioned by what has been explained by this first one, and so on with the second, the third and the fourth. Unfortunately, it's not possible in practical terms to run all possible combinations of variables: if you have ten candidate variables, you'd have to test the R-square of all pairs, then all triplets, all possible groups of four, of five, and so on. It gets completely out of hand; unless you hire an entire population of undergraduate students, you cannot do that, so you resort to this kind of sequential procedure. Of course, variable selection as I am presenting it to you is one possible approach within a large family that can be called model selection: if you increase your scope and go beyond regression, you may have other approaches to better model this or that problem. That goes beyond this course; what interests us here is what you can do with RDA, and this is forward selection, again forward against the two others, backward or
stepwise, because although too liberal, it is not as liberal as the two others, and especially now because we have this second stopping criterion based on the adjusted R-square of the global model, tested first. This makes it better suited to our purposes and avoids having really too many variables entering a model. As in all regression models, the presence of strongly intercorrelated explanatory variables renders the regression or canonical coefficients unstable, in the sense I presented to you yesterday or the day before, and even forward selection does not eliminate this problem. If you have two strongly correlated variables, several things can happen. Say the two best candidates are strongly correlated with one another when you start. Then one of them, by sheer random fluctuation, will explain a little more of the R-square than the other, and that one will enter; in another data set sampled from the same statistical population, it may very well happen that the other one would have entered first. So resorting to forward selection to decide which of the two is better is not appropriate; it won't give you the correct scientific answer, on average you are just playing dice, it can flip-flop between one and the other. After that, even though they are strongly correlated, it may also happen that both enter the model, because despite their correlation they are still independent enough, their unique fractions [a] and [c] large enough, to each explain a significant part of the variation of the response data. But then you will still be stuck with a model with high multicollinearity because of those two strongly correlated variables. So don't count on forward selection to avoid this problem. It can help, but be aware that the choice between two strongly correlated variables has no a priori ecological validity; it has been made on statistical
basis only, and statistics don't tell you anything about ecology if you haven't done the correct thinking beforehand. This is always very important: statistics is not an intellectual process in itself, it does not substitute for your thinking. In cases where you really have several strongly correlated explanatory variables, the correlation structure may hide something you don't even see clearly, because on some occasions it's not simply a pair of variables that are strongly correlated: one variable can be explained collectively by a subset of maybe 4 or 5 other variables that are not necessarily strongly correlated among one another, but that together are very strongly correlated with that one variable. This also increases the overall multicollinearity of your explanatory matrix, and it can be checked with what are called variance inflation factors. It's possible to compute those for an RDA as well; I don't think I have put the code in the practicals, but it's in the yellow book for sure. The variance inflation factors measure how much the variance of the regression or canonical coefficients is increased by the presence of correlation, so they are a means of measuring the instability of a regression model, meaning that for different data sampled from the same statistical population, if you have this problem, the canonical coefficients could be completely different, or very different, from what they were in the first sample set. This was already present in Canoco, since a couple of you have used Canoco, the program that Cajo ter Braak wrote long ago to run canonical correspondence analysis; it already offered the possibility of computing the VIFs, the variance inflation factors, and as a rule of thumb ter Braak recommended that variables with a VIF larger than 20 should be removed from the model. So you could check this even before resorting to forward selection; it's another possibility. But whatever you do,
never remove or add more than one variable at a time. Remember that every time you recompute a regression or an RDA, the complete set of regression or canonical coefficients is recomputed, because each coefficient takes into account the presence of all the other variables. So never try to remove or add two or three variables at a time; it's one variable at a time. With the VIFs it's the same: you compute the VIFs of your RDA, you see that two or three variables are well above 20, you don't remove all three or four at once, you remove the worst one, you recompute your RDA, and you check again. Sometimes you are surprised: removing only one settles the issue. It may happen; it has happened to me a couple of times. Now, I won't take much more of your time today; maybe I'll take a couple of minutes afterwards to go through those slides on the interpretation of an interaction in ANOVA, as I promised. But just to finish with this quick presentation of selection procedures, here is the choice we have with our usual packages, vegan and adespatial. adespatial is a brand new package, actually an offspring of ade4, that now contains the part devoted to the analysis of spatial structures. It is still under construction, so we expect it to change a couple of times over the next months, because Stéphane Dray, who is responsible for it, is still in the process of integrating many functions into adespatial; for instance, the asymmetric eigenvector maps that we will present to you tomorrow or on Friday, I don't remember, are for now in an independent package, but they will be integrated into adespatial. So what do we have? We have this first function, forward.sel, written by Stéphane Dray on purpose for forward selection in RDA, and especially for using the second stopping criterion based on the adjusted R-square. For a long time this was the only function doing it, and this is the reason why we have used it very much in these last
years. It could not accept factors coded as factors in RDA, so you had to recode the factors as Helmert contrasts to use them here, and that may cause problems with selection in some cases. Anyway, forward selection can be done with forward.sel, and to use the second stopping criterion you have to compute your global RDA first, to be able to give forward.sel the adjusted R-square of the complete model as the second stopping criterion; it can be done, the code is in the book and in your practicals. In vegan, for a long time we had only ordistep, which is based on a selection procedure using the Akaike information criterion, AIC. This is actually borrowed from base R, where it exists for multiple regression, and Jari Oksanen adapted it to RDA. It offers forward and backward selection, the only one of the three that offers backward selection, if you are interested in starting with the full global model and thinning it down until you get something tighter, though not necessarily as parsimonious as with forward selection. It accepts factors, but it cannot use the adjusted R-square, because it is based on AIC and not on the ordinary stopping criterion of the other two, including alpha, the probability value. And so, after many discussions and a couple of years, Jari Oksanen finally wrote a version of this called ordiR2step, which this time works with the adjusted R-square: it implements the second stopping criterion. So now, with ordiR2step in vegan, you can run forward selection, forward only, with Guillaume Blanchet's second stopping criterion based on the adjusted R-square, and it accepts factors as they are usually coded in R. These are your tools for this analysis. Before I go to my couple of slides devoted to interaction, do you have some questions about what I have presented to you? OK, we are after lunch, it's warm, people are sleeping, fine. So I'll go to that example. Of course it was
made for my undergraduate courses, so it was in French; I had time to translate most of it into English yesterday night. A couple of words are still in French, but I have put the translation once and for all in the corner of the slide. The situation is borrowed from an example by Sokal and Rohlf in their basic statistics book called Biometry. I don't remember the exact numbers, but this is not important; we are at the level of the principle here, there won't be any numbers in this presentation, just 5 or 6 slides. So this is the layout. You have rats, a couple of rats, and they are fed lard. Lard is pork fat, you know, the one you use to make pastry and that kind of thing. In this experiment there is fresh lard, the one you would use, and rancid lard, the one that has been on the counter at room temperature for a couple of days and stinks. So this is one of our two factors, a factor with two levels, fresh and rancid, and for the rats we have the factor sex. The response variable is the number of grams of lard eaten by the rats, and as you see here we have within-cell replication, allowing us to test the interaction. I have put three replicates; I don't remember how many there were in the original example, but this is not important. Let's say it's balanced, which of course is important: 3 rats per cell. Now my question: say you run the ANOVA and the interaction is significant. What is the biological interpretation? Someone please give it to me; how would you express in one sentence the biological meaning of a significant interaction in this case? The preference for, or the amount of, fresh versus rancid lard depends on the sex. Okay, this is interaction. It could be stated the other way round, but that is more difficult to figure out. Anyway, I built on this idea, and in the next slides I try to cover just about every possible case, qualitatively; quantitatively, it's an infinity of possibilities. I've
tried to cover every possible situation. The first pair of slides presents situations where the interaction is not significant, and all the graphs work the same way. On the abscissa you have the freshness of the lard, frais (fresh) versus rancid; it takes no Nobel Prize to understand the French in such a limited case, but I have still put the translation here. On the ordinate you have the amount of lard consumed, in grams. The second factor is the sex: the males in blue and the females in pink, how original. Now let's go. First, the zero, basic, null situation: nothing is significant, because if you average over the two sexes, the consumption of fresh lard is here and of rancid lard is here as well; and if you average out, males consume this much and females consume this much, the same. Everything is equal for both sexes, so neither of the two factors is significant, and of course the interaction is not either, because here you have exactly the same shape of relationship, or in this case absence of relationship: males as well as females eat the same quantity of fat and are completely indifferent to whether it is fresh or rancid. This is the basic situation. In the other graph, at the right, you have the two main factors significant but still no interaction. Let's see why. For the freshness: the fresh lard, on average over both sexes, is consumed at this level and the rancid lard at that level, so there is a difference between fresh and rancid with the two sexes considered together, and freshness is significant. For sex: the males consume this much and the females that much, also a difference, so it is significant. But the way males prefer fresh lard over rancid is the same as the way females prefer fresh lard over rancid: the slope is the same, so there is no interaction. Two other cases, still no interaction, but this time with only one of the two factors
In the first of these it is freshness that is significant: with the two sexes considered together, the fresh lard is consumed in this quantity and the rancid one in this quantity, so you have quite a large difference between the fresh and the rancid lard consumed; there is a significant effect of freshness. There is no significant effect of sex, because males and females behave exactly the same way: averaged over fresh and rancid, the females eat this much lard and the males the same amount, so sex is not significant. And since, here again, the slope is the same, the interaction is not significant. Finally, for the non-interaction cases, here it is the reverse: males as well as females are completely indifferent to whether the fat is fresh or rancid, they eat the same amount, but the males eat more than the females. So sex is significant, but the effect of freshness is not.

That was the easy part. Now we go through the cases with a significant interaction; I have five of them, to cover all the possibilities for what is significant on top of the interaction. So from now on you always have interaction. The first one is the most exotic case, and beware, it can happen: not often, but sometimes you may stumble upon an ANOVA where the interaction is significant but neither of the main factors is. That is the case here, because the effect is exactly opposite between the males and the females. If you look at the factor freshness, you would think it should be significant, because obviously males as well as females react to freshness; except that they react in opposite directions, so globally it cancels out. On average the males eat this much and the females this much, so there is no significant effect of sex; and it is the same with freshness, because on average fresh lard is consumed in this amount and rancid lard in this amount
as well, so everything cancels out. It is a stark example: you could imagine that the female rats are delicate creatures that appreciate fresh lard very much but don't appreciate rancid lard at all, so they eat almost none of it, while the males are dirty creatures that turn their noses up at fresh lard and prefer good, stinky, old rancid lard. So the effects are really opposite, they cancel out, and all you are left with is a significant interaction. If this happens to you, and it may, don't throw your analysis away and try to forget you did it: you may have learnt something very interesting about your data. Interesting things in data are always learnt from unexpected results, not when you merely confirm what you already know.

Now the second situation: everything is significant. You see, I have to go back here. Freshness first: fresh lard on average here, I'll go a bit faster, rancid lard on average up here, so there is a difference and freshness is significant. Sex: females here, males here, so sex is significant too. In this figure the interpretation would be that the females don't eat much lard and are indifferent to whether it is fresh or not, while the males are still the dirty ones of the first example who prefer rancid lard.

The next example still has everything significant, but it is a milder case with simply different slopes. Observe that in all these cases the slopes differ between males and females: this is the key feature for identifying a significant interaction, and afterwards you have to explain it in the terms I am presenting to you now, but differing slopes are the common feature of everything we see from now on. So here the slopes are still different, less so than before, but different. On average, fresh lard is consumed this much and rancid this much, so freshness is significant; and the females are here and the males here, so sex is significant as well. And the interaction is still significant, since the females have a good taste for fresh lard but don't like rancid lard at all.
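The "slopes differ" criterion can be checked numerically: the interaction contrast is the difference between the male and the female fresh-minus-rancid slopes, and zero means parallel lines, hence no interaction. Cell means below are invented; the "crossed" table reproduces the exotic case where opposite preferences of equal size cancel out in both sets of marginal means.

```python
def interaction_contrast(cells):
    """Male fresh-minus-rancid slope minus the female one.
    Zero means the two lines in the interaction plot are parallel."""
    male = cells[("fresh", "male")] - cells[("rancid", "male")]
    female = cells[("fresh", "female")] - cells[("rancid", "female")]
    return male - female

# Parallel lines: both sexes prefer fresh lard by the same 3 g margin.
parallel = {("fresh", "male"): 9, ("rancid", "male"): 6,
            ("fresh", "female"): 7, ("rancid", "female"): 4}
# Crossed lines (the exotic case): opposite preferences of equal size,
# so the freshness and sex marginal means are all equal and only the
# interaction remains.
crossed = {("fresh", "male"): 4, ("rancid", "male"): 8,
           ("fresh", "female"): 8, ("rancid", "female"): 4}

print(interaction_contrast(parallel))  # 0
print(interaction_contrast(crossed))   # -8
```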
The males, in turn, are not completely indifferent but still prefer the fresh lard, so the slopes again differ and the interaction remains significant.

For the last cases I had to think quite a bit to construct situations, and draw them, that give the results I wanted: only one or the other main factor significant. Here, look at freshness: on average, fresh lard is here and rancid lard is here as well, so freshness is not significant; while if you look at the sexes, the females are down here and the males up here, so sex is significant. This is a mild case: the females still prefer fresh lard over rancid, while the males have the opposite preference; a milder version of the extreme case at the beginning where the two lines crossed. So here sex is significant, but freshness is not, because on average over the sexes freshness makes no difference.

And in the last one it is the opposite. If you look at freshness, for fresh lard you are up there and for rancid lard you are down here, so there is a difference and freshness is significant; but for sex, the females on average are here and the males here, so sex is not significant. There is still an interaction, because the slopes are quite different, which matters here: the females prefer fresh lard and the males prefer rancid lard, even though on average rancid lard is consumed in smaller quantities than fresh lard.

So every one of these situations is different. You have now seen four situations with a non-significant interaction but different patterns for the main factors, and five situations with a significant interaction and various significances of the main factors. But in every case, don't forget the interaction: you cannot interpret the main effects as a block. For instance,
here, if you did that, you would say: well, freshness is non-significant, so my conclusion is that rats are indifferent to freshness. No, they are not: both sexes react to freshness, but in different ways. This is why I started this morning by asking what is the first and most important result to look for in an ANOVA output: it is the interaction test, its result and its consequence.

Now the consequence is easy to figure out when you can draw these kinds of interaction plots, and up to two, three or four levels per factor you can draw them to see what is going on; you are encouraged to do so in univariate ANOVA. The consequence is that if you have such a thing as a significant interaction, you cannot interpret freshness globally: you have to test the effect of freshness separately for each sex; and you do not interpret the effect of sex globally: you have to interpret the effect of sex separately for fresh and for rancid lard, in this example. So you have to split your ANOVA as soon as you have a significant interaction. In the case this morning I was relieved that the interaction was not significant, because otherwise I would have had to test the effect of altitude for every level of pH, and the effect of pH for every single level of altitude, to get a proper picture accounting for the interaction.

Now your question. Yes, here, like this, it was not significant? No, what is not significant is the interaction, meaning both sexes react the same way to freshness, and freshness acts the same way on both sexes. You don't see it as relative amounts: what you are describing is actually the effect of sex, not an interaction; it is the main effect of sex, and it is assessed in absolute terms, not in relative terms. That is maybe the only point; you never do that, unless for some reason, in completely different designs, you
may have variables that are expressed in percentages and so on. But even then you may run into problems because of distributions and the like; this could quickly become technical. Of course here we are speaking of real, absolute variables, in grams or something of that kind; your response variable cannot be shrunk to percentages here. I understand what you mean, but this is simply not the way ANOVA works. Maybe for some reason, in terms, say, of intensity of metabolism, you might devise a completely different way of building such an experiment, for very valid reasons, and then somehow consider the energy brought by the consumption relative to the total for the males and the females. But I would be afraid that in that case you would bias the experiment at the outset, because in your design you would already be absorbing part of the main effect of sex, and this would certainly give a totally different picture; and then again, I would not be very comfortable with it. If afterwards you want to go back to these results and say: yes, but this is the raw result, in grams of lard consumed, or in calories, or whatever you want, and to be fair we should express it relative to the males' and females' weight, then you go beyond this analysis, and you can still do it; but the analysis itself tells you that in terms of amounts of lard the reaction is the same. So we could certainly go further with this discussion, there is something to be done there; but of course this example is a simple way of showing how to interpret the outcome of the analysis, and the raw interpretation you can make of it. Beyond that it goes into the particulars of every single experiment, and I cannot go into this now. Well, I am already later than I expected to be, thanks to those rats, and I urge you now, as quickly as possible, unless, no, Pierre, you
didn't have anything else to add here. Let's go to the practicals room, and we will now get our hands dirty with some water.
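As a closing illustration of the "split your ANOVA" advice from the discussion above: when the interaction is significant, the effect of freshness is tested separately within each sex. A minimal stdlib-only sketch with invented data; the function name and numbers are mine, and a real analysis would also compute p values and correct for the multiple tests, which this sketch omits.

```python
from math import sqrt

def two_sample_t(x, y):
    """Pooled-variance two-sample t statistic."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    vx = sum((v - mx) ** 2 for v in x) / (len(x) - 1)
    vy = sum((v - my) ** 2 for v in y) / (len(y) - 1)
    sp2 = ((len(x) - 1) * vx + (len(y) - 1) * vy) / (len(x) + len(y) - 2)
    return (mx - my) / sqrt(sp2 * (1 / len(x) + 1 / len(y)))

# Invented data: grams of lard eaten, 3 rats per cell, with opposite
# preferences in the two sexes (the exotic crossed-lines case).
grams = {
    ("male", "fresh"): [4, 5, 6],    ("male", "rancid"): [8, 9, 10],
    ("female", "fresh"): [8, 9, 10], ("female", "rancid"): [4, 5, 6],
}
# Simple-effects tests: freshness tested separately within each sex.
for sex in ("male", "female"):
    t = two_sample_t(grams[(sex, "fresh")], grams[(sex, "rancid")])
    print(sex, round(t, 2))  # opposite signs: opposite simple effects
```

Pooling the sexes would make the two effects cancel, exactly the trap the lecture warns about: the global freshness test would look null even though both sexes react strongly.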