We will come back to the second half of this lecture. We are looking at the factorial design of experiments. First we will start with the simplest design: 2 factors, each at 2 levels. In the notation 2^2, the superscript is the number of factors and the base is the number of levels, so a 2^2 design needs only 4 experiments to see the effects of the 2 factors. Let us call the levels of these factors minus (or -1) and plus (or +1): the lower level is -1 and the higher level is +1. As an example, the 2 factors may be the concentration of the reactant and the type of catalyst used in a reactor, and the response is the percentage conversion or yield. The four runs are then coded as follows: (-, -) means both factors are at their lower levels; (+, -) means only A is at its higher level and B is at its lower level; (-, +) means A is at its lower level and B is at its higher level; (+, +) means both are at their higher levels. Coding is very important in factorial design. You have to convert the actual settings, such as temperatures of 30 degrees and 60 degrees, or catalyst type A and catalyst type B, into this coded form. For example, the two catalyst types may be represented arbitrarily as -1 for A and +1 for B. This does not mean the superior catalyst gets +1 and the inferior one gets -1; we do not yet know which of the 2 catalysts is better, so the assignment is arbitrary. For a numeric factor, it is logical to code 30 degrees centigrade as -1 and 60 degrees centigrade as +1.
If you coded 60 degrees centigrade as -1 and 30 degrees centigrade as +1, that would also be perfectly valid, but it is counterintuitive, so by convention we use alphabetical ordering for categorical factors and ascending numerical order for numeric factors. So let us move on. The 4 runs we have considered form the corners of a square. Now the key definition, and this is quite important: the average effect of a factor is the change in response produced by a change in the level of that factor, averaged over the levels of the other factors. Suppose we want the effect of factor A. Obviously we must see what happens when A goes from its lower to its higher level, but in a factorial design you are not keeping all the other variables at constant values; they are also being varied. So when A changes from its lower level to its higher level, the other factors may also be changing, and vice versa. The way to calculate the effect is therefore to average over all the cases where A moved from its lower level (-) to its higher level (+). Using the standard notation where (1), a, b and ab denote the treatment totals of the n repeats taken at each combination, one such case is ab - b: in b, factor A is at its lower level, and in ab, factor A is at its higher level. The other case is a - (1): when all factors were at their lower levels we had (1), and when A alone was at its higher level we had a. So these are the two cases where factor A moved from its lower level to its higher level. We average these two differences, and since each symbol is a total of n repeats, we also divide by n: the effect of A is [(ab - b) + (a - (1))] / (2n). Similarly we do for B.
So we also have to account for the number of repeats when we are doing the averaging. The interaction effect is calculated as [(ab - b) - (a - (1))] / (2n). This is again quite simple. What does the AB interaction tell us? It is the difference in the effect of factor A at the 2 different levels of factor B; that is all the AB interaction is about. The effect of A at the higher level of B is given by ab - b, since B is at its higher level in both of those terms, and the effect of A at the lower level of B is given by a - (1). The difference between these two gives the interaction. If the change due to A is 10 units at the higher level of B and again 10 units at the lower level of B, we get 10 minus 10, which is 0, so the AB interaction is 0: there is no interaction between factors A and B. But if the A effect is 10 units at the higher level of B and 15 units at the lower level of B, then we get 10 minus 15, which is -5, and that means there is an interaction between factors A and B. If you remember this concept, it is sufficient. When we go to more factors we get more and more terms and it may be a bit difficult to relate to them, but the same concept applies in all of them. We call the quantity inside the brackets or parentheses a contrast. So the contrast for A is ab + a - b - (1), the contrast for B is ab + b - a - (1), and the contrast for AB is ab + (1) - a - b. And that is what I said: if factor B is kept at a constant level and factor A is varied from one level to the other, the change in the output is due to the main effect of A.
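The effect calculations above can be sketched in a few lines. This is a minimal sketch with hypothetical treatment totals chosen only for illustration; (1), a, b and ab are the totals of the n repeats at each of the four combinations.

```python
# Hypothetical 2^2 treatment totals; each is the total of n replicates.
n = 2                             # number of repeats per treatment combination
one, a, b, ab = 80, 100, 60, 90   # totals for (1), a, b, ab (illustrative only)

# Effect of A: average of the two A-differences (ab - b) and (a - (1)),
# divided by n because each symbol is a total of n repeats.
effect_A  = (ab + a - b - one) / (2 * n)
effect_B  = (ab + b - a - one) / (2 * n)
effect_AB = (ab + one - a - b) / (2 * n)
```

With these made-up totals the contrasts are 50, -30 and 10, giving effects of 12.5, -7.5 and 2.5 respectively.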
The same change in the output response would be produced if the level of factor A were changed at a different fixed level of factor B; similarly, the main effect of B would be independent of the setting of A. That is the situation when the 2 factors do not interact. When the 2 factors are not independent of each other, they are said to interact: the change in response due to a change in one factor then depends on the level of the other factor. This is again quite self-evident. If a change in the level of the first factor causes a certain change in the output response at one level of the second factor, an identical change in the first factor at the other level of the second factor will produce a markedly different output response when there is interaction. Interaction is the feature whereby one factor, when changed, fails to produce the same change in the output response at different levels of the other factors. In the cricket example, the effect of changing from a light bat to a heavy bat changed rather drastically depending on whether the batsman had consumed tea or beer. I am going a bit fast here because these concepts have already been explained with reference to the 2^2 design; they are just being said in different ways. To find the interaction effect, we take the difference in the effect of one factor at the 2 different levels of the other factor. If the effect of the first factor is the same at both levels of the other factor, interaction is absent. When interactions are present, the one-variable-at-a-time strategy will produce poor results when trying to find the optimum combination of factors: interaction twists the response surface and induces curvature in the plot relating the output to the input parameters.
The simple summary is this: if there is interaction between the factors, the simple strategy of one-variable-at-a-time experimentation will fail to find the true optimum conditions. You have to resort to design of experiments, both to identify the interaction effects between the factors involved and to find the optimum combination accurately. Otherwise you will end up far from the optimum, and that is not good for the process: it will be suboptimal, leading to losses in time, production, energy, money, labor and so on. Actually, interaction effects may be more important than the main effects. There are even cases, and I have seen one example, where a main effect is unimportant: factor A by itself may be insignificant, yet the interaction AB may have a significant role to play. So you cannot conclude or generalize that when a main factor is insignificant, all the interactions involving that factor will also be insignificant; acting together with the other factors, it may have a significant interaction. It is therefore always better to look at the interaction effects before the main effects. Only when tests have conclusively shown that the interaction effects are insignificant should you turn your attention to the main factors and the contribution of each to the overall outcome of the process. Very importantly, a significant interaction may even mask a main effect. Now let us look at the design matrix. This is very interesting, and it helps us find the effects of the different factors and the interactions in a very simple manner. The columns are I, A, B and AB, and the rows are the treatment combinations (1), a, b and ab.
The I column is the identity column, if you want to call it that, and has all entries positive. For row (1), both A and B are at their lower settings, so you put minus under A and minus under B; AB is the product of the A and B entries, and minus times minus is plus. For row a, only factor A is at the higher setting, so A gets plus, B gets minus, and AB is therefore minus. Row b is the other way around: A is at its lower level (minus), B is at its higher level (plus), and AB is still negative. For row ab both are positive, so AB is also positive. And here is something very interesting: when you add up the minuses and pluses in any column other than I, you get 0; the number of minuses exactly compensates for the number of pluses. That is one important property. Next, reading down the A column you have minus for (1), plus for a, minus for b and plus for ab: so a and ab are positive while (1) and b are negative. Now go back to the contrast for A that we saw a while back: there too a and ab are positive and b and (1) are negative. This is exactly what the design matrix says, so you can use it to calculate the effect of A: take ab + a - b - (1) and divide by 2n, where n is the number of repeats. The 2 appears because we are averaging the two differences ab - b and a - (1), and the n because each symbol is a total of n repeats. That gives the effect of main factor A. The same approach gives main factor B from ab + b - a - (1); all that has happened is that A and B have switched roles. So the design matrix itself identifies the contrast set, and you then divide by 2n to average it out.
So this is a very simple way, and it works not only for the main factors but also for the interaction. For the interaction contrast, ab and (1), the extremes, are positive, while a and b are both negative; go back to the design matrix and you will see exactly that pattern in the AB column. So using the design matrix we can write down the contrasts, and hence the effects, pretty easily. Now let us look at the sums of squares. It is not enough to find the main effects and the interaction; we also have to see which of them are more important than the others, and for that we need the analysis of variance, which requires the sums of squares. Their calculation is also pretty easy: we again take each contrast, square it, and divide by 4n. Why 4? There is a bit of theory on how sums of squares follow from contrasts which I am not getting into, as it would take us away from our area of focus. There is a good discussion on the use of contrasts in Montgomery's Design and Analysis of Experiments; you may refer to that for why the coefficient is 4 here and what it becomes for higher-order factorials (we will come to that when we look at the 2^3 design). So SS_A is the contrast for A squared divided by 4n, SS_B is the contrast for B squared divided by 4n, and SS_AB is the contrast for AB squared divided by 4n. The total sum of squares is given by the usual shortcut formula, and the error sum of squares is the difference between the total sum of squares and the contributions from A, B and AB. So now let us look at the factorial design involving 3 factors.
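These sum-of-squares formulas can be sketched directly from replicate data. This is a minimal sketch with hypothetical observations (n = 2 repeats per combination), invented only to illustrate the arithmetic.

```python
# Hypothetical 2^2 data with n = 2 replicates per treatment combination.
data = {"(1)": [28, 25], "a": [36, 32], "b": [18, 19], "ab": [31, 30]}
n = 2

# Treatment totals (each symbol totals its n repeats).
one, a, b, ab = (sum(data[k]) for k in ("(1)", "a", "b", "ab"))

# SS = (contrast)^2 / (4n) for each effect in a 2^2 design.
ss_A  = (ab + a - b - one) ** 2 / (4 * n)
ss_B  = (ab + b - a - one) ** 2 / (4 * n)
ss_AB = (ab + one - a - b) ** 2 / (4 * n)

# Total SS by the shortcut formula; error SS by subtraction.
all_y = [y for ys in data.values() for y in ys]
N = len(all_y)
ss_T = sum(y * y for y in all_y) - sum(all_y) ** 2 / N
ss_E = ss_T - ss_A - ss_B - ss_AB
```

With these made-up numbers, ss_A = 190.125, ss_B = 66.125, ss_AB = 10.125, and ss_E = 13.5, which you can confirm equals the pooled within-cell variability.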
Since there are 3 factors, the design becomes 2^3, where 2 is the number of levels (-1 or +1, lower or upper) and 3 is the number of factors. So the 2^3 design involves 8 runs, and these 8 runs may be represented as the corners of a cube. Now it should be easy for you to identify the various corners. (1) means A, B and C are all at their lower levels. Each face of the cube is a 2-dimensional representation of a 2^2 design. One face, for example, is the 2^2 design involving factors A and C: there, a means only A is at its higher level, c means only C is at its higher level, and ac means both A and C are at their higher settings, so those four labels form the corners of a square. Another face is the 2^2 design involving factors B and C. You can read off the significance of each corner in the same way. For example, at the corner ab, factors A and B are both at their higher levels: go along the A direction and factor A is at its higher level; go along the B direction and factor B is at its higher level. Since we have not moved along the C direction, factor C is at its lower level, which is why c does not appear in the label. And at the far corner, all of them are at their higher levels, hence the notation abc. It is very simple; if you have understood the 2^2 design, there will be very little difficulty in understanding this concept. Now let us look at the corresponding 2^3 factorial design matrix. This is very interesting. In run 1, A, B and C are all at their lower levels; in run 2 I change only A, so it reads +1, -1, -1. Very importantly, I am not restricted to changing only one variable or one factor at a time.
When I am going from run 2 to run 3, you may see that I have changed A from +1 to -1 and B from -1 to +1, keeping C at -1. So I am changing the factors simultaneously. Similarly, from run 3 to run 4, A goes from -1 to +1 while B stays at its higher level and C at its lower level. But when I go from run 4 to run 5, both A and B change from +1 to -1, and C changes from -1 to +1: from run 4 to run 5 all the factors change simultaneously. This is one important feature of the 2^k factorial design, where k is the number of factors. In one-variable-at-a-time experimentation, you would change only one variable from one level to another while keeping the other variables at fixed values. Moving on: now that the number of factors has increased from 2 to 3, the expressions look more cluttered, and it is slightly more difficult to spell out what changes between levels when A changes. There are 8 treatment combinations, but the same concept applies. The effect of A is the average response at the higher level of A minus the average response at the lower level of A. Let me say that again: take the average response over all runs with A high, take the average response over all runs with A low, and take the difference of these 2 to get the effect of A. You may think, oh my God, I have to remember such large formulae for the quizzes and exams. You do not. Notice that every combination involving a is positive (a, ab, ac, abc) and every combination not involving a is negative ((1), b, c, bc). You can verify this from the design matrix, which we will construct in a minute. Similarly, for the AB interaction, one set of combinations is positive and the complementary set is negative.
Again, you do not have to break your head trying to remember such big formulae, and you may start wondering what to do with a 2^4 design, which involves 16 treatment combinations and would be very cumbersome to memorize. You can also have the three-way ABC interaction, so this is becoming quite unmanageable; but the design matrix, which we will look at shortly, saves us. In every contrast you will have equal numbers of pluses and minuses, 4 pluses and 4 minuses, as expected; the product of any 2 columns yields another column, as we have already seen; and each sum of squares is the corresponding contrast squared divided by 8n. That is very important. So how do we set up the contrasts? Through this table. The treatment combinations are: (1), all factors at their lower level; a, only A at its higher level; b, only B at its higher level; ab, both A and B at their higher levels with C at its lower level; c, only C at its higher level; ac, factors A and C at their higher levels with B at its lower level; bc, factors B and C at their higher levels with A at its lower level; and abc, all factors at their higher levels. The identity column I is +1 throughout. In the A column, (1) gets -1, a gets +1, b gets -1, ab gets +1 (both A and B are at the higher level, so A is +1), c gets -1 (only C is high), ac gets +1 (A is high), bc gets -1 (A is low), and abc gets +1 (all are high). Using this logic we can fill in all the other columns. For the interaction columns you do not even have to reason about the settings: simply multiply. For row (1), AB is (-1) times (-1), which is +1; C at row (1) is obviously -1, and B is again negative.
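The whole 2^3 design matrix can be generated rather than memorized. This is a minimal sketch: the A, B and C columns are the standard-order level settings, every interaction column is the elementwise product of its parent columns, and the response totals attached at the end are hypothetical numbers for illustration only.

```python
from itertools import product

# Standard (Yates) order: A varies fastest, then B, then C.
runs = [dict(zip("ABC", (a_, b_, c_)))
        for c_, b_, a_ in product((-1, 1), repeat=3)]

# Interaction columns are just products of the parent columns.
for r in runs:
    r["AB"]  = r["A"] * r["B"]
    r["AC"]  = r["A"] * r["C"]
    r["BC"]  = r["B"] * r["C"]
    r["ABC"] = r["A"] * r["B"] * r["C"]

# Every contrast column balances: 4 pluses and 4 minuses, summing to 0.
for col in ("A", "B", "C", "AB", "AC", "BC", "ABC"):
    assert sum(r[col] for r in runs) == 0

# Hypothetical treatment totals of n = 2 repeats, in standard order
# (1), a, b, ab, c, ac, bc, abc -- invented only to show the arithmetic.
totals = [55, 63, 50, 66, 58, 70, 54, 69]
n = 2
for r, t in zip(runs, totals):
    r["y"] = t

# Effect of A = (contrast for A) / (4n): the column signs ARE the contrast.
effect_A = sum(r["A"] * r["y"] for r in runs) / (4 * n)
```

The same one-liner with a different column name gives any other effect, which is exactly the "read the contrast off the matrix" idea.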
So this logic may be extended and applied uniformly, and you can get the contrasts not only for the main factors but also for the binary interactions and the ternary interaction. For example, for ABC in row (1), you multiply the A, B and C entries: (-1) times (-1) is +1, and +1 times (-1) is -1. You can write either minus or -1, and plus or +1; that is not a problem. Minus (-1) refers to the lower setting and plus (+1) refers to the higher setting. For example, take the C column: in row (1), C is at its lower level, so C is -1; in row a, only A is at its higher level, so C is again -1; in row b, only B is at its higher level, so C is again -1; in row ab, both A and B are at their higher levels but C is at its lower level, so again -1. But from row c onward, C is at its higher level throughout, and so you will have +1 throughout for C. I hope you have understood this logic; it is very simple. Try it on your own and you will very quickly get the correct contrast sets for all the main factors as well as the binary and ternary interactions. Now we are going to look at the response model, moving slowly from identification of the effects to the relative importance of the different effects. The model is y_ijk = mu + tau_i + beta_j + (tau beta)_ij + epsilon_ijk. This is a linear model which is split into the overall mean value mu, the contribution tau_i from factor A, the contribution beta_j from factor B, the interaction contribution (tau beta)_ij from the 2 factors, and the error component epsilon_ijk. In general factor A can have a settings; you do not necessarily have to have the 2 settings -1 and +1.
You can even have 3 settings of factor A, and that may differ from the number of settings of factor B, which in turn may differ from the number of repeats. Usually we put a = 2 and b = 2, and n can be any number greater than 1 for meaningful repetition. So the response y_ijk is modeled as the sum of the overall mean value, the effects of factors A and B, the effect of the interaction between A and B, and the random error component epsilon_ijk. And what is the null hypothesis? It states that the response is only the overall mean response plus the error component: the effect of factor A is 0, the effect of factor B is 0, and there is no interaction between the factors. Whatever the setting of factor A or factor B, the effects are 0; the response is only the overall mean value, tempered or altered by the random error component. The alternate hypothesis states that at least one treatment effect, main or interaction, has a significant effect on the process: at least one level of treatment A, or one level of treatment B, or one interaction between the two treatments affects the process. So the null hypothesis says that all effects are unequivocally 0 and not a single factor has an influence on the response, while the alternate hypothesis says that at least one effect is important: either some tau_i, or some beta_j, or at least one interaction (tau beta)_ij.
What this means is that when you go from one level of factor A to another, keeping factor B constant, in at least one such case there is an effect; similarly, keeping A constant and going from one setting of factor B to another, there is an effect on the process different from the overall mean mu. The mathematical statement of the hypotheses is: the null hypothesis says tau_1 = tau_2 = ... = tau_a = 0, and similarly for factor B, beta_1 = beta_2 = ... = beta_b = 0, and all the interactions are 0 as well; the alternate hypotheses say that at least one tau_i is not 0, at least one beta_j is not 0, and at least one (tau beta)_ij is not 0. So how do you get the sums of squares? The total sum of squares is the sum of the squared deviations of the experimental data from the grand average; it is a measure of how different the individual observations are from that average. This total sum of squares may be decomposed, or resolved, into contributions from factor A, factor B, the interaction between A and B, and the random error component. Once you are able to split these entities, you can compare the contribution from each against the total: SS_T = SS_A + SS_B + SS_AB + SS_E, the sums of squares of A, B, AB and the error. Each sum of squares has a degrees of freedom associated with it, which indicates the number of independent data values behind that sum of squares. The data table lays out factor A down one direction: factor A at level 1, level 2, and so on up to level a.
Similarly for factor B you move horizontally: the first level of factor B, the second level, and so on up to level b. In a strict 2^2 factorial design you stop at 2 levels for A and 2 for B, so you have only those 4 cells. Within each cell you can have n repeats: in cell (1, 1) the observations are y_111 (i = 1, j = 1, k = 1), y_112 (k = 2), and so on up to y_11n, so there are n entries in this first cell. In the second cell, factor A is still at level 1 but you have gone to the second level of factor B, and again you have n repeats. We are assuming throughout that the number of repeats per cell is constant: the number of repeats done in any one cell equals the number done in any other. So with factor A at level 1 and factor B at level 2 you have the first repeat, second repeat, and so on to the nth repeat, and the same logic applies to every other cell in the table. To make it a bit simpler: if only factor A changes from level 1 to level a while factor B is kept at level 1, the second index j stays at 1, so the entries read y_111, y_112, and so on up to y_11n; then, when you go to the next level of factor A, the index i takes the value 2 and you have y_211, y_212, and so on up to y_21n.
So when you add up all of these you get y_.1., which means factor B is kept at level 1 while the other indices are summed over, and then you can take the average y-bar_.1.: with j fixed at one setting, both i and k are varied, so you divide by a times n; we will come to that very shortly, but I hope you have understood the layout of this particular table. There is another form of the table where factor A is fixed at one level and the levels of factor B vary: with A at level 1 and B kept at level 2 throughout, the entries are y_121 for the first repeat (the 1 stands for the factor A setting i = 1, the 2 for the factor B level, and the last index for the repeat), then y_122, and so on up to y_12n; similarly you can do this up to the b-th level of factor B. Now, this is very important, as it tells you how to do the summing and averaging. y_i.. means the index i is kept fixed and j and k are summed over, j over the b levels and k over the n repeats: y_i.. is the sum over j and k of y_ijk. When you want the average, you simply divide the total by b times n, and this applies for every value of i from 1 to a. For y_.j. you keep the index j fixed and sum over the i-th and k-th indices, over a times n elements, so y-bar_.j. = y_.j. / (a n), where y_.j. is that summation and j can go from 1 to b. Please spend a bit of time trying to understand all of this; it will help you later and avoid confusion subsequently.
Now, when I write y_ij., I am saying that i and j are kept at constant values and I am summing only over k, from 1 to n: y_ij. is the sum over k of y_ijk. y-bar_ij. means that whatever sum I got is divided by the number of elements in it, which is n, giving the cell average. y_... means summing over all the indices i, j and k of the response y_ijk, and dividing by abn, the total number of terms, gives the grand average y-bar_.... I hope you have understood the nomenclature; this is very important. If you do not understand it, please work through it with the current lecture material and discussions, because otherwise you are going to get hopelessly confused. Now this is very fascinating: the resolution of the total sum of squares. As I told you, the total sum of squares is the sum over all elements in the table of (y_ijk - y-bar_...)^2: each individual observation minus the grand average, squared. That total is resolved into the sum of squares due to factor A, built from the deviations of the i-th treatment means y-bar_i.. from the overall grand average; the sum of squares due to factor B, built similarly from the treatment means y-bar_.j. minus the grand mean; the interaction sum of squares for A and B; and the error term, built from each individual observation within a cell minus the average of those repeated observations. Again, this is very similar to the single-factor experimentation where I explained these ideas in more detail; the same concepts apply here as well.
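The dot-notation averages and the resolution of the total sum of squares can be checked numerically. This is a minimal sketch on a hypothetical a = 2, b = 2, n = 2 data set (the numbers are invented); y[i][j] holds the n repeats at level i of A and level j of B.

```python
# Hypothetical replicate data: y[i][j] is the list of n repeats in cell (i, j).
y = [[[28, 25], [18, 19]],
     [[36, 32], [31, 30]]]
a, b, n = 2, 2, 2

# Dot-and-bar notation: a dot means "summed over that index", a bar an average.
grand   = sum(v for row in y for cell in row for v in cell) / (a * b * n)   # y-bar_...
ybar_i  = [sum(v for cell in y[i] for v in cell) / (b * n) for i in range(a)]          # y-bar_i..
ybar_j  = [sum(v for i in range(a) for v in y[i][j]) / (a * n) for j in range(b)]      # y-bar_.j.
ybar_ij = [[sum(y[i][j]) / n for j in range(b)] for i in range(a)]                     # y-bar_ij.

# The four components of the resolution SS_T = SS_A + SS_B + SS_AB + SS_E.
ss_A  = b * n * sum((m - grand) ** 2 for m in ybar_i)
ss_B  = a * n * sum((m - grand) ** 2 for m in ybar_j)
ss_AB = n * sum((ybar_ij[i][j] - ybar_i[i] - ybar_j[j] + grand) ** 2
                for i in range(a) for j in range(b))
ss_E  = sum((v - ybar_ij[i][j]) ** 2
            for i in range(a) for j in range(b) for v in y[i][j])
ss_T  = sum((v - grand) ** 2
            for i in range(a) for j in range(b) for v in y[i][j])

# The decomposition holds exactly (up to floating-point rounding).
assert abs(ss_T - (ss_A + ss_B + ss_AB + ss_E)) < 1e-9
```

For a 2^2 design these component sums of squares agree with the contrast-based formulas SS = (contrast)^2 / (4n).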
So this gives the sum of squares of factor A, the sum of squares of factor B, the sum of squares of the interaction AB (this one is slightly harder to understand) and the sum of squares of the error. When you look at the main effect of A, there are only a - 1 independent entities. How so? There are a treatment means y-bar_i..: y-bar_1.., y-bar_2.., and so on up to y-bar_a... But if I average all of them I get the grand mean, and since the grand mean is already used in the sum of squares, only a - 1 of the treatment means are independent. The same argument applies to factor B: from the b treatment averages for factor B I can recover the grand mean, so only b - 1 of them are independent. That is very important, and it settles the main effects. What about the interaction AB? Factor A has a - 1 degrees of freedom and factor B has b - 1 degrees of freedom, and the interaction AB has (a - 1)(b - 1). Montgomery, in his book Design and Analysis of Experiments, gives a nice explanation: first count the degrees of freedom among the ab cell totals, which is ab - 1, and then subtract the degrees of freedom due to factor A and those due to factor B. You get (ab - 1) - (a - 1) - (b - 1), which is nothing but (a - 1)(b - 1). As for the degrees of freedom for the error: there are ab cells in total, and within each of these ab cells, even though you have n repeats, only n - 1 of them are independent.
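The subtraction argument for the interaction degrees of freedom, and the check that all the pieces add back up to the total, is just this algebra:

```latex
\begin{aligned}
\underbrace{(ab-1)}_{\text{cells}} \;-\; \underbrace{(a-1)}_{A} \;-\; \underbrace{(b-1)}_{B}
  &= ab - a - b + 1 = (a-1)(b-1),\\[4pt]
\underbrace{(a-1)}_{A} + \underbrace{(b-1)}_{B} + \underbrace{(a-1)(b-1)}_{AB} + \underbrace{ab(n-1)}_{\text{error}}
  &= ab - 1 + abn - ab = abn - 1.
\end{aligned}
```

The second line is the partition of the total degrees of freedom abn - 1 mentioned in the lecture.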
So, in total you have ab(n-1) independent entities, and that is the degrees of freedom for the error. Again the argument here is very similar to what we did for single variable experimentation; if you are finding this slightly difficult to understand, I request you to please go back to the single variable experimentation design discussion, and then whatever we are doing now will be easy to follow. So, the total degrees of freedom, abn-1, may be partitioned into a-1 for a, b-1 for b, (a-1)(b-1) for ab and ab(n-1) for the error. Now we have seen the sum of squares for each case: for a, for b, for ab and for the error. We can divide each of these sums of squares by its degrees of freedom and get the corresponding mean square. So mean square a would be the sum of squares of a divided by a-1, mean square b would be the sum of squares of b divided by b-1, mean square ab is the sum of squares of ab divided by (a-1)(b-1), and mean square error would be the sum of squares of error divided by the degrees of freedom for the error, which is ab(n-1). Again this is very similar to what we have done with single factor experimentation. The expected value of the mean square of a is the error variance sigma squared plus the additional variability caused by factor a. If factor a is contributing to the response it cannot be ignored, and then at least one setting of factor a is having an effect, so you are adding up all the treatment effects here; this is over and above sigma squared. If this term vanishes or becomes negligible, then you will have only sigma squared, and the variation due to changing factor a may then be attributed to random noise. Similarly, the expected value of mean square b would be sigma squared plus b n times the summation from j equals 1 to b of beta j squared, divided by b-1.
When these effects are insignificant, the variation due to b is because of random error; but if that is not so and factor b is effective in altering the response of the process, then you have this additional contribution. Similarly for AB; you can see the nice symmetry in all these things. For AB I am doing the summation over both i equals 1 to a and j equals 1 to b of tau beta ij whole squared. The expected value of mean square error would be the expected value of the sum of squares of error divided by AB(N-1). Here there is no inclusion of any other term; we are talking about pure error only. I hope you have understood this discussion; it is very straightforward. This expected value is only to tell you that you have the error variance plus additional variability due to the 2 main factors and their interaction. If the main factors and/or the interactions were ineffective, then those variabilities are also estimates of the random error component; if the effects are significant, then you cannot take them as random variation but have to separately account for the effects of those factors. But this is pure error, so you are getting sigma squared. Now we can do the F test. What we do is take the mean square for A; how did we find mean square A? Sum of squares of A divided by A-1, and we already saw how to find the sum of squares of A. So we take mean square A divided by mean square error. Similarly we take mean square B divided by mean square error, and mean square AB divided by mean square error. We have seen the beauty of the F distribution, where it has a numerator degrees of freedom and a denominator degrees of freedom. The numerator degrees of freedom for A would be simply A-1, and the denominator degrees of freedom would be AB(N-1). Similarly for factor B it is B-1 degrees of freedom in the numerator and AB(N-1) degrees of freedom in the denominator.
Similarly for the interaction AB you have (A-1)(B-1) degrees of freedom for mean square AB in the numerator and AB(N-1) degrees of freedom for the denominator. So with this in mind we have to either accept or reject the null hypothesis depending upon the value of F. We see the value of this F statistic and check whether it falls in the rejection region or in the acceptance region. It falls in the rejection region if the statistic F0 exceeds F alpha, where alpha is the level of significance; 0.05 is usual, but you may also use 0.025 or 0.01 or even 0.1. A-1 is the numerator degrees of freedom, the degrees of freedom for factor A, and AB(N-1) is the degrees of freedom for the error. So you find this critical value, and if your statistic is higher than that and falls in the critical region, you reject the null hypothesis. Similarly you do the same thing for factor B: find F alpha with B-1 numerator degrees of freedom and AB(N-1) denominator degrees of freedom for the error component. If F0 is greater than F alpha with B-1 and AB(N-1) degrees of freedom, you are able to reject the null hypothesis that factor B is not significant, and you have to state that factor B is significant. The same thing you do for AB: here you use the alpha level of significance, (A-1)(B-1) is the degrees of freedom for the interaction AB, and that is the numerator degrees of freedom; AB(N-1) is again the denominator degrees of freedom corresponding to random error. If this F value exceeds the critical F value given according to this relation, then you reject the null hypothesis that the AB interaction is insignificant. So you can test for the interaction effects first and then look for the main effects. If the interaction effects are not significant, interpretation of the main effects is quite simple. Again, as I said before, interaction effects are more important than the main effects.
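As a sketch of the decision rule, assuming scipy's F distribution is available; the mean square values below are hypothetical numbers chosen only for illustration, not from any real data set:

```python
from scipy.stats import f

# Hypothetical design: a = 3 levels of A, b = 2 levels of B, n = 4 repeats.
a, b, n = 3, 2, 4
df_A  = a - 1
df_B  = b - 1
df_AB = (a - 1) * (b - 1)
df_E  = a * b * (n - 1)           # error degrees of freedom: ab(n-1)

# Hypothetical mean squares (made-up numbers for the demonstration).
MS_A, MS_B, MS_AB, MS_E = 30.0, 4.0, 2.5, 2.0

alpha = 0.05                      # the usual level of significance
for name, MS, df_num in [("A", MS_A, df_A),
                         ("B", MS_B, df_B),
                         ("AB", MS_AB, df_AB)]:
    F0 = MS / MS_E                            # test statistic
    F_crit = f.isf(alpha, df_num, df_E)       # upper-tail critical value
    verdict = "reject H0" if F0 > F_crit else "fail to reject H0"
    print(f"{name}: F0 = {F0:.2f}, F_crit = {F_crit:.2f} -> {verdict}")
```

The `f.isf` call gives the upper-tail critical value F alpha with the stated numerator and denominator degrees of freedom, which is exactly the table lookup described above.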
So this is the summary of the ANOVA table: the sum of squares of A, the sum of squares of B, the sum of squares of AB and the sum of squares of error, followed by the total sum of squares. If I add up all these elements I will get the total sum of squares. Here the degrees of freedom are A-1, B-1, (A-1)(B-1) and AB(N-1). We have already seen this several times in the past, so I am not going to spend too much time on it; when you add up all these degrees of freedom you will get ABN-1. The mean squares are formed by dividing the respective sums of squares by the respective degrees of freedom: you get SS A by A-1, SS B by B-1, SS AB by (A-1)(B-1), and SS E by AB(N-1). What are the terms in the denominator of each of these expressions? They are nothing but the degrees of freedom associated with that individual factor or combination of factors. F0 is formed by taking each of these mean squares and dividing by the mean square error. Even the interaction you do not leave alone: you take the sum of squares of AB, divide by the degrees of freedom for AB to get the mean square for AB, and divide that by the mean square error. So you can see which of these F values exceed the critical value and hence fall in the rejection region, based on which you can state your acceptance or rejection of the null hypothesis. Now we move on to general factorial experiments. The concept is very similar to the 2 power 2 factorial experiments. You can conduct the experiment with any number of factors at any number of arbitrary levels: factor A can have 2 levels, factor B can have 3 levels, factor C can have 4 levels, and each treatment may have a certain number of repeats. We assume that the number of repeats is constant. So I will go through it very quickly.
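Putting the pieces together, here is one way the whole two-factor ANOVA table might be assembled; the data are synthetic (a deliberate shift is built into factor A so that at least one row is clearly significant), and numpy and scipy are assumed:

```python
import numpy as np
from scipy.stats import f

# Synthetic two-factor data: a = 2, b = 3, n = 3 repeats per cell.
rng = np.random.default_rng(7)
y = rng.normal(20.0, 2.0, size=(2, 3, 3))
y[1] += 5.0                       # build in a real effect of factor A
a, b, n = y.shape

gm  = y.mean()                    # grand mean ybar_{...}
mA  = y.mean(axis=(1, 2))         # factor A treatment means
mB  = y.mean(axis=(0, 2))         # factor B treatment means
mAB = y.mean(axis=2)              # cell means ybar_{ij.}

SS_A  = b * n * ((mA - gm) ** 2).sum()
SS_B  = a * n * ((mB - gm) ** 2).sum()
SS_AB = n * ((mAB - mA[:, None] - mB[None, :] + gm) ** 2).sum()
SS_E  = ((y - mAB[:, :, None]) ** 2).sum()
SS_T  = ((y - gm) ** 2).sum()

rows = [("A", SS_A, a - 1), ("B", SS_B, b - 1),
        ("AB", SS_AB, (a - 1) * (b - 1))]
df_E = a * b * (n - 1)
MS_E = SS_E / df_E

print(f"{'source':6} {'SS':>8} {'df':>3} {'MS':>8} {'F0':>7} {'p':>8}")
for name, SS, df in rows:
    MS = SS / df
    F0 = MS / MS_E
    p = f.sf(F0, df, df_E)        # upper-tail p-value for the F test
    print(f"{name:6} {SS:8.2f} {df:3d} {MS:8.2f} {F0:7.2f} {p:8.4f}")
print(f"{'error':6} {SS_E:8.2f} {df_E:3d} {MS_E:8.2f}")
print(f"{'total':6} {SS_T:8.2f} {a*b*n - 1:3d}")
```

The sums of squares add up to the total, and the degrees of freedom add up to abn-1, exactly as stated in the summary table above.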
So in general you can have A levels of factor A, B levels of factor B, C levels of factor C, and N replicates or repeats; we have an equal number of replicates for each treatment in the experiment. So we have this model for yijkl. You have an additional subscript now: the index i stands for factor A, the index j stands for factor B, the index k stands for factor C, and l stands for the repetition. This is the overall mean; tau i represents the effect of the ith level of factor A; beta j means the effect of the jth level or setting of factor B; gamma k refers to the effect of the kth setting or level of factor C; and these represent the interactions: the binary interaction between factors A and B, the binary interaction between factors A and C, the binary interaction between factors B and C, and the ternary interaction between factors A, B and C; and then you have the error component. You can see that the index i varies from 1 to A, the index j varies from 1 to B, the index k varies from 1 to C, and the index l varies from 1 to N. So we again have this table. You have the source of variation; the degrees of freedom are A-1, B-1, C-1 and ABC(N-1). You may be asking what happens to the interactions AB, BC and AC; please be a bit patient, that is coming soon. The mean square for A is the sum of squares contribution of A divided by the degrees of freedom for A, which is A-1. Calculating the sums of squares from now on becomes more tedious because there are many factors. You are not expected to do these things either by hand calculation or even with a spreadsheet; there is software like Minitab available, and you may resort to such software to calculate the treatment sums of squares. The expected mean square value would be nothing but the random error component plus the contribution from the individual treatments.
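The degrees-of-freedom bookkeeping for the three-factor model can be checked directly; the level counts below are the hypothetical ones mentioned in the text (2, 3 and 4 levels), with an assumed n = 2 repeats:

```python
# Degrees of freedom for a general three-factor factorial design.
a, b, c, n = 2, 3, 4, 2   # hypothetical level counts and repeats

df = {
    "A":     a - 1,
    "B":     b - 1,
    "C":     c - 1,
    "AB":    (a - 1) * (b - 1),
    "AC":    (a - 1) * (c - 1),
    "BC":    (b - 1) * (c - 1),
    "ABC":   (a - 1) * (b - 1) * (c - 1),
    "error": a * b * c * (n - 1),
}

# The components must add up to the total degrees of freedom, abcn - 1.
print(sum(df.values()), a * b * c * n - 1)  # 47 47
```

Exactly as in the two-factor case, the main effects, all binary interactions, the ternary interaction and the error partition the total degrees of freedom with nothing left over.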
So you have sigma squared plus the contribution from factor A, sigma squared plus the contribution from factor B, sigma squared plus the contribution from factor C, and so on. If these contributions are not there, then the expected mean square is an estimate for the random error component sigma squared; otherwise you have not only sigma squared but an additional contribution from the main effects or, as we will see, the interaction effects. The expected mean square error is just sigma squared. Now you have the interactions, and again you have the sigma squared error variance plus the contribution from the AB interaction, the AC interaction and the BC interaction. You can always form the mean square and divide the mean square for the effect or the interaction by the mean square error to get the F0 value. Then you can find the critical value based on the numerator and denominator degrees of freedom, see whether the F value lies in the acceptance region or in the critical region, and suitably accept or reject the null hypothesis. So for example for ABC you have (A-1)(B-1)(C-1) as the degrees of freedom, the mean square is the sum of squares of ABC divided by (A-1)(B-1)(C-1), and the expected mean square, which we do not use in the F test, just shows that it has the error variance sigma squared plus the variability contributed by the ABC interaction. To find the F0 value you divide mean square ABC by mean square error. You have (A-1)(B-1)(C-1) as the numerator degrees of freedom and ABC(N-1) as the denominator degrees of freedom. You can look up the F probability charts based on these degrees of freedom and see whether the F statistic lies in the critical region. So this completes our discussion on factorial design. The 2 power k factorial design is quite useful, elegant and easy to understand for a small number of factors.
What is to be done when there are a large number of factors? Even then the number of experiments may blow up, and including the repeats it may be quite an investment to do so many experiments; there are some elegant alternatives to the general factorial design. So what I am trying to say is that the factorial design is not restricted to 2 levels; it can be generalized to any number of factors and any number of levels. The only problem is that even when you generalize, you have to keep tabs on the number of experiments you have to do, and coupled with the repeats, a general factorial design involving many levels, many factors and repeats will again lead to a large number of experiments. So we have to find alternatives even to these factorial designs, and this will form the background for our future discussions. Thank you for your attention.