Welcome back to the lectures on statistics for experimentalists. We have completed the factorial design and also the general factorial design. We learnt how to identify the main effects and the interactions from factorially designed experiments. We also carried out the analysis of variance to identify the important effects influencing the response. The factorial design is an economical and efficient way to carry out experiments, especially in industry. Doing experiments means a lot of investment in manpower, time and resources, so it is important that we try to get the maximum information from, I would not say the minimum, but an efficient and economical number of experiments. In industry, and also when doing research, we often do not know all the factors that may influence the response. We can look at the literature and see what other people have done. When there are a large number of factors, especially in product design or the development of a new process, we need to identify the appropriate factors. So we will be looking at fractional factorial designs and, as the title suggests, we will be looking at fractions of the factorial design. How to identify the fractions will form the outline for this lecture. Looking at the references, we have the book written by Montgomery and Runger, Applied Statistics and Probability for Engineers. This is a concise book starting with the aspects of probability and moving on to inferential statistics, hypothesis testing, parameter estimation, the method of maximum likelihood and probability distributions, and it also deals with factorial designs and fractional factorial designs. So it is a single-point reference both for students who want to learn the subject and for practitioners. Then we have a more detailed book written by Montgomery, which gives more advanced designs in addition to the basic ones. We will be referring to both these books during this lecture.
When we want to develop a new process or design a new product, we may not know which factors influence it and to what extent. Hence we do not want to leave out any factor, and we want to include as many factors as possible. But when you add more and more factors to be on the safe side, you are also adding to the size of the experiment, the number of experimental runs you have to make and the investment you have to provide. So a logical step is to do the experiments in a sequence: we carry out a fraction of the factorial design. The factorial design, usually the two-level factorial design, is efficient and reduces the number of runs, but when the number of factors increases to a large value we need to economize even further. For example, if you have 5 factors, 2^5 is 32, and if you want to do at least 2 repeats it would mean 64 runs. 64 runs may take a long time to complete. So we do a fraction of the possible experiments given by the factorial design and see what information it provides. Even though it may not give the complete information, we may use it to get an idea. It may at least tell us that one factor is not important, and that in itself is a good outcome of doing only a fraction of the experiments. So what we may do is divide the experiment into equal fractions and do them sequentially, such that we keep building upon the information from the different fractions. We may stop at any time once we feel that we have enough information, or enough of a handle on the process, and take it up from there. The characteristic feature of a 2^n design is that the response is given in terms of the main factors, 2 factor interactions, 3 factor interactions and so on. If you look at the information provided by the factorial design, oftentimes we may not be able to really make sense of the interactions. What does a 2 factor interaction really mean in a physical sense?
For example, if you are looking at temperature and pressure, the interaction appears as T into P. What is the meaning of T into P? T has units of kelvin and pressure has units of pascal, or newtons per metre squared. How can you combine T into P? Well, normally we do coding. By coding we restrict the factors to the range minus 1 to plus 1, so all these factors become dimensionless entities. A temperature and pressure interaction then simply means that the response depends upon a particular factor, but the effect of varying that factor depends upon the level of the second factor. We can have 3 factor interactions in the same way. When you have a considerable number of interactions, usually the interactions up to order 2, the 2 factor interactions, are the significant ones. Very rarely 3 factor interactions become important, and even more rarely 4 factor interactions, but I have seen general factorial designs where even 3 factor interactions were important. So we cannot rule anything out. When you do a 2^n design, with n being the number of factors, and n is large, the number of higher order interactions increases in a rather alarming manner. One interim possibility is to carry out the full factorial design and not do repeats. You then say that the higher order interactions are not important, and they may be clubbed together and used to get an idea of the random fluctuation. This is one way of doing it, but as I said, there may be instances where even 3 factor interactions are important, and clubbing them with the other higher order interactions and calling them a random contribution may lead to wrong conclusions. So a better way is to carry out the fractional factorial design.
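The coding step mentioned above can be sketched in a few lines. This is only an illustration: the function name `code_level` and the temperature and pressure ranges are hypothetical, chosen to show that the coded values, and hence their product, carry no units.

```python
def code_level(value, low, high):
    """Map a setting in natural units onto the coded scale: low -> -1, high -> +1."""
    centre = (high + low) / 2.0
    half_range = (high - low) / 2.0
    return (value - centre) / half_range

# Hypothetical factor ranges, used only for illustration.
T_LOW, T_HIGH = 300.0, 400.0    # temperature in kelvin
P_LOW, P_HIGH = 1.0e5, 5.0e5    # pressure in pascal

t_coded = code_level(400.0, T_LOW, T_HIGH)   # temperature at its high level
p_coded = code_level(1.0e5, P_LOW, P_HIGH)   # pressure at its low level
tp_coded = t_coded * p_coded                 # dimensionless interaction entry

print(t_coded, p_coded, tp_coded)
```

The product of the coded values is what appears in an interaction column, so the question of multiplying kelvin by pascal does not arise.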
So let us see how to carry out this fractional factorial design without further ado. In a 2^n design you have n main factors. The number of 2 factor interactions can be calculated as nC2, the number of 3 factor interactions as nC3, and so on, where, as I hope you know, nCr is n factorial by (n-r) factorial into r factorial. Following this formula, in a 2^5 design we have 5 main factors and 10 2 factor interactions. How do you get 10? It is 5C2, which is 5 factorial by 3 factorial into 2 factorial. 5 factorial is 120, and 3 factorial is 6, into 2 is 12, so 120 by 12 is 10. 5C3 is the same as 5C2, and that leads to 10 3 factor interactions. Then you have 5 4 factor interactions, because 5C4 is the same as 5C1, and then you have the one 5 factor interaction. When you add these up, 5, 15, 25, 30, 31, you have 31 effects, and you also have the constant beta 0 in the model. So when I look at a 2^5 design I can see that there are a whole lot of interactions. Out of the 32 terms, 6 (the 4 factor and 5 factor interactions) are pretty much non-existent or ineffective. 6 out of 32 is about 20%, which are not really useful. If I also take the 3 factor interactions, it becomes 16 out of 32, that is 50%: 50% of the effects I am determining are possibly not going to affect the response. Then I have 5 main factors and 10 2 factor interactions, and even among the 5 main factors there may be 1 or 2 which do not influence the response. Let us assume that I am doing a full 2^5 design, which is 32 runs. If there are 2 factors which are not really effective, then it is actually a 2^3 design, for which only 8 runs are required. But I am doing 32 runs, so essentially I am doing the experiment 4 times. These are some issues we have to consider before we commit ourselves to an investment of time, manpower and money.
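The counting above can be checked with a short sketch. It assumes only Python's standard `math.comb`; the function name `effect_counts` is illustrative.

```python
from math import comb

def effect_counts(n):
    """Number of effects of each order in a 2^n design (order 1 = main effects)."""
    return {order: comb(n, order) for order in range(1, n + 1)}

counts = effect_counts(5)
total_effects = sum(counts.values())

# Effects of order 4 and above, and of order 3 and above.
order_4_up = sum(c for order, c in counts.items() if order >= 4)
order_3_up = sum(c for order, c in counts.items() if order >= 3)

print(counts)          # effects per order for the 2^5 design
print(total_effects)   # all effects, excluding the constant term
print(order_4_up, order_3_up)
```

For n = 5 this gives 5, 10, 10, 5 and 1 effects of orders 1 to 5, 31 in total, with 6 of order four or higher and 16 of order three or higher.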
Since there may be 50% wastage, if you want to put it like that, in terms of the effects that actually matter, we may decide to go for a fractional factorial design. How do we do it? We take a fraction of the total number of runs. That fraction may be 1 by 2, that is one half, or 1 by 4, which is one quarter, or 1 by 8 of the overall design, depending on the value of n. When you have a 2^3 design you can go for a half fraction. When you have a 2^4 design involving 16 runs you may consider a 1 by 4 fraction, though a 1 by 2 fraction would be more logical. 1 by 8 you may go for when you have 6 factors: 2^6 is 64, so 1 by 8 would be a fraction involving 8 experiments. You cannot reduce the number of experiments to an absurd value. A 2^3 design you may want to do fully, but if you are in a hurry you may first do the 1 by 2 fraction. For a 2^3 design a 1 by 4 fraction is absurd, as it leaves only 2 experiments; you may go for a 1 by 4 fraction for at least a 2^4 factorial design, so that you have 4 experiments to do. 1 by 8 for 2^4 is meaningless, as that would mean only 2 experiments, which is not good; you may want to go for 1 by 8 when you have a 2^5 or 2^6 design. So after deciding the fraction, let us see how to calculate the effects. To put it formally, a fractional factorial design is represented as 2^(n-f). We represented the two-level factorial design as 2^n, and when we take a fraction we write 1 by 2^f into 2^n, which is 2^(n-f), where n is greater than f and f is of course a whole number. So f can be 1, 2 and so on up to n-1. n also cannot take any real value; it can take only whole numbers, and since n equal to 1 is normally not used, n may be 2, 3, 4 and so on. Again we resort to contrasts to set up the fractional factorial design.
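The run counts quoted above follow directly from 2^(n-f). A minimal sketch, with the illustrative function name `fraction_runs`:

```python
def fraction_runs(n, f):
    """Number of runs in a 2^(n-f) design, i.e. a 1/2^f fraction of the 2^n design."""
    if not 1 <= f < n:
        raise ValueError("need whole numbers with 1 <= f < n")
    return 2 ** (n - f)

# (n, f) combinations discussed in the lecture.
cases = [(3, 1), (3, 2), (4, 1), (4, 2), (4, 3), (6, 3)]
for n, f in cases:
    print(f"2^({n}-{f}): {fraction_runs(n, f)} of {2 ** n} runs")
```

The cases that leave only 2 runs, the quarter fraction of 2^3 and the one-eighth fraction of 2^4, are the ones dismissed above as too small to be useful.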
When we set up the fractional factorial design, we do it with the full knowledge that by analyzing only a fraction of the total number of runs we may be dealing with a potential loss of information. What do we mean by that? We are including all the variables in the fraction, and to get all the variable effects uniquely we need a certain minimum number of runs. For example, when you are doing a 2^2 design we do a minimum of 4 runs. The 4 runs are required to estimate the intercept beta nought and the 2 main effects, which makes 3, and then the interaction between the 2 factors, which makes 4. So we need at least 4 runs to get all 4 coefficients: beta nought, the coefficient of factor A, the coefficient of factor B and the coefficient of the interaction AB. What are these coefficients? They are the coefficients in the model developed to relate the response to the different factors. When you do only a fraction of the total number of runs for a certain number of factors, you have to understand that there are more unknown coefficients than the number of experiments you are doing, and this leads to a potential problem called aliasing. When the news mentions criminals, they sometimes say so-and-so alias some other name. The same person has different names, and that is called an alias. What has this got to do with our design? In our design, the effect we find may in fact represent more than 1 factor. Earlier in our discussion, each effect we found was attributed to a particular factor. We said main factor A has this much value, main factor B has this much value, and the interaction between A and B has this much value. But when you do a fractional factorial design, once you calculate an effect you cannot assign it uniquely to one factor; you may have to assign it to more than 1 factor.
What this means is that the effect you have obtained may be the contribution from more than 1 factor, and we are restrained from calculating the effects of the factors uniquely because we are doing fewer runs. This we have to keep in mind. To summarize, when we do fractional factorial designs we calculate the effects in the same way as we did earlier, but an effect may no longer uniquely represent a particular factor; it may represent a combination of more than 1 factor. So in our fractional factorial design we are going to see which factors are aliased in a particular effect. When the effects are combined or aliased, it is important to make sure that at least the main factors do not get aliased with each other. We do not want a calculated effect to be the combined contribution of A and B. A is important and B is important, so when you have an effect based on both A and B, the information is not coming out of the design completely. But if you find that factor A, or any main factor, is aliased with 3 factor or higher order interactions, then most likely the effect you have calculated is only because of the main factor. Even though the higher order interactions are present, they are probably making a very small contribution, and so the effect may be attributed to the main factor alone. Sometimes, in certain designs, 2 factor interactions may be aliased with one another, and then we may have to do the next fraction. It is important not to ignore the second order interactions because they are very essential. So let us take a small example. We will work with the 2^4 design.
2^4 leads to 16 experiments; multiply by 2 to account for repeats and you have 32 runs, and your boss may not really like the idea of doing 32 runs in the pilot plant. The logical question he will ask is, can you do a smaller number of runs? Let us take it from there. Intuitively he is suggesting that you do a subset of the complete set of runs and see what results you get, and this is precisely what fractional factorial designs are about. We will talk about this in more detail with reference to a 2^(4-1) design. That means we are looking at 1 by 2 of 2^4, the half fraction of a 2^4 design, which is a fraction involving 16 by 2, that is 8 runs. The first fraction will involve 8 runs and the second fraction will also involve 8 runs. Of course you may do repeats in the first set of 8 runs to get an idea about the random error. So we want to construct a half fraction of the 2^4 factorial design. How do we identify the elements of the fraction? We cannot arbitrarily choose certain combinations and take them as the first fraction; there is a systematic way of doing it. Let us see what this method is. You write the full 2^4 design on paper, there is no problem with this, and look at the highest order interaction. For a full 2^4 design the highest order interaction is ABCD, the quaternary term. When you look at the design matrix you will find that each column, whether it is column A or column B or column AB, BC, CD, AD, ABC and so on up to ABCD, contains minuses and pluses, and the total number of minuses equals the total number of pluses. So with 16 possible runs you will have 8 pluses and 8 minuses in each column.
So please look at the column corresponding to ABCD. That column also has 16 entries, 8 of them plus 1 and 8 of them minus 1. You choose your first fraction according to ABCD: the 8 runs of the first fraction correspond to the pluses in the ABCD column. I do not know how many of you followed this, so I will just demonstrate it. We have the design matrix as before. The columns are main factor A, main factor B, binary interaction AB, main factor C, binary interactions AC and BC, ternary interaction ABC, and this continues with D, AD, BD, ABD, CD, ACD, BCD and ABCD. Then if you look at the first column here, these are the 16 possible settings of your 2^4 full factorial design: (1), a, b, ab, c, ac, bc, abc, d and so on. You should not have any difficulty in recognizing these. For example, (1) corresponds to the lowest setting of all the variables, and a means that only factor A is at the plus level. So in the row for a you can see that only A is at plus 1, whereas B is at minus 1, so AB is minus 1. C is at minus 1, so AC is minus 1, BC is minus 1 into minus 1, that is plus 1, and ABC is A, plus 1, into BC, plus 1, which is plus 1. You construct the columns for A, B, C and D in the systematic standard order, and once you have those you can easily create the other columns AB, BC, CD and so on, up to ABC, BCD and finally ABCD. So the procedure is very simple, and it is also outlined here. These settings are also the different corners of the 2^4 design.
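The table construction just described can be sketched as follows. This is an illustrative sketch rather than the table on the slide: the helper names `full_factorial`, `treatment_label` and `column` are assumptions, and the effect columns are generated in the order A, B, C, D, AB, ... rather than in the slide's column order.

```python
from itertools import combinations, product

FACTORS = "ABCD"

def full_factorial():
    """Return the 16 runs of the 2^4 design in standard order (A changes fastest)."""
    return [(a, b, c, d) for d, c, b, a in product((-1, 1), repeat=4)]

def treatment_label(levels):
    """Lower-case letters of the factors at +1, or '(1)' when every factor is at -1."""
    label = "".join(f.lower() for f, x in zip(FACTORS, levels) if x == 1)
    return label or "(1)"

def column(word, runs):
    """Sign column of an effect such as 'AB': the product of the named factor columns."""
    signs = []
    for levels in runs:
        sign = 1
        for factor, level in zip(FACTORS, levels):
            if factor in word:
                sign *= level
        signs.append(sign)
    return signs

runs = full_factorial()
labels = [treatment_label(r) for r in runs]

# All 15 effect words: A, B, C, D, AB, ..., ABCD.
words = ["".join(c) for k in range(1, 5) for c in combinations(FACTORS, k)]
table = {w: column(w, runs) for w in words}

# Every column should hold 8 pluses and 8 minuses.
balanced = all(col.count(1) == 8 and col.count(-1) == 8 for col in table.values())

row_a = labels.index("a")
print(labels)
print({w: table[w][row_a] for w in ("A", "B", "AB", "C", "AC", "BC", "ABC")})
print("balanced:", balanced)
```

The row for run a reproduces the signs worked out above: A is plus 1, B, AB, C and AC are minus 1, and BC and ABC are plus 1.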
For example, bd means that only factors B and D are at the higher settings and factors A and C are at the lower settings. Why are these coloured in blue and red? I will explain this very shortly. First understand how this entire table was created. You first fill in A, B, C and then D, which is going to be shown in the next slide. Once you have A, B, C and D you can calculate everything else by simple multiplication. You can use a spreadsheet to carry out the calculations and produce the final table quickly. Let us now look at ABCD, the highest order interaction in this design. Mind you, we are writing down the full factorial design. All the plus 1s I have coloured blue. How many plus 1s are there? There should be 8. Let us confirm: 1, 2, 3, 4, 5, 6, 7, 8. So you have 8 plus 1s, and you should also have 8 minus 1s. Whatever is plus 1 I am colouring blue and whatever is minus 1 I am colouring red, and my first fraction will be the runs corresponding to plus 1 in ABCD. What does that really mean? We know that there are 8 pluses and 8 minuses, and I am going to look only at the plus 1s to define my first fraction. How do I do it? In a row where ABCD is plus 1, for example the row for (1), A is at minus 1, and you can see AB is at plus 1 and AC is at plus 1. These other entries need not all be plus 1; they can be negative or positive, but they belong to rows where ABCD is plus 1. This is important. So I will collect all the rows with plus 1 and put them together, which means I will be looking at all the settings in blue. Remember, blue corresponds to plus 1 in ABCD. So I will be doing my first fraction at (1), ab, ac, bc, ad, bd, cd and abcd.
That means I will be doing my first fraction with the lowest setting of all the factors, then ab, with only A and B at their high values, then ac, with factors A and C at their high values and the remaining 2 factors B and D at low values, and so on. A word of caution here. Even though I am doing a half fraction, I am not excluding factor D. Factor D is also present in the experimentation, and its levels are also changed during the experiments. But instead of doing 16, I am doing only 8 of the total number of experiments. So the runs are: (1), all of A, B, C and D at their low levels; ab, A and B at high levels and C and D at low levels; ac, A and C at high levels and B and D at low levels; bc, B and C at high levels and A and D at low levels; ad, A and D at high levels and B and C at low levels; bd, B and D at high levels and A and C at low levels; cd, C and D at high levels and A and B at low levels; and abcd, all of them at high levels. So I am not doing the complete set. I am first doing only 8 experiments: 1, 2, 3, 4, 5, 6, 7 and 8. This way I have done my first fraction. The second fraction, which I may decide either to do or to omit, corresponds to the red entries: a, b, c, abc, d, abd, acd and bcd. All the items highlighted in red correspond to minus 1 entries in the ABCD column, and they make up the second fraction. The first fraction, which we have taken based on the plus 1s, is termed the principal fraction. Then you may ask, what are all these other columns doing? I have an entry for A and an entry for D, and I am just multiplying the entry in A with the entry in D to get the AD interaction.
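The split into the two fractions can be reproduced by sorting the 16 runs on the sign of ABCD. A minimal sketch, with illustrative names (`treatment_label`, `principal`, `second`):

```python
from itertools import product

FACTORS = "ABCD"

def treatment_label(levels):
    """Lower-case letters of the factors at +1, or '(1)' when every factor is at -1."""
    label = "".join(f.lower() for f, x in zip(FACTORS, levels) if x == 1)
    return label or "(1)"

# The 16 runs of the 2^4 design in standard order (A changes fastest).
runs = [(a, b, c, d) for d, c, b, a in product((-1, 1), repeat=4)]

principal, second = [], []
for a, b, c, d in runs:
    abcd = a * b * c * d            # entry in the ABCD column for this run
    target = principal if abcd == 1 else second
    target.append(treatment_label((a, b, c, d)))

print("ABCD = +1:", principal)
print("ABCD = -1:", second)
```

The runs with ABCD at plus 1 are exactly those with an even number of factors at the high level, which is why (1), the six two-letter runs and abcd end up in the principal fraction.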
These columns are used to define the table of contrasts, and from the table of contrasts I can calculate the effects. Ideally, for a full 2^4 design, I would use all 16 entries in each column. This we have seen previously in the discussion on two-level factorial designs. I would look at the entries in the A column, and from this contrast I would find the effect of A. So my contrast for A would have 16 terms: minus (1) plus a minus b plus ab and so on. But you would correctly ask, how can you use all 16 entries to find the effect of factor A when you have done only 8 experiments? In order to calculate the effect of A from the table of contrasts I am constrained to the 8 available entries. That means I am not able to find the effect of A uniquely or fully. This is a problem. The same applies to any other interaction or main factor. For example, look at ABC. In the usual case I would have used the entire set of 16 entries to calculate the effect of ABC, but now I can work only with the 8 entries in blue. So I would use this minus 1, then this minus 1, then these two minus 1s, and so on. Only the 8 values may be used, so I am not able to get the complete information. I hope you have followed the discussion so far. It is pretty simple actually. It is just that you have a lot of entries and you have to keep track of them properly and not make a mistake by putting a plus 1 instead of a minus 1, which would lead to complete confusion. One quick way to check is to see whether the number of pluses in a column equals the number of minuses. So what are the blue entries? They are (1), ab, ac, bc, ad, bd, cd and abcd. Let me check them against the table: bd and cd are there, and so are ad, bc and ac. So that is correct.
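To see what is left of a contrast when only the 8 principal-fraction runs are available, the sketch below writes the contrast out as signed treatment labels. The helper names are illustrative and no response data are involved; only the signs are shown.

```python
from itertools import product

FACTORS = "ABCD"

def treatment_label(levels):
    """Lower-case letters of the factors at +1, or '(1)' when every factor is at -1."""
    label = "".join(f.lower() for f, x in zip(FACTORS, levels) if x == 1)
    return label or "(1)"

def contrast_terms(word, rows):
    """Write the contrast for an effect as signed treatment labels, one per run."""
    terms = []
    for levels in rows:
        sign = 1
        for factor, level in zip(FACTORS, levels):
            if factor in word:
                sign *= level
        terms.append(("+" if sign > 0 else "-") + treatment_label(levels))
    return " ".join(terms)

# Full 2^4 design in standard order, and the half with ABCD = +1.
runs = [(a, b, c, d) for d, c, b, a in product((-1, 1), repeat=4)]
half = [r for r in runs if r[0] * r[1] * r[2] * r[3] == 1]

full_A = contrast_terms("A", runs)      # 16 terms
half_A = contrast_terms("A", half)      # only 8 terms remain
half_ABC = contrast_terms("ABC", half)

print(full_A)
print(half_A)
print(half_ABC)
```

In the half fraction the contrast for A still has four pluses and four minuses, but it is built from half as many runs as in the full design.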
And the remaining settings constitute the second fraction. To make it clear, these are the ones in blue and these are the ones in red: this is the first fraction and this is the second fraction. Now we introduce an important concept called the design generator. The design generator is I = ABCD, where I is a column of all plus 1s. Remember that we used ABCD to set up the 2 fractions; that is why it is called the design generator. Using it we can identify the aliases very easily. Multiply both sides of I = ABCD by A, and A becomes A squared BCD. A into A gives a column of all plus 1s, so that vanishes, and A is aliased with BCD. In the same way, if you multiply by B you get AB squared CD, and since B squared is all plus 1s that becomes ACD, so B is aliased with ACD. Similarly C is aliased with the term without C, ABD, and D is aliased with ABC. So you can say that each main factor is aliased with a 3 factor interaction. What happens to the 2 factor interactions? We see that the 2 factor interactions are aliased with other 2 factor interactions. Multiply by AB, and you get A squared B squared CD. A squared is a column of all plus values and B squared is a column of all plus values, so they do not contribute to the sign, which is determined only by CD. So AB = CD; that means AB is aliased with CD. On the same lines we can easily show that AC is aliased with BD and AD is aliased with BC.
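The alias rule above, multiply by the generator and drop any squared letter, amounts to keeping the letters that appear in exactly one of the two words. A minimal sketch with illustrative function names `multiply` and `alias`:

```python
GENERATOR = "ABCD"

def multiply(word1, word2):
    """Multiply two effect words; a squared letter is a column of +1s and drops out."""
    letters = set(word1) ^ set(word2)       # letters appearing in exactly one word
    return "".join(sorted(letters)) or "I"

def alias(word, generator=GENERATOR):
    """Alias of an effect in the half fraction defined by I = generator."""
    return multiply(word, generator)

aliases = {w: alias(w) for w in ["A", "B", "C", "D", "AB", "AC", "AD"]}
for effect, partner in aliases.items():
    print(effect, "=", partner)
```

Multiplying the generator by itself gives I, so taking the alias of an alias returns the original effect.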
So we can say that the single factors are aliased with 3 factor interactions, and the 2 factor interactions are aliased with other 2 factor interactions. What is this aliasing? What do we really mean by A = BCD and B = ACD? In the next part we will see that A is indeed aliased with BCD; in other words, you cannot distinguish between A and BCD. Likewise the 2 factor interactions are aliased with other 2 factor interactions. That is what we will be seeing, and then we will see how to calculate the effects. We will take a small break now and continue shortly.