Hello, in this lecture 28 we will be looking at concepts pertaining to orthogonal models. The references are given in the current slide and the slide to follow. The first book is by Box and Draper, Response Surfaces, Mixtures, and Ridge Analyses. It is a slightly advanced book; people who want to deepen their understanding and knowledge of the subject may refer to it. The other book, on which this slide material is mainly based, is the one written by Myers, Montgomery and Anderson-Cook. The title of the book is Response Surface Methodology: Process and Product Optimization Using Designed Experiments, third edition, John Wiley and Sons, published in 2009. This is an excellent book where the concepts are explained in a very clear manner. It is not very mathematically rigorous, so people with basic knowledge of linear algebra should be able to follow the material given in this book quite easily. And of course the other book is the one written by Montgomery, Design and Analysis of Experiments, seventh edition, John Wiley and Sons, New York, 2010. A few other references are Kutner, Nachtsheim and Neter, Applied Linear Regression Models, and Draper and Smith, Applied Regression Analysis. So there are quite a few interesting books written on this subject. Now let us come to orthogonal designs. What is an orthogonal design? An orthogonal design involves a matrix X that comprises vectors relating directly to the model factors. So if your model has contributions from factor A, factor B, the interaction AB, A squared and B squared, then the model will look something like this: it will have the columns 1, X1, X2, X1X2, X1 squared and X2 squared. These are all column vectors. So when you enter the model in matrix form, the first column would be the vector of ones, the second column would be the values corresponding to the settings of factor X1, similarly the settings of factor X2, and then the settings corresponding to X1X2, then X1 squared, X2 squared, etc.
depending upon the complexity and length of the model. So here 1 refers to the vector of ones, Xi refers to the single factors, XiXj refers to the interaction between factors, and Xi squared refers to the higher-order quadratic terms that more completely account for curvature. Now, if 2 columns of the design matrix are orthogonal, it implies the levels of these 2 variables are linearly independent. The important implication of linear independence is that the contributions of these 2 variables to the process response may be evaluated independently of one another. If the interaction column is also linearly independent of the rest, then its share may also be evaluated independently. So this is a very useful concept in regression analysis and design of experiments. In a planned design of experiments, the way the experimental points are set makes the design orthogonal in nature. So we can say that factor A contributes to the model independently of how factor B contributes. Similarly, even if you have the interaction AB in an orthogonal design, AB contributes to the model independently of how A and B contribute, because the column vectors in the X matrix are linearly independent of one another. So this is a very beautiful concept. What I am trying to say here is: suppose you have a model in which Y hat, the predicted Y, is equal to beta hat 0 plus beta hat 1 X1 plus beta hat 2 X2. Then you will find the coefficients beta hat 1 and beta hat 2. Suppose for some reason you decide to omit the factor X1 altogether in the orthogonal design. Then your model will be beta hat 0 plus beta hat 2 X2, and the value of beta hat 2 will be the same as it was in the previous full model. So let me go to the board and explain what I mean by that. Suppose you have an existing model, and you want to try a new model where you put only beta hat 0 plus beta hat 2 X2.
You do not have the contribution from beta hat 1 X1. Then you will find that the parameter beta hat 2 in the old model is equal to the parameter beta hat 2 in the new model. This is because of the orthogonal design. So the contributions of factor X1 and factor X2 to the observed response are evaluated independently of each other. This is only true in the case of an orthogonal design, and you can see the advantages of this. We have already seen examples of the orthogonal design in our example set on regression analysis. So let us look at the X matrix; you can see that this is a 2 power 2 design. The first column is the column vector of 1s. The next column holds the settings of factor 1: minus 1, plus 1, minus 1, plus 1. The next holds the settings of factor 2: minus 1, minus 1, plus 1, plus 1. Then you have X1X2, the element-wise product of these 2 columns: minus 1 times minus 1 is plus 1, plus 1 times minus 1 is minus 1, minus 1 times plus 1 is minus 1, and plus 1 times plus 1 is plus 1. X1 squared is obtained by squaring the terms in the X1 column. So we essentially have 4 columns, and we also have the additional columns shown in red, X1 squared and X2 squared. Let us now develop the model: Y hat is equal to beta hat 0 plus beta hat 1 X1 plus beta hat 2 X2 plus beta hat 12 X1X2. You can see that the green-colored model has 4 parameters, and that is the maximum number of parameters which can be estimated from this design, because you have only 4 independent settings. But if you have a greedy model and you also try to evaluate the additional terms beta hat 11 and beta hat 22, then you run into trouble, because you do not have 6 independent settings from which to obtain these 6 parameters. Now let us look at the X matrix. This is the X matrix again, exactly the same as the X matrix corresponding to the green-colored vectors.
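The coefficient invariance described above can be sketched numerically. A minimal sketch in Python with NumPy, using a 2 power 2 design and purely hypothetical response values (the numbers are made up for illustration):

```python
import numpy as np

# 2^2 factorial design in coded units.
X1 = np.array([-1.0, 1.0, -1.0, 1.0])
X2 = np.array([-1.0, -1.0, 1.0, 1.0])
y  = np.array([8.0, 12.0, 14.0, 20.0])   # hypothetical responses

# Full model: y = b0 + b1*X1 + b2*X2, fitted by least squares.
X_full = np.column_stack([np.ones(4), X1, X2])
b_full = np.linalg.lstsq(X_full, y, rcond=None)[0]

# Reduced model omitting X1 altogether: y = b0 + b2*X2.
X_red = np.column_stack([np.ones(4), X2])
b_red = np.linalg.lstsq(X_red, y, rcond=None)[0]

# Because the design columns are orthogonal, beta hat 2 is unchanged:
print(b_full[2], b_red[1])   # both equal 3.5
```

Dropping X1 from the model leaves beta hat 2 untouched only because the 1, X1 and X2 columns are mutually orthogonal; with correlated columns the two fits would disagree.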
So the rank of this X matrix is equal to 4, and X prime X, obtained by taking the transpose of this X matrix and multiplying the transpose with the X matrix, is a diagonal matrix with 4 along the diagonal. When you augment the matrix with the entries corresponding to X1 squared and X2 squared, then in addition to the old X matrix you have these 2 additional columns, and of course you can see that the X1 squared column is exactly the same as the vector of 1s, and X2 squared is also a replica of the vector of 1s. Now when you look at this augmented matrix X, the rank of X is still 4, and when you look at X prime X you get the matrix shown here; the problem is we run into trouble when we take the inverse of this particular matrix. For the original case, X prime X inverse has 1 by 4, 1 by 4, 1 by 4, 1 by 4 along the diagonal, and there is no problem in evaluating that matrix. But if you try to take the inverse of the augmented X prime X matrix given here, you will find that it is not defined. So what I am trying to say is: do not try to expand the scope of your model when the number of independent settings is limited. In the current case of a 2 power 2 design we had only 4 independent settings, so we could estimate only 4 independent parameters; the moment we tried to increase the scope of the model by adding the quadratic terms, we found that we could not estimate the parameters, because the X prime X inverse matrix was not defined in such a situation. So what is an orthogonal design? If you take the transpose of the ith column vector and pre-multiply it against a different column vector Xj, with i not equal to j, you will get 0.
So take this particular case: the transpose of the column of 1s is a 1-row-by-4-column vector, 1, 1, 1, 1 written horizontally. When I multiply this with the column vector minus 1, minus 1, plus 1, plus 1, I get minus 1 minus 1 plus 1 plus 1, which is 0. The same concept applies for any pair of distinct vectors, one taken as a transpose and the other as the regular column vector: if I multiply the two, I get 0. So this is the property of the orthogonal design. The important precaution is that you should not take i equal to j; then the negative and positive elements would be squared, all of them would become positive, and the sum would not be equal to 0. Now let us look at first-order orthogonal designs. Take a 2 power k factorial design, where k is the number of factors. Keeping the levels at the extremes of plus or minus 1 keeps the variance associated with the predicted model coefficients at a minimum. So this is another advantage of the factorial design: we keep the design settings at the extremes, (minus 1, minus 1), (plus 1, minus 1), (minus 1, plus 1), (plus 1, plus 1). So in the design space the experimental settings are kept at the extreme ends, and this helps to minimize the variance of the estimated parameters. The question that arises naturally is: suppose you have an orthogonal design, how do you go about estimating the parameters? Well, you can do it in 2 ways. You can use the regular design-of-experiments approach and estimate the effects and then the model coefficients, or you can use the matrix approach in linear regression. As I said earlier, in an orthogonal design with the levels set at plus or minus 1, the scaled variance of the model parameters, variance of beta hat i divided by sigma squared, is minimized.
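The orthogonality check and the trouble with the greedy quadratic model can both be seen in a few lines. A minimal NumPy sketch of the 2 power 2 case discussed above:

```python
import numpy as np

# 2^2 design: columns 1, X1, X2, X1X2 in coded -1/+1 units.
one = np.ones(4)
X1  = np.array([-1., 1., -1., 1.])
X2  = np.array([-1., -1., 1., 1.])
X   = np.column_stack([one, X1, X2, X1 * X2])

# Inner product of any two distinct columns is 0, e.g. 1' X1:
print(one @ X1)                      # 0.0

# Hence X'X is diagonal (4I) and its inverse has 1/4 on the diagonal:
XtX = X.T @ X
print(np.diag(np.linalg.inv(XtX)))   # [0.25 0.25 0.25 0.25]

# Greedy model: append X1^2 and X2^2; both replicate the column of ones,
# so the rank stays at 4 and the augmented X'X has no inverse.
X_aug = np.column_stack([X, X1**2, X2**2])
print(np.linalg.matrix_rank(X_aug.T @ X_aug))   # 4, not 6 -> singular
```

The rank staying at 4 is exactly why the inverse of the augmented X prime X is not defined: two of its six columns carry no new information.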
Here i is equal to 1, 2 and so on up to k, where k is the number of factors; obviously you are not including the intercept beta hat 0, and if you include that also you will have k plus 1, which is equal to p, the number of parameters. The variance of a predicted coefficient is given by variance of beta hat i, and when you scale the variances by sigma squared you get the scaled variance-covariance matrix, X prime X inverse. So the X prime X and X prime X inverse matrices are very, very important, and for an orthogonal design we know that X prime X inverse is a diagonal matrix with the off-diagonal elements equal to 0, and setting the x vectors at the extremes minimizes the estimated parameter variance. We have already seen that for an orthogonal design such as the 2 power 2 design we considered previously, the X prime X matrix is diagonal, and the X prime X inverse matrix is also diagonal, with 1 by 4, 1 by 4, 1 by 4, 1 by 4. So this arrangement minimizes the variance of the estimated regression coefficients. Now let us look at the implications of different models. The first model we are considering, on the left-hand side, is a one-half fraction of a 2 power 3 design, that is, a 2 power 3-1 design; that means we should have only 4 runs, but it is clear that we have 8 runs here. The reason is quite simple: we are repeating the 4 experiments of the 2 power 3-1 design. You can see that the first 2 rows are identical, the third and fourth rows are identical, the fifth and sixth rows are identical, and the seventh and eighth rows are identical. That means each experimental setting in the 2 power 3-1 design is repeated. Also look at the regular 2 power 3 full factorial design. Here you have all the possible settings for a 2 power 3 factorial design, and the main thing to notice is that there are no repeats. For the 2 power 3-1 design, on the other hand, you had repeats, but you did not have the full set.
You did not have the full set of independent settings possible. When you look at the 2 power 3 full design, it leads to an X prime X inverse matrix containing diagonal elements of 1 by 8, which is equal to 0.125. So you have an X matrix corresponding to a full 2 power 3 design; if I compute the X prime X inverse matrix, which is a straightforward thing to do by now, I get 1 by 8, 1 by 8, 1 by 8, 1 by 8 along the diagonal, and the off-diagonal terms are 0. But what is the advantage of a 2 power 3-1 design? Why do we have to go for a 2 power 3-1 design? Maybe performing a full 2 power 3 design was expensive, so the experimenter decided to go for a 2 power 3-1 design. On the other hand, he has still conducted 8 experiments, the same number as a full 2 power 3 design. So the experimenter was focusing more on the estimation of the pure error. He probably had some insight into the model based on previous experience, and so instead of spending time and resources on the complete set, he wanted more information on the pure error term. A 2 power 3-1 design strategy will unfortunately not enable you to find the full set of parameters that are possible from a full 2 power 3 design. A 2 power 3 design has 8 independent settings, so you should theoretically be able to estimate 8 parameters; you can build your model up to 8 parameters. However, a 2 power 3-1 design has only 4 independent settings, so you will be able to estimate only 4 parameters. You are losing the ability to estimate 4 parameters, and your 2 power 3-1 design strategy will not have any degrees of freedom for lack of fit if you estimate the 4 parameters. But on the other hand, if you look at the 2 power 3 design and you are estimating only 4 or 5 parameters, then you have sufficient degrees of freedom to test your model for lack of fit.
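The 1 by 8 diagonal holds for both strategies, the full 2 power 3 design and the replicated 2 power 3-1 design, since both involve 8 runs. A minimal NumPy sketch, taking C = AB as the fraction's generator:

```python
import numpy as np
from itertools import product

# Full 2^3 design: all 8 independent settings; columns 1, A, B, C.
full = np.array(list(product([-1., 1.], repeat=3)))
X_full = np.column_stack([np.ones(8), full])

# 2^(3-1) half fraction with generator C = A*B, every run repeated once.
AB = np.array(list(product([-1., 1.], repeat=2)))
frac = np.column_stack([AB, AB[:, 0] * AB[:, 1]])
frac = np.repeat(frac, 2, axis=0)                # 8 runs, 4 distinct settings
X_frac = np.column_stack([np.ones(8), frac])

# Both X matrices give X'X = 8I, so (X'X)^-1 has 1/8 = 0.125 diagonals:
for X in (X_full, X_frac):
    print(np.diag(np.linalg.inv(X.T @ X)))       # [0.125 0.125 0.125 0.125]
```

So the two groups' parameter variances are identical; the designs differ only in what else they buy, pure-error information versus interaction estimates.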
So in the 2 power 3-1 design where each design setting is repeated, you do not do the full design, you do only half the complete design, but with repeats. If you are estimating 4 parameters, then since there are only 4 independent settings, you do not have any degrees of freedom for testing the lack of fit. In the full 2 power 3 design, in addition to the intercept coefficient beta hat 0, 7 other coefficients may be estimated. These are the coefficients corresponding to the main factors, beta hat 1, beta hat 2 and beta hat 3; the interactions between 2 factors, beta hat ij with i not equal to j; and the ternary interaction term, beta hat 123. So you can estimate the main effects, the binary interactions and the ternary interaction between the factors. These are the 8 coefficients, including beta hat 0. If you are fitting a model with only the main factors, then out of the 8 possible parameters you are estimating only 4: beta hat 0 and the regression coefficients corresponding to factors A, B and C. In such a case the ANOVA table indicates 4 degrees of freedom for lack of fit: 8 parameters are the maximum possible, but you have estimated only 4, so you have 8 minus 4 equals 4 degrees of freedom for testing the lack of fit. But what is the drawback of the full 2 power 3 design? All the independent settings have been exhausted, and that consumed the full 8 runs, so you are not in a position to repeat your experimental settings. Suppose your management says you can do a maximum of 8 experiments. One group proposes a 2 power 3-1 design with repeats; another group goes in for a full 2 power 3 design.
On one hand, the group which proposes the 2 power 3-1 design will have an estimate of the pure error, but it will not be able to estimate all the interactions, including the ternary interaction; whereas the group which went in for the 2 power 3 design will not get an idea of the pure error, but will be able to estimate as many as 8 parameters. So which design is better? That depends upon the process and the prior knowledge you have of it. When you look at the 2 approaches, they have the same scaled variance, variance of beta hat i by sigma squared, because the X prime X inverse matrix is the same in both cases, with diagonal elements of 1 by 8. So the 2 power 3 design and the 2 power 3-1 design with 4 repeats have different scopes of application. The latter, the 2 power 3-1 design, is used if you have prior knowledge or experience indicating that the interaction effects are negligible. So if you know beforehand that there is no interaction between the factors of your model, the interaction terms are neglected; and if the binary interaction terms are negligible, then there is very little chance that the ternary interaction would kick in. So let us expand on this topic a bit more. The query is: why can you not use the fractional design for detecting higher-order terms? The main problem is that there is aliasing between the main factors and the 2-factor interactions, and the X prime X matrix becomes singular, as some of the columns are not linearly independent. So let us look at the 2 power 3-1 design. You have the column of ones and the settings corresponding to factors 1, 2 and 3. Obviously the runs are repeated: you are doing 8 runs, but you have only 4 independent settings. Now I can estimate only the intercept beta hat 0, along with beta hat 1, beta hat 2 and beta hat 3. Suppose I try to estimate the interaction also.
If I form X1X2 to bring that interaction effect into the model, unfortunately this X1X2 column vector will be exactly identical to the X3 column vector. Similarly, the X2X3 column vector, the second binary interaction, will be identical to the X1 column vector; you can figure out that the X1X3 column vector will be identical to the X2 column vector, and X1X2X3 is aliased with the column of ones. So the X matrix made up of all these elements is definitely going to have a lower rank, and taking the inverse of the X prime X matrix will lead to difficulties. The parameters cannot be estimated, because the columns in the design are not linearly independent of each other. Now let us look at an orthogonal design involving an experimental scheme with as many as 5 factors. Let us construct an orthogonal design involving a one-half fraction of a 2 power 5 factorial design. The full factorial design will involve 2 power 5, which is equal to 32 runs. Obviously this is too many, so we want to restrict the number of runs. We go in for a 2 power 5-1 factorial design, which is a one-half fraction of the full 2 power 5 design, and so we have 16 runs. So what are the terms in the model that may be fitted using the given design? When we consider the 2 power 5 design, we have 1 intercept, 5 first-order coefficients for the 5 main factors, 10 binary interactions, 10 ternary interactions, 5 quaternary interactions and one 5-factor interaction. This is the complete set in a model where we consider only the main effects and the interactions between the factors. That makes a total of 1 plus 5 is 6, 6 plus 10 is 16, 16 plus 10 is 26, 26 plus 5 is 31, and 31 plus 1 is 32. So in a full 2 power 5 design involving 32 independent settings, you would be in a position to estimate the constant beta hat 0, the 5 main factors, 10 binary interactions, 10 ternary interactions, 5 quaternary interactions and one 5-factor interaction.
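Before moving on, the aliasing pattern just described for the 2 power 3-1 design can be verified numerically. A minimal NumPy sketch, using the generator I = ABC (so X3 = X1X2):

```python
import numpy as np

# 2^(3-1) half fraction with generator I = ABC, i.e. X3 = X1*X2.
X1 = np.array([-1., -1., 1., 1.])
X2 = np.array([-1., 1., -1., 1.])
X3 = X1 * X2

# Every interaction column replicates an existing column (aliasing):
print(np.array_equal(X1 * X2, X3))                # True
print(np.array_equal(X2 * X3, X1))                # True
print(np.array_equal(X1 * X3, X2))                # True
print(np.array_equal(X1 * X2 * X3, np.ones(4)))   # True: aliased with 1s

# So appending the interaction columns cannot raise the rank beyond 4,
# and X'X for the expanded model is singular:
X = np.column_stack([np.ones(4), X1, X2, X3, X1*X2, X2*X3, X1*X3])
print(np.linalg.matrix_rank(X.T @ X))             # 4
```

The identities follow directly from the defining relation: multiplying any column by the generator word reproduces its alias.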
But we have only 16 independent settings, and so if we go sequentially in the model development, we can estimate 1 intercept, 5 first-order coefficients, which makes 6, and 10 binary interactions, which makes 16. Beyond that, we cannot estimate any more parameters. So in such a case, a 2 power 5-1 design where you have estimated the intercept, the main factors and the binary interactions, what would be the lack-of-fit degrees of freedom? Since you have exhausted all 16 independent settings to estimate the parameters associated with these terms, the lack-of-fit degrees of freedom would be 0. The next query is quite simple: show that the design is indeed orthogonal, not only with respect to the main factors but also with respect to the associated interactions. This is a very straightforward thing to do. You can write down the X matrix: first write the vector of 1s corresponding to the 16 experiments, and then write the half-fraction design. So you write the full factorial for a 2 power 4 design, and then we know from our concepts of fractional factorial design how to accommodate the 5th factor. Let us say the 5th factor is factor E; then, to set up the column of factor E, we use the design generator I = ABCDE, or E = ABCD. I request you to refresh these concepts, and then you can write down the X matrix corresponding to a 2 power 5-1 fractional factorial design. Once you have written down this matrix and also included the possible binary interactions in the X matrix, you can see that it comprises a number of columns. These columns hold a set of minus 1 and plus 1 values, except the first column, which is the vector of 1s: it has all elements equal to 1, while the other columns have a mixture of plus 1s and minus 1s.
And if you take any 2 distinct columns, take the transpose of the first and multiply it with the other column vector, you will get 0. So the inner product of any 2 column vectors that are not from identical columns is indeed 0; that can be easily shown. And is there any aliasing in this design? No, all the columns are independent of each other, and hence the experimental levels may be varied independently of each other. There is no aliasing in this 2 power 5-1 design as long as you restrict yourself to the constant, the main factors and the binary interactions. The moment you go for ternary or quaternary interactions, there would be aliasing. So up to this model, you are not in danger of aliasing, because you can estimate the parameters from different experimental settings. And how many residual degrees of freedom does this design have? If you have utilized all 16 independent settings to find 16 model parameters, the residual degrees of freedom would be 0. The lack of repeats also means that the degrees of freedom for pure error are 0, and you are also unable to test for lack of fit, so the lack-of-fit degrees of freedom are also 0. The residual degrees of freedom are the sum of the pure-error and lack-of-fit degrees of freedom. There are no lack-of-fit degrees of freedom, as was seen previously; there are also no repeats in this design, and hence the pure-error degrees of freedom are 0. Hence there are no residual degrees of freedom, and the design illustrated is said to be saturated. What would be the R squared for this saturated design? The R squared value would be 100%, because you are using all the independent settings to estimate a corresponding number of model parameters. So you will achieve a perfect fit, and the R squared value would be equal to 1, or, expressed as a percentage, 100%.
Well, this is misleading, because your model now has as many as 16 parameters. These 16 parameters are difficult to work with, unwieldy, and when you try the model at a different setting, it may give completely different predictions. So there are a lot of issues with 16 parameters. Normally a model, even an empirical one or a linear regression model, would have not more than 4 or 5 parameters at the most. Now let us talk about center runs. What are center runs, and why are they required? Center runs are single or multiple repeats of the experiment at the geometric center of the experimental design in the coded format. Suppose we have an experimental design which is coded in terms of minus 1, plus 1 and so on. The geometric center of such a design would be at 0, 0, 0, corresponding to the midpoints of the independent factors. So center runs are very important to the experimental design. When you have a factorial design and you add center points, you are enhancing or augmenting the factorial design. The addition of center runs to an orthogonal design does not alter the orthogonal property, as they simply comprise 0s in the coded format. So if you have an orthogonal design and you add center points to it, the orthogonality is not disturbed, but the design including the center points will unfortunately have a slightly higher variance of the parameters. It is no longer a variance-optimal design, as the design points are no longer restricted to the extreme corners of the experimental space. In the previous factorial designs without center points, all the experimental settings were at plus 1, minus 1 combinations; since these are located at the very edge of the boundaries, the variance of the estimated parameters was minimum, and so it was a variance-optimal design. But the moment you have center points, some design conditions or experimental conditions are at the center; they are no longer at the extreme ends.
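The claim that center runs leave the orthogonality, and hence the factorial coefficients, untouched can be sketched numerically. A minimal NumPy sketch for a 2 power 2 design augmented with 3 center runs; all response values are hypothetical:

```python
import numpy as np

# 2^2 factorial with hypothetical responses.
X1 = np.array([-1., 1., -1., 1.])
X2 = np.array([-1., -1., 1., 1.])
y  = np.array([10., 14., 12., 18.])

def coeffs(x1, x2, y):
    # Fit y = b0 + b1*x1 + b2*x2 + b12*x1*x2 by least squares.
    X = np.column_stack([np.ones(len(y)), x1, x2, x1 * x2])
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_factorial = coeffs(X1, X2, y)

# Augment with 3 center runs at the coded origin (0, 0); the extra rows
# contribute only 0s to the factor columns, so orthogonality is preserved.
yc = np.array([13.2, 12.8, 13.1])                  # hypothetical replicates
b_center = coeffs(np.r_[X1, 0, 0, 0], np.r_[X2, 0, 0, 0], np.r_[y, yc])

# Main-effect and interaction coefficients are unchanged:
print(np.allclose(b_factorial[1:], b_center[1:]))  # True
```

Only the intercept estimate moves (it becomes the overall mean of all 7 runs); the factor and interaction coefficients are computed from the same ±1 contrasts either way, which is exactly why center runs serve pure-error and curvature checks rather than effect estimation.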
So this makes the design a non-variance-optimal one. If that is so, why do we need center runs? What help or additional benefit do they bring? Before we get into that, let us understand the center runs a bit more. The center runs do not contribute to the linear effects and the interactions; they do not contribute to the main effects and the interactions. Suppose you have a factorial design without center runs and you estimate the main effects and the interactions; then you include the center runs in this orthogonal design. The main-effect and interaction values and the corresponding coefficients will not be altered. Take a design with center runs and another design without center runs: both of them will give the same values of the main-factor effects and the interaction effects. So it does not really matter whether you have center runs or not as far as the estimation of these coefficients is concerned. Center runs are repeated runs, so they help you get an idea of the experimental error, which is very important. Only when you know the extent of the experimental error can you comment upon the significance and relevance of the factors in the experiment. And depending upon the significance or relevance of the factors, the corresponding coefficients will appear in the model developed. So, in addition to the estimation of the pure experimental error, the center runs are also helpful for detecting whether curvature effects are important or not. The center runs cannot explicitly bring in the contribution due to curvature; they can only indicate whether curvature effects are important or not. So we will take a small break here.