Hello, welcome back. Today we will be looking at optimal designs. The reference for this lecture is the book by Myers, Montgomery, and Anderson-Cook, Response Surface Methodology: Process and Product Optimization Using Designed Experiments, 3rd edition, John Wiley and Sons, New York, 2009. We have looked at regular factorial designs and then moved on to second-order designs, where we talked about the central composite design and the Box-Behnken design; these are very popular designs among practitioners of this method. The criteria for a good statistically designed experiment are listed in the following slides. We use a statistical design strategy to develop a mathematical model based on the significant factors in the process, and that model should fit the experimental data well: high R-squared and adjusted R-squared, narrow confidence intervals, and so on. The model should also have some degrees of freedom for lack of fit, so that it can be expanded to include higher-order terms. It should be amenable to building the model sequentially: starting with the simplest model and gradually adding more factors, interactions between factors, or higher-order terms, so that we can see the benefit of increasing model complexity. At some point we can say that we are not getting any further improvement from the model, and we stop there. For example, if the adjusted R-squared rises up to a certain point of model expansion and then begins to fall, there is no point in adding further parameters. The design should also have repeat points, especially at the center of the design, so that the pure error may be estimated, and it should be robust in the sense of being insensitive to outliers: the presence of outliers should be clearly visible, but they should not alter the model structure drastically.
It should also be cost effective; a design involving fewer runs is more attractive in that sense. And, as we have also seen, it should provide a good distribution of scaled prediction variance. The model should predict well within the design space. Since the model is built on experimental data, and there is uncertainty in that data because of random fluctuations and random errors, there is also uncertainty associated with the model predictions, and the variability in the model predictions should not be too large anywhere in the design space. The Box-Behnken design and the central composite design are useful because they involve enough runs to test for lack of fit while avoiding unnecessary degrees of freedom and experimental expense, and repeats are carried out at the design center. When the experimentalist has to go for more economical runs, other versions are available; they are discussed, for example, in the reference book by Myers, Montgomery, and Anderson-Cook that I just referred to. We will not go into these economical designs for want of time, and they are also not within the scope of the current subject. First we will define the moment matrix, which we have already seen earlier. The moment matrix M is defined as X'X/N, where the X matrix is the experimental design matrix containing the column of ones, x1, x2, and so on, and N is the size of the experimental run. From this definition of M as X'X/N, we can easily show that M inverse is N times (X'X) inverse. We multiply by N to account for the size of the run. This M inverse matrix is called the scaled dispersion matrix: we are scaling (X'X) inverse by N so that the size of the run cancels out of the elements of (X'X) inverse.
Suppose you have a first-order orthogonal design with 8 runs for a 2^3 factorial: (X'X) inverse has 1/8 along the diagonal, and multiplying by 8 reduces it to an identity matrix. So whether it is 16 runs or 8 runs, the number of runs is removed from the comparison. This (X'X) inverse matrix is a very important one because it contains the information on the variances and covariances of the estimated parameters beta hat. We use regression to find the coefficients beta hat 0, beta hat 1, and so on up to beta hat k; that is, we find k + 1 parameters, and these regression parameters have variances and covariances associated with them. Each regression parameter has a variance, and there are also covariances between different parameters. To get these we make use of the variance-covariance matrix, which is nothing but (X'X) inverse times sigma squared. We multiply by N and divide by sigma squared to make the resulting matrix independent of both the size of the run and the unknown error variance; that gives us a uniform basis for comparison between designs with different numbers of runs. As I told you, X'X for an orthogonal design takes values along the main diagonal equal to the number of runs in the design, and when we take its inverse we get 1/N along the main diagonal. So when N increases, it will appear as if the variances of the parameters are less.
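The moment matrix and the scaled dispersion matrix just described can be checked numerically in a few lines. This is a minimal sketch (the 2^3 main-effects model matrix below is my own illustrative example, not from the slides):

```python
import numpy as np

# Build the model matrix X for a 2^3 factorial: a column of ones
# (for the intercept) plus the three coded factors at -1/+1.
levels = [-1, 1]
rows = [(a, b, c) for a in levels for b in levels for c in levels]
X = np.array([[1, a, b, c] for (a, b, c) in rows], dtype=float)
N = X.shape[0]                      # 8 runs

M = X.T @ X / N                     # moment matrix M = X'X / N
D = N * np.linalg.inv(X.T @ X)      # scaled dispersion matrix N (X'X)^-1

print(M)   # identity matrix of order 4
print(D)   # also the identity: the run size cancels out
```

Both printed matrices are the 4x4 identity, which is exactly the point of the scaling: an orthogonal design with all points at the coded extremes leaves no trace of the run size in the comparison.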
By choosing a large number of runs, I could claim that the variances of the parameters are reduced, whereas a more economical, efficient design involving fewer runs may seem to have high variances of the regression parameters. In such cases it is important to put the designs on a uniform basis, and so we multiply the (X'X) inverse matrix by the size of the run N. Now that we know X'X plays a central role in the variance of the estimated coefficients, we can see how to exploit the structure of X'X to get low coefficient variances, so that we estimate the regression coefficients more precisely, with less uncertainty. We have to locate the design points in the X matrix in such a way that (X'X) inverse is made small. When we take the determinant of the moment matrix, it can be shown that det(M) = det(X'X) / N^p, where p is the total number of parameters including the intercept beta hat 0. Again, I repeat: we defined the moment matrix as X'X/N, and when we take the determinant of M we get det(X'X) divided by N^p, because by a property of determinants, dividing a p x p matrix by N divides its determinant by N^p. It can also be seen that if det(M) is large, the volume of the confidence region is small. The advantage of using a determinant is that you get one single value, and that can be used as a criterion for evaluating different designs. When det(M) is large, det(X'X) is large, and by the same token det of (X'X) inverse is small.
We are constructing the confidence region in the parameter space: the p-dimensional space comprising the p estimated parameters. If I am estimating 3 parameters I have a 3-dimensional space; if I am estimating 4 parameters, a 4-dimensional parameter space. We can imagine a confidence volume in this multidimensional space, and it is not good if this volume is large: it means each regression coefficient may take values between two numbers that are widely separated. If the confidence level for each parameter is, say, 95%, then the upper and lower limits of the confidence intervals would be quite far apart, making the volume of the confidence region large and indicating that the parameters are not estimated precisely. Our aim is to make the volume of the confidence region small, which means making the determinant of X'X large: a large det(X'X) corresponds to a small confidence region. If det(X'X) is large, then det of (X'X) inverse is small, and hence the variances and covariances of the regression parameters are small; this is obviously desirable. So we are focusing a lot of attention on the variance-covariance matrix. The variances of the regression parameters are its diagonal elements, and the variance-covariance matrix is nothing but (X'X) inverse times sigma squared. We are looking at ways and means by which those diagonal terms, the variances of the estimated parameters, may be made as small as possible, and they are small when (X'X) inverse is small.
Now we look at certain statistical designs based on alphabetic criteria. The first is the D-optimal design; it is called an alphabetic-criterion design because the letter D denotes the optimality condition. Here det(M) = det(X'X) / N^p is to be maximized; if this is maximized, then det of (X'X) inverse is minimized, which minimizes the volume of the confidence region in the multidimensional parameter space, and our parameters are estimated more precisely. We can express this criterion as the maximum over designs xi of det(M(xi)): among several candidate designs, we choose as D-optimal the one that maximizes the determinant of the moment matrix. We can then define the D-efficiency of a design as the ratio of det(M) for the design under consideration to the maximum value attainable over all possible designs, with this ratio raised to the power 1/p. This is called the D-efficiency. It may look complicated and highly mathematical; on the other hand, it is very simple: all you are computing is the determinant of the moment matrix M = X'X/N for the design under consideration, comparing it against the design that maximizes this determinant over all possible designs, and scaling the ratio by the power 1/p, where p is the total number of parameters. Now that we have defined D-optimality, we may use this criterion to compare different designs, and it is also a useful measure of the quality of the estimated model parameters.
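The D-efficiency calculation can be illustrated with a small numerical comparison. This is a minimal sketch under my own assumptions: I compare a 2^2 factorial at the coded extremes against the same four runs shrunk toward the center, taking the extreme design as the maximizer for this comparison:

```python
import numpy as np

def moment_det(X):
    """Determinant of the moment matrix M = X'X / N."""
    N = X.shape[0]
    return np.linalg.det(X.T @ X / N)

# Full 2^2 factorial at the coded extremes +/-1 (main-effects model:
# intercept, x1, x2, so p = 3 parameters).
X_full = np.array([[1, a, b] for a in (-1, 1) for b in (-1, 1)], float)

# The same four runs shrunk to +/-0.5, i.e. points pulled in
# from the boundary of the coded region.
X_half = X_full.copy()
X_half[:, 1:] *= 0.5

p = 3
d_full = moment_det(X_full)           # 1.0 for the orthogonal +/-1 design
d_half = moment_det(X_half)           # 0.0625: much smaller determinant
d_eff = (d_half / d_full) ** (1 / p)  # D-efficiency of the shrunken design
print(d_full, d_half, d_eff)
```

The shrunken design's D-efficiency of about 0.40 quantifies how much precision per parameter is lost by moving the points away from the extremes.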
The next criterion is the variance-optimal design, and for illustration purposes we will consider a simple first-order orthogonal design: one in which the X'X matrix is of the diagonal type. I think you should know this by now: take a 2^2 regular factorial design, set up the X matrix, and when you compute X'X you get a diagonal matrix. We can say that the columns of X are mutually orthogonal for a first-order factorial design, so when I take the transpose of one column vector and multiply it with another column vector, I get 0 as the sum: in an orthogonal design you have equal numbers of positives and negatives, and multiplying one column with another gives a net answer of 0. This is assured in orthogonal designs. We also know that in first-order model studies, an orthogonal design places the variables at the extremes. We have coded the variable levels to lie between -1 and +1, the lowest level of the variable being -1 and the highest being +1, and the experimental points are kept at these extremes, at -1 and +1. Since the values are located at the extremes, the (X'X) inverse matrix is quite favorable to us: it will be quite small.
Consider a first-order model with run size N, and let the xj values be coded so that they fall between -1 and +1 for the factors j = 1, 2, ..., k. We can find Var(beta hat i) / sigma squared from the variance-covariance matrix, which is nothing but the (X'X) inverse matrix, and in such cases Var(beta hat i) / sigma squared is minimized if the design is orthogonal and all the xi values in the design are placed at +1 or -1, for i = 1, 2, ..., k. In other words, the scaled variance is lowered, or minimized, if the experimental design is orthogonal in nature and all the levels are located either at -1 or at +1: the diagonal elements of (X'X) inverse are minimized because the diagonal terms of X'X are made as large as possible, and furthermore the off-diagonal terms are 0 in an orthogonal design, which saves a lot of headache. The variances of the estimated parameters decrease as (X'X) inverse shrinks. So now let us look at variance-optimal first-order designs. Two-level factorial plans, and fractions of resolution III and higher, do in fact minimize the scaled variances of all the coefficients: the variances divided by sigma squared are minimized in designs of resolution III and higher. Now let us take an X matrix coming from a 2^(4-1) design, a half-fraction factorial design: instead of doing 2^4 = 16 experiments we do only 2^3 = 8, one half-fraction of 8 runs out of the 16 possible, and in a 2^(4-1) design you can still estimate the 4 main-effect parameters.
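The claim that placing points at the extremes minimizes the scaled coefficient variances can be demonstrated directly. This is a minimal sketch with a one-factor example of my own construction, comparing all points at the +/-1 limits against a design with two points pulled inside:

```python
import numpy as np

def scaled_vars(levels):
    """Scaled coefficient variances N * diag((X'X)^-1) for the
    one-factor first-order model y = b0 + b1*x."""
    x = np.asarray(levels, float)
    X = np.column_stack([np.ones_like(x), x])
    N = len(x)
    return N * np.diag(np.linalg.inv(X.T @ X))

extreme = scaled_vars([-1, -1, 1, 1])       # all four points at +/-1
interior = scaled_vars([-1, -0.5, 0.5, 1])  # two points moved inward
print(extreme)    # [1. 1.]  -> Var(b1)/sigma^2 = 1/N, the minimum
print(interior)   # [1. 1.6] -> slope variance grows as sum(x^2) < N
```

The slope's scaled variance is N / sum(x^2), so it is smallest exactly when every x sits at the +/-1 boundary, which is the variance-optimality argument made above.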
The X matrix in this case has dimensions 8 by 5, so (X'X) inverse is a matrix of size 5 by 5, and X'X itself equals 8 I5. Why 8 by 5? You have 8 runs, and those 8 runs may come from a 2^3 full design or from a 2^(4-1) fractional factorial design; that is how you get 8. And why 5 columns? The 5 columns correspond to the parameters. In a 2^3 design the 5 parameters could be the intercept beta hat 0, the main effects beta hat 1, beta hat 2, beta hat 3, and then one more parameter, say the interaction beta hat 123; this is just an example. In the 2^(4-1) design we are looking at here, the 8 by 5 matrix means we do only 8 runs, and the 5 parameters correspond to beta hat 0, beta hat 1, beta hat 2, beta hat 3, beta hat 4: the 5 columns comprise the column of ones for beta hat 0 plus the vectors for estimating the 4 main-effect parameters. So the 5 represents the number of parameters we are estimating. Now that that is clear, we can find the value of X'X in this situation of 8 runs and 5 columns as 8 I5, that is, 8 multiplying an identity matrix of order 5: a diagonal matrix of size 5 by 5 with the diagonal elements taking the value 8. What then is (X'X) inverse? That would be (1/8) I5.
The inverse of 8 is 1/8, and the inverse of an identity matrix is the identity matrix itself, of the same order. We can even make the strong claim that there is no other design with 8 experimental runs that can produce variances of the estimated parameters smaller than sigma squared over 8. This is an important result. Now I have a slide which shows what I have been talking about so far. Let me just go back a few slides: we are talking about a 2^(4-1) design. The slide I am going to show next is based on a half fraction of a 2^4 factorial design, that is, one half of a 2^4 design, or a 2^(4-1) design. Obviously this requires 8 runs, and those are the 8 rows in the X matrix; you can see a 2^3 factorial pattern in the columns for A, B, and C. The last column is obtained by multiplying those 3 columns, and I think we know why and how. Please see the lecture on fractional factorial designs where we talk about design generators: for this particular case the design generator is I = ABCD, or D = ABC, so the column corresponding to factor D is obtained by multiplying columns A, B, and C. That is what you get as the X matrix, and when you compute X'X you get 8 I5, a diagonal matrix with 8 along the main diagonal; if I take the 8 outside I get 8 times an identity matrix of order 5, which is called I5. Then (X'X) inverse is (1/8) I5, and 1/8 is 0.125, which is what is shown on the slide: 0.125 along the diagonal, and if I take the 0.125 outside I get the identity matrix of order 5. Okay, next we will go on to variance-optimal first-order designs. You can see that variance is showing up very frequently: we have variance in the experimental measurements.
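The construction on the slide, a 2^(4-1) half fraction with generator D = ABC, can be reproduced and verified in a few lines (a minimal sketch following the generator stated above):

```python
import numpy as np

# Build the 2^(4-1) half fraction: run through the 2^3 base design in
# A, B, C and derive the fourth factor from the generator D = ABC.
levels = [-1, 1]
rows = []
for a in levels:
    for b in levels:
        for c in levels:
            d = a * b * c                  # design generator D = ABC
            rows.append([1, a, b, c, d])   # intercept column plus A, B, C, D
X = np.array(rows, dtype=float)            # 8 runs x 5 parameters

XtX = X.T @ X
XtX_inv = np.linalg.inv(XtX)
print(XtX)       # 8 * I5: a diagonal matrix with 8 on the main diagonal
print(XtX_inv)   # 0.125 * I5, as shown on the slide
```

This confirms both results quoted above: X'X = 8 I5 and (X'X) inverse = (1/8) I5, so every coefficient is estimated with variance sigma squared over 8.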
We have the analysis of variance, ANOVA; we have variances in the regression in several forms, such as the standard error; we have variances and covariances of the regression coefficients; we also have variances in the model predictions; and now we are talking about variance-optimal designs. The design of experiments course is basically an attempt to understand the phenomenon of variance in experiments. A full factorial 2^4 design leads to regression coefficient variances of sigma squared over 16. We know that the variance-covariance matrix is given by (X'X) inverse times sigma squared, that (X'X) inverse is a diagonal matrix for orthogonal designs, and so the variances of the coefficients are sigma squared over 16. The important thing here is the 16: the diagonal terms are sigma squared over 16, whereas earlier the diagonal terms had 1/8, for the design with 8 runs. When you have a full 2^4 factorial design you have 16 runs, and the diagonal terms of (X'X) inverse times sigma squared take the value sigma squared over 16. So it appears that for the 2^4 design the variances of the coefficients are lower than for a 2^3 design, which had only 8 runs. A 2^4 design has 16 runs, and we know that the variance-covariance matrix has 1/N along the diagonals times sigma squared, and that sigma squared over N becomes the variance of the regression parameter. In such a case one might look for a large set of experimental data, thinking that it reduces the variance of the estimated parameters. That is not the correct way of looking at it, because this is an artificial way of reducing the variance. The right way is to multiply the variance-covariance matrix by N.
If you take N times (X'X) inverse times sigma squared, the N automatically cancels the 1/N coming inherently from (X'X) inverse, and the coefficient variances become independent of the size of the run. When you scale for the run size in this way, both the 2^3 design and the 2^4 design may be considered variance-optimal on a per-observation basis. Now let us look at a repeated 2^(3-1) design, and this involves 8 runs. A 2^(3-1) design would be only 4 runs, but we are talking about 8 runs. So is there a mistake? Was it a 2^(4-1) design, or was it a 2^(3-1) design? It is actually a 2^(3-1) design, and you have 8 runs because you repeated the experiments: each setting was repeated twice, so you have a 2^(3-1) design with 8 runs even though there are only 4 independent settings. This repeated 2^(3-1) design is orthogonal, and the design points are located at the extremes, at +1 or -1, on the boundaries, and it is variance-optimal for the model involving main factors only: y hat = beta hat 0 + beta hat 1 x1 + beta hat 2 x2 + beta hat 3 x3. This is a saturated design. You cannot go beyond this, because with only 4 independent settings you can estimate at most 4 parameters. Even though you have 8 runs, you have only 4 independent settings and can hence estimate only 4 independent parameters. The regression coefficients in the above model have minimum variance over all designs with run size N = 8. From the variance-covariance matrix, we can easily find that the variances of the beta hat i come from the matrix (sigma squared / 8) I4, where sigma squared is the unknown error variance, 8 is the size of the run, and I4 is the identity matrix of order 4. Very interesting.
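The repeated half-fraction just described can be built and checked numerically. This is a minimal sketch assuming the standard generator C = AB for the 2^(3-1) fraction (the lecture does not state the generator for this design, so that choice is mine):

```python
import numpy as np

# Repeated 2^(3-1) half fraction: 4 independent settings from the
# generator C = AB, each setting run twice, for N = 8 total runs.
settings = [(a, b, a * b) for a in (-1, 1) for b in (-1, 1)]
rows = [[1, a, b, c]                 # intercept column plus x1, x2, x3
        for (a, b, c) in settings
        for _ in range(2)]           # two replicates of every setting
X = np.array(rows, dtype=float)      # 8 x 4 model matrix

XtX_inv = np.linalg.inv(X.T @ X)
print(XtX_inv)   # (1/8) * I4: each coefficient has variance sigma^2 / 8
```

The diagonal entries of 1/8 confirm that the repeated half fraction achieves the same sigma squared over 8 coefficient variance as the full 2^3 design with the same run size.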
Since we do not know sigma squared, the experimental error variance, we use the mean square error: we take the residual sum of squares divided by its degrees of freedom to get the mean square error, and we use that in place of sigma squared. How did we get this for the 2^(3-1) design? You can see that each setting is repeated twice, and hence we have 8 runs, so (X'X) inverse in such a case has 1/8 along the main diagonal and 0 along the off-diagonals. Now we can compare the two designs. What exactly are we comparing? A 2^3 full factorial design involving 8 experiments: 8 independent settings but no repeats. And a 2^(3-1) design involving again 8 runs, but with only 4 independent settings, each repeated twice. Even though both designs have 8 runs, they have some important differences. Both are variance-optimal designs, but the 2^(3-1) design has no degrees of freedom for lack of fit. Why not? Because with only 4 independent settings, once you have estimated all 4 parameters beta hat 0, beta hat 1, beta hat 2, beta hat 3, you are left with no degrees of freedom for identifying more parameters. In the 2^3 design, however, you have 8 independent settings and have found only 4 parameters, the same ones I listed just a bit earlier, so there are 4 more degrees of freedom for expanding the model.
You can use those extra degrees of freedom to estimate the 3 binary interactions and the 1 ternary interaction. The 2^(3-1) design has no degrees of freedom for lack of fit, so you cannot expand upon the basic model; but its factorial points (not center points) are repeated, so you can get a good idea of the experimental error. On one hand, with the full 2^3 design you can expand upon the model and go for a more sophisticated one, but since there were no repeats in that design you cannot get an idea of the experimental error. On the other hand, with the second design, which again involved 8 experiments but with repeats, you cannot expand upon the basic model, but you can estimate the experimental error. Which design you go for depends upon what information you already have. If you know from prior process experience that there are no interactions in your model, you can work with the repeated 2^(3-1) design so that you have an estimate of the experimental error. On the other hand, if you suspect that interactions are present and you already have an idea of the experimental error based on previous knowledge, you can go for a full 2^3 design and try to estimate all the interactions, higher-order ones in addition to the main effects. So what is the D-optimality value for first-order designs? The determinant of the moment matrix is det(M) = det(X'X) / N^p, and for these orthogonal two-level designs it is equal to 1.
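That closing value of det(M) = 1 can be checked for both 8-run designs compared above. A minimal sketch (my own construction of the two model matrices, again assuming generator C = AB for the half fraction):

```python
import numpy as np

# Full 2^3 factorial, main-effects model: 8 runs, p = 4 parameters.
X_full = np.array([[1, a, b, c]
                   for a in (-1, 1) for b in (-1, 1) for c in (-1, 1)],
                  dtype=float)

# Repeated 2^(3-1) half fraction (C = AB), each setting run twice:
# also 8 runs and p = 4 parameters.
X_rep = np.array([[1, a, b, a * b]
                  for a in (-1, 1) for b in (-1, 1) for _ in range(2)],
                 dtype=float)

for X in (X_full, X_rep):
    N, p = X.shape
    det_M = np.linalg.det(X.T @ X) / N ** p   # det(M) = det(X'X) / N^p
    print(det_M)                              # 1.0 for both designs
```

Both designs attain det(M) = 1, the largest value possible when all coded levels lie in [-1, +1], so on top of being variance-optimal they are D-optimal for the first-order model.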