Let us continue with the discussion on D-optimal designs and other optimal designs, looking at the D-optimality value for first order designs. When we take the determinant of the moment matrix M, given as the determinant of X prime X divided by N power p, where p is the total number of parameters, we get the determinant of the identity matrix of order p, which is equal to 1. So for models with pure first order terms, and for the first order plus interaction case, the maximum over zeta of the determinant of M of zeta is equal to 1. Hence the variance optimal designs are also D-optimal in nature. Let us now look at another alphabetic optimal design, the A-optimal design. This criterion addresses the issue of suitable estimation of the experimental design model coefficients; it deals only with the individual variances of the regression coefficients. We also note that to find the variances of the estimated parameters we refer to the variance-covariance matrix and look at its diagonal terms; those are the variances of the regression coefficients. The diagonal elements have to be multiplied by sigma squared, the error variance, to get the variances of the estimated parameters. Now how do we define the A-optimality criterion? It is defined as identifying the design that minimizes the trace of M of zeta inverse. So what we are doing is looking at the diagonal elements of the M of zeta inverse matrix, and we want to minimize their sum; here the trace represents the sum of the variances of the coefficients scaled by N. There are some experimental design computer software packages that use A-optimality. Previously we have talked at length about the scaled prediction variance, and it is a useful measure of performance: the variances of the predictions should be kept under control, and if there are regions where they become unbounded then we have to relook at the experimental design strategy.
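The two criteria just described can be checked numerically on a small first order design. Below is a minimal sketch, not part of the lecture, assuming Python with NumPy: it builds the model matrix of a 2 power 2 factorial with main effects only, forms the moment matrix M, and evaluates the determinant of M and the trace of M inverse.

```python
import numpy as np

# Model matrix for a 2^2 factorial, first-order model: columns 1, x1, x2.
X = np.array([[1, -1, -1],
              [1,  1, -1],
              [1, -1,  1],
              [1,  1,  1]], dtype=float)
N, p = X.shape

M = X.T @ X / N                      # moment matrix M = X'X / N
print(np.linalg.det(M))              # D-criterion value: 1.0 for this orthogonal design
print(np.trace(np.linalg.inv(M)))    # A-criterion value: trace(M^-1) = 3.0 = p
```

Because the design is orthogonal, X prime X is 4 times the identity, so M is exactly the identity of order 3, its determinant is 1, and the trace of its inverse equals p, in line with the values stated above.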
And scaled prediction variance control helps us make reliable predictions in the experimental design space. In many experimental designs and analyses, unfortunately, the scaled prediction variance is not given as much attention as it should be. Related to the scaled prediction variance is the G-optimality criterion, whose objective is to protect against the worst case prediction variance. The model should predict well not only at the design points but also at all other points in the design space, and we will assume that the factors are all quantitative. The G-optimality criterion is given by identifying the maximum value of the scaled prediction variance over the experimental design space; our aim is then to find the design zeta such that this maximum value of the scaled prediction variance is minimized. So the objective of the G-optimality criterion is to minimize the maximum value of the scaled prediction variance, and the question is which design is going to meet this criterion. For the G-optimality criterion we have to identify a region; usually the region is cuboidal or spherical in nature. We can express the G-optimality criterion as N times xm prime times X prime X inverse times xm, find its maximum within the region of interest R, and then identify the design zeta such that this maximum value of the scaled prediction variance is minimized. So what we are doing here is simply substituting the definition of the scaled prediction variance. When the assumption that the errors are independent and have equal variances is a reasonable one, we can see what is the minimum limit for the maximum value of the scaled prediction variance: over the region of interest R, the maximum value of the scaled prediction variance of x will be greater than or equal to p.
So the minimum possible value of this maximum is p, where p is the number of parameters including the intercept beta naught, and for an actual design the maximum can be higher than this lower limit of p. Now we can define the G-efficiency. My target is to have the maximum value of the scaled prediction variance as low as possible, ideally at p, the lower limit; but the actual design under consideration, depending on several factors, may have a higher value. The ratio of p to the maximum value of the scaled prediction variance for the particular design under consideration is termed the G-efficiency. Let us see what the G-optimality criterion gives for a 2 power 2 design considering only the main factors, not even looking at the interactions. The number of parameters p is 3, and the design space is bounded by plus or minus 1 for all the variables x. The scaled prediction variance is given by N into the variance of y hat of x divided by sigma squared, and we get N times xm prime times X prime X inverse times xm. We have divided by sigma squared because we do not know its value, and it does not really depend upon the design; it depends upon the actual experiment once it is performed, on the error variance, which is assumed to be constant at sigma squared. We do not know its value, but for evaluating different designs we do not need it, so we divide by sigma squared and get rid of it. One could also artificially create a small prediction variance by having a design with a large number of runs, which is why the multiplication by N is there, where N is the run size. By now you can show that N times X prime X inverse for the design under consideration is I3, the identity matrix of order 3, with 3 rows and 3 columns. You may want to check what N would be for the 2 power 2 design, which I think you will figure out by now, and you can also see what terms come in the diagonal.
Well the answer is very straightforward: you are going to have a 4 run experiment because it is a 2 power 2 design, so the run size is 4. The scaled prediction variance involves xm prime, where xm is the coordinate point expanded to the model space. So we have a scaled prediction variance comprising xm prime, written as 1, x1, x2: the 1 corresponds to the intercept, because we are going to multiply it with the intercept, x1 corresponds to the setting of factor A, and x2 corresponds to the setting of factor B. Then we have I3, the identity matrix of order 3 with diagonal elements of 1, and then xm, the transpose, so 1, x1, x2 along the row becomes 1, x1, x2 along the column. So you have the scaled prediction variance of x as 1 plus x1 squared plus x2 squared. In the design space of interest our design is a 2 power 2 factorial design where all the factorial points are located at plus 1 or minus 1, so x1 squared will be 1 and x2 squared will again be 1; you will have 1 plus 1 plus 1, which is 3, and that is exactly equal to the number of parameters being estimated. So we have p equal to 3. Now the G-efficiency for this particular design would be 3 divided by 3, that is, p, the number of parameters, divided by the maximum value of the scaled prediction variance in the region of interest; for this design the maximum value of the SPV is equal to 3, and so we have a G-efficiency of 1. So this D-optimal design is also G-optimal. In the same fashion it is easy to show that for first order designs in k design variables over cuboidal regions, where the design points are located at plus or minus 1 only, that is, all the points are at their extremes in the design space, any 2 level design of resolution greater than or equal to 3 results in a maximum over x belonging to R of SPV of x equal to p.
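The SPV calculation just walked through can be reproduced directly. Here is a minimal sketch, my own illustration assuming NumPy, with a hypothetical helper `spv` evaluating N times xm prime times X prime X inverse times xm at any point of the design space:

```python
import numpy as np

# 2^2 factorial, first-order model: columns 1, x1, x2.
X = np.array([[1, -1, -1],
              [1,  1, -1],
              [1, -1,  1],
              [1,  1,  1]], dtype=float)
N = X.shape[0]
XtX_inv = np.linalg.inv(X.T @ X)

def spv(x1, x2):
    """Scaled prediction variance N * xm' (X'X)^-1 xm at a point (x1, x2)."""
    xm = np.array([1.0, x1, x2])
    return N * xm @ XtX_inv @ xm

# SPV(x) = 1 + x1^2 + x2^2; its maximum over the cube [-1, 1]^2 is 3 = p.
print(spv(0.0, 0.0))  # 1.0 at the design centre
print(spv(1.0, 1.0))  # 3.0 at a corner, so G-efficiency = p / 3 = 1
```

Since N times X prime X inverse is I3 here, the quadratic form reduces to 1 plus x1 squared plus x2 squared, which peaks at 3 at the corners, confirming the G-efficiency of 1.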
So a 2 level design means a factorial design with 2 levels, and for resolution greater than or equal to 3 please refer to the lecture on fractional factorial designs where the design resolutions have been discussed. Hence all these 2 level orthogonal designs with resolution greater than or equal to 3 would result in a G-efficiency of 1 for the first order model. Now let us look at another optimality criterion, called the V-optimality criterion. The basis for this is to consider the prediction variance at a selected set of coordinates representing different points in the design region that are of interest to the experimenter. So we identify some coordinates in the design space which are of certain interest, coordinate 1, coordinate 2, and so on up to R such coordinates. How do we choose those points? There is no hard and fast rule here; it could be a set of test points from which the design was selected, or coordinates that have some specific importance to the experimenter. Any design that minimizes the average prediction variance over this set of R points is said to be a V-optimal design. Then we look at a related criterion called the I-optimality criterion, whose objective is to provide a single measure of the model's prediction performance by means of an averaging process. The averaging is carried out over a certain region of interest R, and the moment we have averaging it implies integration over a continuous domain. After carrying out the required integration of the suitable function we divide it by the volume of the region R and we get the average; this is the standard mathematical procedure for finding an average.
So the integral over the region of interest gives the volume of the region, and that is equal to K. The I-criterion is to identify the design that minimizes the average given by 1 by K times the integral over R of the scaled prediction variance of x dx; that is, we find the design zeta that minimizes I of zeta, where I refers to the integral, 1 by K integral over R of the scaled prediction variance of x dx. We substitute the definition of the SPV into the integral. Earlier we had N into X prime X inverse, but from the definition of M, which is equal to X prime X by N, X prime X inverse into N is M of zeta inverse. So this derivation is pretty straightforward, and that is why, instead of having X prime X inverse into N in addition to the other 2 terms, we have M of zeta inverse. That is I of zeta, the integral value for a particular design under consideration, and we have to identify the design such that the integral is minimized. Even though these mathematical formulae look a bit formidable, the meanings are pretty straightforward. You can build even more sophistication into this approach by assigning weights to certain points or coordinates in the domain which perhaps are more valuable than other points. That is very design specific, and after this definition it is up to the individual experimental program where this may be applied. So the I-optimality criterion is a reasonable method for deciding upon a suitable experimental design; designs based on this criterion of a good average SPV are expected to yield satisfactory results throughout the design space. However, the D-optimality criterion is more popular, and the G-optimality as well as the I-optimality criteria are not as popular as the D-optimality one. We also have the I-efficiency: we define a particular optimality criterion and then we also try to find the efficiency.
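For the same 2 power 2 first order design the I-criterion can be worked out explicitly: SPV(x) is 1 plus x1 squared plus x2 squared, and averaging it over the cube from minus 1 to 1 in both variables (volume K = 4) gives 1 + 1/3 + 1/3 = 5/3. The sketch below is my own illustration of that averaging, using a simple midpoint-rule numerical integration in place of the exact integral:

```python
import numpy as np

# SPV for the 2^2 first-order design: SPV(x) = 1 + x1^2 + x2^2.
# I-criterion: average SPV over the region R = [-1, 1]^2, volume K = 4.
n = 400
mid = np.linspace(-1, 1, n, endpoint=False) + 1.0 / n   # midpoints of n cells per axis
x1, x2 = np.meshgrid(mid, mid)
spv = 1 + x1**2 + x2**2
cell_area = (2.0 / n) ** 2
K = 4.0
I_value = (spv * cell_area).sum() / K
print(I_value)   # ≈ 5/3, matching the exact average 1 + 1/3 + 1/3
```

The exact integral of 1 plus x1 squared plus x2 squared over the cube is 20/3, and dividing by the volume K = 4 gives 5/3, so the numerical average should agree to several decimal places.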
So we have seen that the I-optimality criterion is given according to this integral, and we can also find the minimum value of such an integral, the value achieved by the design that minimizes it, I of zeta star. For a particular design I have chosen, I will get a certain value of I of zeta after the averaging is done. Then I compare it with the minimum value I of zeta star, the value for the design which, for the same number of variables, gives the minimum of the integral. The ratio of the two is termed the I-efficiency. So we have used simple first order models to illustrate the different optimality criteria, just to see how to carry out the different matrix operations, the matrix inversions, how the integration is done and so on. The design optimality criterion is a very interesting and very useful concept; it gives you additional insight into your experimental design. There is more to every experimental design than simply finding the regression model or the equation describing the design's performance over the design space. The objective is much deeper than that: after all, we are developing only a simple, straightforward regression model, getting some statistical parameters and finding which of the coefficients are significant. Over and above all these things we have to look at how good the model is in the design space, and we have to a priori select a model which will give us the desired features. So a lot of planning goes into the choice of the experimental design. Upfront we have to ask ourselves what we want out of this model, and based on that we can develop a suitable optimality criterion and see which model or design will fit it.
We will aim for a desirable value of the optimality criterion; there is no one single optimality criterion, there are many, and we can choose the one which is closest to our expectation. The design of experiments and response surface methodology are becoming very popular due to the availability of many software packages, and it is not that difficult to apply the different optimality criteria. So we have a lot of flexibility in choosing a particular experimental model; we are not constrained to select one particular model just because it is commonly or popularly used in the literature. We need good designs that perform satisfactorily over a wide variety of possible circumstances; designs which are optimized over a narrow range of idealized conditions and assumptions are of little value. For first order models, including those with interactions between the various factors, the standard designs are optimal. However, the standard response surface methodology designs like the Box-Behnken design or the central composite design are rarely optimal for the second order case, even though they have several desirable features. For response surface methodology analysis we more frequently resort to the central composite design or the Box-Behnken design strategy, and they should be used or considered whenever possible. But there may be many situations where the CCD or the BBD cannot be implemented; these could be due to constraints or unusual design sizes. A constraint could be that you are not able to perform certain experiments at the axial points recommended by the central composite design. When we are not able to use the standard designs, we have to go for computer generated designs based on the optimality criteria we have discussed previously. Okay, so there is a table given by Myers et al. (2009) which compares typical standard second order designs for a spherical region.
I am just giving an illustrative summary; for a more complete one you may refer to the reference book. We can compare standard designs, and I have chosen only the central composite and the Box-Behnken design. You can see an additional parameter, the number of center runs nc, and k refers to the number of factors in the design: 2 factors, 3 factors, 4 factors. Then we have the different designs, with the D-efficiency as well as the G-efficiency listed. For k equal to 2, with the number of center points equal to 1, the D-efficiency is 98.62% but the G-efficiency is 66.67%. When you increase the number of center points, the D-efficiency slightly reduces from 98.6 to 96.9, whereas the G-efficiency dramatically improves from 66.67 to 87.27. Then for k equal to 3 you have an experimental size of 15 for a central composite design with only one center point; the D-efficiency is quite high but the G-efficiency is pretty low. When you increase the number of center points, again the D-efficiency falls off slightly but the G-efficiency improves dramatically. For a Box-Behnken design, even though you have only one center point, the G-efficiency is quite decent; but when you increase the number of center points, both the D-efficiency and the G-efficiency decrease. For a Box-Behnken design with nc equal to 1 the D-efficiency is quite okay at 97, but when you increase the number of center points to 3 it drops slightly to 93.82, whereas the G-efficiency, which was quite low to start with at one center point, reduces even further to 66.67. With 4 factors, the central composite design and the Box-Behnken design give pretty much the same D-efficiency and G-efficiency. So it appears that increasing the number of center points in the CCD or BBD is not always good, because some optimality criteria may decline: when we went from 1 center point to 3, the Box-Behnken design's D-efficiency and G-efficiency both declined.
For the central composite design, increasing the number of center points from 1 to 3 did reduce the D-efficiency percentage, but on the other hand the G-efficiency percentage increased. So the moral of this table is that the optimality criteria need not move in the same direction: if one increases, the other may decrease. But we cannot rule out center points, because the larger the number of center points, the more degrees of freedom there are for pure error; more center points let you estimate the pure error component more reliably, and they also help to stabilize the scaled prediction variance. When the CCD had only one center run, the G-efficiency was quite low, at 66.67. For the spherical design, when the center run is absent, the X prime X matrix may be singular or near singular, and that is not acceptable. The use of only a single center point leads to a large SPV of x at the design center; as the number of center points increases this prediction variance becomes more stable. The G and D criteria can react adversely to the presence of many center points due to the nature of their definitions, as you can see in the Box-Behnken design, where even the D-efficiency reduced from 97 to 93.82 when the number of center points increased from 1 to 3. We have also seen that the standard RSM designs are reasonably close to the best design in the D-optimal or G-optimal sense; for example, for 4 factors the central composite and Box-Behnken designs have D-efficiency close to 100% and G-efficiency also close to 100%.
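The table's behaviour for k equal to 2 can be reproduced numerically. The sketch below is my own illustration, not from the lecture; it assumes the spherical CCD with axial distance alpha equal to the square root of 2 and a full second order model with p = 6 parameters, scans the spherical region for the maximum SPV, and reports p divided by that maximum:

```python
import numpy as np

def ccd_model_matrix(nc):
    """Spherical CCD for k = 2 (axial distance sqrt(2)) with nc centre runs,
    expanded to the second-order model columns 1, x1, x2, x1^2, x2^2, x1*x2."""
    a = np.sqrt(2.0)
    pts = [(-1, -1), (1, -1), (-1, 1), (1, 1),
           (a, 0), (-a, 0), (0, a), (0, -a)] + [(0.0, 0.0)] * nc
    return np.array([[1.0, x1, x2, x1**2, x2**2, x1 * x2] for x1, x2 in pts])

def g_efficiency(nc, p=6):
    X = ccd_model_matrix(nc)
    N = X.shape[0]
    XtX_inv = np.linalg.inv(X.T @ X)
    # Scan the spherical region of radius sqrt(2) on a grid that contains
    # both the origin and the axial points exactly.
    grid = np.linspace(-np.sqrt(2.0), np.sqrt(2.0), 201)
    max_spv = 0.0
    for x1 in grid:
        for x2 in grid:
            if x1**2 + x2**2 <= 2.0 + 1e-9:
                xm = np.array([1.0, x1, x2, x1**2, x2**2, x1 * x2])
                max_spv = max(max_spv, N * xm @ XtX_inv @ xm)
    return p / max_spv

print(round(100 * g_efficiency(1), 2))  # ≈ 66.67: with one centre run the SPV peaks at the centre
print(round(100 * g_efficiency(3), 2))  # ≈ 87.27: extra centre runs stabilise the SPV
```

With nc = 1 the maximum SPV of 9 occurs at the design center, giving a G-efficiency of 6/9; with nc = 3 the maximum drops to 6.875 at the boundary, giving about 87.27%, consistent with the table values quoted above.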
The CCD and BBD would be used when there is a larger number of factors to be considered, and they are pretty efficient there; it does not mean that you should always get all the criteria close to 100% when you are working with fewer factors. Even for a smaller number of factors the D-efficiency values are pretty high; it is only the G-efficiency that is quite low for cases involving few center points. So there is no single hard and fast rule or magic rule to get the best design. Another thing to note is that these 2 commonly used criteria do not show us the stability of the scaled prediction variance across the design space. Just relying on a single value of the D-optimality criterion is not recommended: the design problem is multidimensional in character and we cannot go with only one criterion. We should actually look at the scaled prediction variance over the design region. So in addition to the different optimality criteria, we should also pay attention to the prediction capability of the model in the design space. How do we represent the scaled prediction variance? For a 2 dimensional problem involving only 2 factors, A and B or x1 and x2, the scaled prediction variance is quite easy to plot, but when you have more parameters it is not easy. Also, instead of sigma squared we use s squared, the mean square error: we use the estimated variance of y hat of x, which is why it carries a hat, and instead of sigma squared we use s squared. So now we are talking about the estimated scaled prediction variance. In addition to the optimality criterion we also have to take a closer look at the scaled prediction variance.
With graphical methods we aim to get a bird's eye view of the scaled prediction variance over the entire design space, irrespective of the number of factors and the number of center points. If we are able to represent, even for multidimensional cases, the distribution of the scaled prediction variance in the design space, we can have an overall view of how good the design is, and how well it is performing at different locations in the experimental space. We can see which regions of the design predict poorly or nicely, and it also helps us to see how the non-design points, or future points in the design space, get predicted. When you look at the CCD and BBD from the scaled prediction variance point of view, the Box-Behnken design has a superior minimum SPV of x at the cost of an inferior maximum at the edge of the design space. What this means is that the BBD ensures that the scaled prediction variance is quite small in the interior of the design space, but the SPV actually blows up when you go to the extremes or boundaries of the design space. At the design edge the scaled prediction variance of x is pretty high, but only a small proportion of the design region has these high values. The G-efficiency, which considers only the maximum SPV of x, may therefore be quite deceptive, as it does not consider what is happening in the interior of the design space. A single number efficiency hence cannot truly reflect what is happening in the entire design space. Mostly, the maximum scaled prediction variance of x in a second order design occurs at the design boundary or perimeter, so the G-efficiency criterion reflects only what is happening at the edge of the design. So I have just given you a brief introduction to the scaled prediction variance distribution in the design space.
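To give a flavour of what a fraction of design space plot conveys, here is a minimal sketch, my own illustration using the simple 2 power 2 first order design: sample the design space uniformly, compute the SPV at each sampled point, and sort the values; the sorted SPV plotted against the cumulative fraction of points is the FDS curve.

```python
import numpy as np

rng = np.random.default_rng(0)

# 2^2 first-order design again; SPV(x) = 1 + x1^2 + x2^2 on the cube [-1, 1]^2.
X = np.array([[1, -1, -1], [1, 1, -1], [1, -1, 1], [1, 1, 1]], dtype=float)
N = X.shape[0]
XtX_inv = np.linalg.inv(X.T @ X)

# Sample the design space uniformly and evaluate the SPV at every sample.
pts = rng.uniform(-1, 1, size=(20000, 2))
xm = np.column_stack([np.ones(len(pts)), pts])
spv = N * np.einsum('ij,jk,ik->i', xm, XtX_inv, xm)

# Sorted SPV vs cumulative fraction of the design space = FDS curve data.
fds = np.sort(spv)
fraction = np.arange(1, len(fds) + 1) / len(fds)

print(fds[0], fds[-1])                 # SPV ranges from about 1 up to about 3
print(np.interp(0.5, fraction, fds))   # median SPV over the design space
```

The curve shows at a glance how the SPV is distributed: here the worst case of 3 occurs only in a tiny fraction near the corners, illustrating why a single worst-case number like the G-efficiency can be deceptive about the bulk of the design space.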
It is important for us to represent it graphically, and there are certain graphs for this, called the variance dispersion graph and the fraction of design space plot. They are somewhat difficult to plot; in most cases they cannot be done manually and we have to resort to statistical software which can produce these fraction of design space plots and variance dispersion graphs. So I have given only a brief overview of some of the advanced features of experimental design, but this introduction should form a suitable basis for further reading. We have come to the end of the introduction to advanced, or optimal, design concepts, and this may be the starting point for a further advanced course on statistical design of experiments. I have given a complete overview of the basic statistical principles that are involved in design of experiments, looked at some popular design of experiment techniques, and finally given an introduction to the more advanced experimental design strategies. We now have a good background in conventional design strategies and also the capability to read further on advanced experimental design concepts. I will conclude at this point, and in the final lecture I will summarize whatever we have covered in this course. Thank you for your attention.