Hello, welcome back. Today we will be looking at the prediction variances, the different versions of the prediction variances. We have seen that after we develop the regression model, the f(x) model, for our chosen experimental design, we have to look at a few things. Of course R squared will give the quality of the fit, but we also have to look at the prediction variances. The concept is quite straightforward. We want to ensure that the variances of the predictions are kept under control over the entire domain. We do not want certain regions in the domain where the prediction variances are very high, because then the quality of our predictions is not uniform in the domain and hence it is not reliable. You do not want too much variance in your predicted values. So let us look at the measures proposed by Montgomery et al. for the estimation of prediction variances. The variance of y hat of x, where y hat refers to the prediction by the model and x refers to the point in the experimental space, is given by xm prime, X prime X inverse, xm, into sigma squared. Here xm is the coordinate x expanded into the model space, so this is as per the model; we have seen this in the previous class. X prime X inverse is the usual matrix based on the factors we have considered. This is a very crucial component in the analysis, and sigma squared is the unknown error variance. When we divide this expression by sigma squared, we make it independent of sigma squared, and so we have the unscaled prediction variance: the variance of y hat of x by sigma squared, which is xm prime, X prime X inverse, xm. The scaled prediction variance SPV equals N times the variance of y hat of x by sigma squared, which is N xm prime, X prime X inverse, xm. Why do we have to carry out all these things? Most importantly, please note that in these expressions we do not have y, the vector of experimental responses, anywhere. That is number 1.
Number 2, by dividing by sigma squared, both the unscaled prediction variance and the scaled prediction variance become independent of sigma squared, which is not known. And another good thing here is that we multiply by N in order to scale the prediction variance by the run size. For example, if the run size is very high then the prediction variance will become low. So when we compare a design involving fewer runs with another design with more runs, the second design with more runs may show a smaller prediction variance even if it is less efficient per run. In such cases there will be some ambiguity. To avoid this kind of run-size dependence, we multiply by the total run size N, and then these issues are taken into account. That is why we scale the prediction variance, and then it is called the SPV. Sometimes in our discussions I will be referring to it as SPV, meaning the scaled prediction variance. You also have the estimated prediction variance: since we do not know the value of sigma squared, we use the estimated value of sigma squared, and this comes from the residual mean square. We have the total sum of squares and the regression sum of squares; the difference between the two gives you the residual sum of squares, and that is divided by the degrees of freedom of the residual sum of squares. So we get the residual mean square, and that is used as a surrogate for the unknown error variance. When you plug that in we have the estimated prediction variance, and that is why I have written here MSE, which is the mean square error. And when you take the square root of that we have the standard error of the estimated mean. We will simply call it the standard error. So, correcting the prime that was missing on the slide, we now have the square root of xm prime, X prime X inverse, xm, into MSE.
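The quantities defined so far can be sketched in a few lines of Python. This is only an illustration: the function names and the toy 2 power 2 factorial design below are my own choices, not from the slides.

```python
import numpy as np

def unscaled_pv(X, xm):
    """Unscaled prediction variance: xm' (X'X)^-1 xm (independent of sigma^2)."""
    XtX_inv = np.linalg.inv(X.T @ X)
    return float(xm @ XtX_inv @ xm)

def scaled_pv(X, xm):
    """Scaled prediction variance: N times xm' (X'X)^-1 xm."""
    return X.shape[0] * unscaled_pv(X, xm)

def std_error(X, xm, mse):
    """Standard error of the estimated mean: sqrt(UPV * MSE)."""
    return np.sqrt(unscaled_pv(X, xm) * mse)

# A toy 2^2 factorial with model terms [1, A, B, AB]; here X'X = 4I.
X = np.array([[1, -1, -1,  1],
              [1,  1, -1, -1],
              [1, -1,  1, -1],
              [1,  1,  1,  1]], dtype=float)
corner = np.array([1.0, 1.0, 1.0, 1.0])   # point (1, 1) expanded to model space
center = np.array([1.0, 0.0, 0.0, 0.0])   # point (0, 0) expanded to model space
print(round(unscaled_pv(X, corner), 6))   # 1.0  (4 terms, each 1/4)
print(round(scaled_pv(X, center), 6))     # 1.0  (N = 4 times UPV = 0.25)
```

Note that no response vector y appears anywhere; only the design matrix X and the point xm are needed.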
Now that we have the standard error we can very easily define the 100 into (1 minus alpha) percent confidence interval, as will be shown next. We have the standard error as the square root of xm prime, X prime X inverse, xm, into MSE. So the 100 into (1 minus alpha) percent confidence interval on the mean response is given by y hat of x0 plus or minus t, at alpha by 2 and the degrees of freedom of the residual error, into the square root of this expression, xm prime, X prime X inverse, xm, into MSE. So this is the response at any particular point we are interested in, and then you have the t distribution based on a certain alpha level of significance. We are doing a two-tailed test and so you have alpha by 2 here, with the degrees of freedom corresponding to the residual error. And then we have the estimated prediction variance as given here, and we take the square root of that. And again to repeat, this refers to the prediction made at a particular point x0; it can be any point x in the domain. So now let us look at a central composite design with 4 center points. You can see that this is the column vector comprising 1's in the X matrix, then you have the column corresponding to factor A, this is the column corresponding to factor B, and then this is the column corresponding to AB, the binary interaction. Since we are considering a second order model in the form of a central composite design, we also have the quadratic terms A squared and B squared. So we are dealing with two factors: this is a central composite design for two factors A and B. The first column is the column vector of 1's, this column refers to factor A, this column refers to factor B, this column refers to the interaction between factors A and B, and then this would be factor A squared and this would be factor B squared. These values here represent the axial points for factor A and the axial points for factor B, and when you square the root 2 you get 2 here.
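As a numerical sketch of the confidence interval formula: the prediction y hat, the MSE and the residual degrees of freedom below are hypothetical numbers chosen purely for illustration, and the t value is the tabulated two-tailed t at alpha by 2 equal to 0.025 with 6 degrees of freedom.

```python
import numpy as np

# Illustration of the 100(1 - alpha)% CI on the mean response:
#   y_hat(x0) +/- t_{alpha/2, df_residual} * sqrt(xm' (X'X)^-1 xm * MSE)
upv_at_x0 = 0.3781   # unscaled prediction variance at x0 (example value)
mse = 2.5            # residual mean square (hypothetical)
y_hat_x0 = 41.2      # predicted mean response at x0 (hypothetical)
t_crit = 2.447       # t_{0.025, 6} from standard tables (alpha = 0.05, 6 df)

half_width = t_crit * np.sqrt(upv_at_x0 * mse)
ci = (y_hat_x0 - half_width, y_hat_x0 + half_width)
print(ci)
```

The interval is symmetric about y hat of x0; halving alpha accounts for the two-tailed test.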
And last but not least we have 4 repeats here, and the repeats are carried out at the center point, hence you have the 0 values here, 0 0 for each center run. And when you take the binary interaction between A and B it is again 0; when you square A it is 0, and when you square B it is also 0 at the center point under consideration. Now when you take X prime X inverse, it is a sparse matrix, still a lot of zeros can be seen, but you do not have only the diagonal elements. There are also some off-diagonal elements, and that is because of the central composite design structure. The experimental points are not located only at the extremes; they are also located at other positions as well, including the one at the center, and hence it is not a variance optimal design, that is, a design constructed purely to minimize the variance. So first we calculate the unscaled prediction variance, which is defined as xm prime, X prime X inverse, into xm. We have the xm vector, which is the coordinate point expanded to model space, as shown here. So this would be the point we are interested in, minus 1.167, minus 0.167. The 1 corresponds to the column vector of 1's, and this entry represents the product of A and B. So here you have minus 1.167 into minus 0.167, and that comes to about 0.1949, and then this would be 1.167 squared and this would be 0.167 squared. Since you have xm you also have xm prime, and in the previous slide we had calculated X prime X inverse. So we have everything, and the unscaled prediction variance, which is independent of sigma squared, comes to 0.3781; taking the square root we get 0.615. I have also shown how to calculate xm prime here: this is the transpose of xm, and you can see that it is 1 corresponding to the vector of 1's, then factor A, factor B, factor AB, A squared and B squared. So we have calculated the unscaled prediction variance at a particular coordinate, minus 1.167, minus 0.167.
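This calculation can be reproduced directly. The sketch below builds the two-factor CCD with 4 center points, expands the point minus 1.167, minus 0.167 into model space, and recovers the unscaled prediction variance 0.3781 quoted above (the helper names are my own).

```python
import numpy as np

def expand(a, b):
    """Expand a coordinate (a, b) into the model space [1, A, B, AB, A^2, B^2]."""
    return np.array([1.0, a, b, a * b, a * a, b * b])

r2 = np.sqrt(2.0)
# Two-factor CCD: 4 factorial points, 4 axial points at +/- sqrt(2), 4 center points
points = [(-1, -1), (1, -1), (-1, 1), (1, 1),
          (-r2, 0), (r2, 0), (0, -r2), (0, r2),
          (0, 0), (0, 0), (0, 0), (0, 0)]
X = np.array([expand(a, b) for a, b in points])
XtX_inv = np.linalg.inv(X.T @ X)

xm = expand(-1.167, -0.167)
upv = float(xm @ XtX_inv @ xm)
print(round(upv, 4))            # 0.3781
print(round(np.sqrt(upv), 3))   # 0.615
```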
Now let us take another point, a new xm vector, and that would be at the location minus 0.5 and plus 0.5. It is easy to compute: the interaction between A and B would be minus 0.5 into plus 0.5, which is minus 0.25, that is what you have here, and then you have 0.5 squared and 0.5 squared, which is 0.25, 0.25. The unscaled prediction variance here comes to 0.2266, and the square root of that comes to 0.4760. Please note that we have 4 center points in this design. Now when you have only one center point, what happens to the unscaled prediction variance? We have seen that in order to stabilize the prediction variance we need more center points. Now we have a central composite design with only one center point; let us see its impact on the prediction variance. If you look at the design you can see that, as usual, we have the factors A and B: these are the factorial points, these are the axial points corresponding to factor A and to factor B, and then you also have AB here, and then A squared and B squared. But the most important thing to note here is that, compared to the previous example, we have only one run at the center point, and we have a new matrix X1. So we can take the inverse, X1 prime X1 inverse, and this is what we get; again we do have off-diagonal terms as well. Let us now look at the 2 xm vectors, the same vectors we considered in the previous case where we had 4 center points. So we are locating the xm vectors corresponding to the coordinates minus 1.167 and minus 0.167, and minus 0.5 and 0.5; this is one coordinate and this is the other. Here again we multiply A into B and we get 0.1949 and minus 0.25, then the square of minus 1.167 gives 1.3619, the square of minus 0.5 is of course 0.25, the square of minus 0.167 gives you 0.0279, and the square of 0.5 gives you 0.25.
So we have 2 different xm vectors; let us see the unscaled prediction variance here. This is point 1, color coded red, and this is point 2, color coded blue. When you have 4 center points, the unscaled prediction variance at point 1 was 0.3781 and it is 0.2266 at point 2. Let us just check that once again: 0.3781 and 0.2266, corresponding to points 1 and 2 with 4 center points. When you reduce the number of center points from 4 to 1, we see that the unscaled prediction variance at point 1 shoots up from 0.38 to about 0.45, and at point 2 it shoots up from 0.23 to about 0.65. So this shows that the prediction variance increases dramatically when you reduce the number of center points. This also tells you why we should have center points in an experimental design: not only do the center points help you find an estimate of the pure error, they also economize the design strategy, because we do not have to repeat all the experiments at the corner points and the axial points; we have to repeat the experiments only at the center points. As I said earlier, they give you an idea about the pure error, they tell you whether curvature effects are significant or not, and the more center points you have, the lower the scaled prediction variance. So you can see that the center points play a crucial role in experimental design strategies. And how important is rotatability? In second order designs it is not important to have exact rotatability; if the desired region of the design is spherical, the CCD is most effective from a variance point of view. So what this slide recommends is that, more than rotatability, it is the spherical nature of the design which is of importance. For a spherical design you place the axial points at root k instead of the fourth root of F, where k is the number of factors and F is the number of factorial points, and use 3 to 5 center runs.
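The comparison between 4 center points and 1 center point can be reproduced numerically. The sketch below builds both CCDs and evaluates the unscaled prediction variance at the two points; the values match those quoted in the lecture (about 0.38 versus 0.45 at point 1, and 0.23 versus 0.65 at point 2).

```python
import numpy as np

def expand(a, b):
    """Expand (a, b) into the model space [1, A, B, AB, A^2, B^2]."""
    return np.array([1.0, a, b, a * b, a * a, b * b])

def upv(X, point):
    """Unscaled prediction variance xm' (X'X)^-1 xm at a coordinate point."""
    xm = expand(*point)
    return float(xm @ np.linalg.inv(X.T @ X) @ xm)

r2 = np.sqrt(2.0)
base = [(-1, -1), (1, -1), (-1, 1), (1, 1),
        (-r2, 0), (r2, 0), (0, -r2), (0, r2)]
X4 = np.array([expand(a, b) for a, b in base + [(0, 0)] * 4])  # 4 center points
X1 = np.array([expand(a, b) for a, b in base + [(0, 0)] * 1])  # 1 center point

p1, p2 = (-1.167, -0.167), (-0.5, 0.5)
for p in (p1, p2):
    print(round(upv(X4, p), 4), round(upv(X1, p), 4))
# 0.3781 vs 0.4479 at point 1, and 0.2266 vs 0.6484 at point 2:
# the variance jumps when the center points are dropped.
```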
So for k equal to 3, that is a 3 factor design, use alpha equal to root k, or root 3, which is 1.7321, instead of alpha equal to the fourth root of 2 power 3, that is the fourth root of 8, which is 1.682. It results in a nonrotatable design, but it is preferred. So when you look at the central composite design, you are running it at 5 levels: the 2 factorial levels, the 2 axial levels, and then the center level. Let us now summarize the central composite design: central composite designs may be spherical, with alpha equal to root k and 5 levels, or cuboidal, with alpha equal to 1 and 3 levels. The spherical CCD is rotatable or nearly rotatable, and these designs are useful to capture the second order effects, also termed the curvature effects or quadratic effects. Now let us move on to the next design strategy after a small break. We are going to look at a new experimental design strategy called the Box-Behnken design; you might have come across this kind of design in research papers. The central composite design and the Box-Behnken design are most commonly encountered when second order models are being discussed. So let us look at the features of the Box-Behnken design. It is a creative approach to planned experimentation involving a relatively smaller number of runs. It is an important alternative to the central composite design, and the Box-Behnken design is built on a balanced incomplete block design. What this means I will tell you shortly; let us look at an example of a balanced incomplete block design for 3 treatments. So in the Box-Behnken design you have 3 blocks: in the first block you consider only factors 1 and 2, in the second block you pay attention to factors 1 and 3, and in the third block you pay attention to only 2 and 3. So when you look at this particular design, you can see that each block attaches importance to only 2 out of the 3 factors being considered.
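The two axial distances are easy to compute. For k equal to 3 factors:

```python
import math

k = 3                           # number of factors
F = 2 ** k                      # number of factorial points
alpha_spherical = math.sqrt(k)  # spherical CCD axial distance, sqrt(k)
alpha_rotatable = F ** 0.25     # rotatable CCD axial distance, F^(1/4)
print(round(alpha_spherical, 4))  # 1.7321
print(round(alpha_rotatable, 3))  # 1.682
```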
And in block 1, factors 1 and 2 are paired together to form a 2 power 2 design. So when you attach importance to only the first and second factors, you construct a 2 power 2 design out of them. Similarly, blocks 2 and 3 form individual 2 power 2 designs involving the pairs of factors 1 and 3, and 2 and 3, respectively. And the variables are kept at the plus or minus 1 coded settings. And what happens to the third factor? When a design is formed between a pair of factors, the third factor in the above design is at the center point, the 0 setting. And the center runs are included in this design in the last row. So when you look at the design, you have a, b, c, the 3 factors: you have minus 1 minus 1, 1 minus 1, minus 1 1, 1 1, which corresponds to a regular 2 power 2 design involving factors a and b, while factor c is kept at the 0 setting. And since the values that may be taken by a, b or c are minus 1 and plus 1, a 0 would mean a center setting. Then when you look at the second block, as shown in the brown color, factor b is kept at the center level, the 0 level, and you can see that a 2 power 2 design is formed between factors a and c. Next we have the third block, involving a 2 power 2 design between factors b and c while a is kept at the 0 level. In addition to the above, we also have center runs at the end, where all the factors are kept at their mid values, the 0 coded values. And when you have 4 factors, the concept is pretty much the same. Here you have factors a and b forming a 2 power 2 design, and next you have a 2 power 2 design involving factors c and d. Then you have a 2 power 2 design involving factors a and d, then b and c, then a and c, and b and d. So you are considering 4 variables 2 at a time, and that would be 6 combinations; let us see, 1, 2, 3, 4, 5 and 6. So we have the 6 combinations listed in this table, and then you also have the center point at the very end.
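The pair-block construction described above can be written out programmatically. A minimal sketch, assuming the plus or minus 1 coding and pair blocks as described (the function name and the default of 3 centre runs are my own choices):

```python
import numpy as np
from itertools import combinations

def bbd_pairs(k, n_center=3):
    """Box-Behnken design built from 2^2 factorials on factor pairs (k <= 5)."""
    runs = []
    for i, j in combinations(range(k), 2):   # each pair of factors forms a block
        for a in (-1, 1):
            for b in (-1, 1):
                run = [0] * k                # remaining factors at the 0 level
                run[i], run[j] = a, b
                runs.append(run)
    runs.extend([[0] * k for _ in range(n_center)])  # centre runs at the end
    return np.array(runs)

D = bbd_pairs(3, n_center=3)
print(D.shape)  # (15, 3): 3 blocks x 4 runs, plus 3 centre runs
# Every non-centre run keeps exactly one factor at the 0 level:
print(all((row == 0).sum() == 1 for row in D[:-3]))  # True
```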
When you go for 5 factors, we are increasing the number of factors and hence the number of experiments also increases, and this is shown in a very compact form. In the first block you consider factors a and b, or x1 and x2, and that is why you have plus or minus 1, plus or minus 1 here. When you have 5 factors taken 2 at a time, that gives 10 combinations: 5 into 4 by 2, which is 10. It is tedious to show all 10 combinations in full, and hence we show them in a condensed form. This entry means we are considering factors 1 and 2, x1 and x2, forming a 2 power 2 design, whereas x3, x4 and x5, the third, fourth and fifth factors, are kept at the 0 level. In the next combination we take factors 1 and 3, in the third combination factors 1 and 4, then 1 and 5, then 2 and 3, 2 and 4, 2 and 5, 3 and 4, 3 and 5 and 4 and 5. So we have 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 combinations, and the last row represents the centre runs. They may be more than 1 in number; this is a vector notation, and so you may have more than one centre run at the end. And when you go for 6 factors, rather than taking 2 factors at a time to create a 2 power 2 factorial design, we actually take 3 factors at a time and create a 2 power 3 design. This is shown in the condensed form. We are now looking at the Box-Behnken design for k equal to 6 factors, and instead of taking 2 factors at a time we take 3 factors and create a 2 power 3 design, so we get a design where 3 factors are taken at a time. Here this entry represents x1, x2 and x4 forming a 2 power 3 design, whereas x3, x5 and x6 are kept at the centre value, the 0 setting. Next we go for x2, x3 and x5 in a 2 power 3 design, with the other factors kept at their centre values. Then we go for x3, x4 and x6 at the 2 power 3 design settings, with the others kept at centre values, and so on for the remaining blocks. And finally we have the centre runs.
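For k equal to 6, one plausible completion of the blocks named above (x1 x2 x4, then x2 x3 x5, then x3 x4 x6, and so on) is to cycle the triple of factors. Treat this exact block pattern as an assumption to be checked against a reference table, but it does reproduce the 48 plus nc run count from the comparison that follows:

```python
import numpy as np

def bbd_k6(n_center=6):
    """Sketch of a 6-factor Box-Behnken design: cyclic blocks of 3 factors,
    each block a 2^3 factorial with the other factors at the 0 level.
    The cycled triple (1, 2, 4) is an assumed pattern consistent with the
    blocks listed in the lecture, not a verified reference table."""
    k = 6
    blocks = [(i % k, (i + 1) % k, (i + 3) % k) for i in range(k)]
    runs = []
    for block in blocks:
        for m in range(8):                   # 2^3 factorial on the block
            run = [0] * k
            for pos, f in enumerate(block):
                run[f] = 1 if (m >> pos) & 1 else -1
            runs.append(run)
    runs.extend([[0] * k for _ in range(n_center)])
    return np.array(runs)

D = bbd_k6(n_center=6)
print(D.shape)  # (54, 6): 48 block runs plus 6 centre runs
```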
So when you compare the runs in a Box-Behnken design and a central composite design: you cannot construct a Box-Behnken design with 2 factors, whereas you can construct a central composite design. When you have 3 factors, you can see that the CCD involves 14 plus nc runs while the BBD involves 12 plus nc, and the numbers of runs are equal for k equal to 4 factors: 24 plus nc and 24 plus nc. For 5 factors it is 42 plus nc versus 40 plus nc, and when you have 6 factors the Box-Behnken design comes to only 48 plus nc whereas the CCD goes to as many as 76 plus nc. And for 7 factors you have 142 and 56 for the central composite and the Box-Behnken designs respectively. So the important features of the Box-Behnken design are: it represents an interesting and practical alternative to the central composite design, because for some numbers of factors you are going to have fewer runs. It uses 3 levels, and it is a rotatable design for k equal to 4 and k equal to 7; you may want to verify this, which is very straightforward after you construct the relevant matrix. The BBD is a spherical design, and all the design points are equidistant from the centre; this is good from a prediction variance point of view. So this completes our discussion on the design alternatives available to experimenters. I have discussed the most common designs. There are of course many more designs, but I am sure that with this background you should be able to pick them up from standard textbooks and understand their implications. For example, you can have the face-centred design or the cuboidal design. Many other designs exist, but the Box-Behnken design and the central composite design are the most popular ones, and they are frequently encountered in many research papers. Rather than just implementing the BBD or CCD directly, it is important to understand the different implications of such designs.
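The run counts in the comparison above can be checked quickly: the CCD needs 2 power k factorial runs plus 2k axial runs, while the BBD counts for k up to 5 come from 4 runs per pair block.

```python
from math import comb

# Run counts excluding the nc centre runs, as in the lecture's table.
ccd = {k: 2 ** k + 2 * k for k in range(3, 8)}   # 2^k factorial + 2k axial
bbd = {3: 12, 4: 24, 5: 40, 6: 48, 7: 56}        # values quoted in the lecture
for k in range(3, 8):
    print(k, ccd[k], bbd[k])
# k=3: 14 vs 12; k=4: 24 vs 24; k=5: 42 vs 40; k=6: 76 vs 48; k=7: 142 vs 56

# For k <= 5 the BBD count is 4 runs per pair block, i.e. 4 * C(k, 2):
print(4 * comb(5, 2) == bbd[5])  # True
```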
So you have to think about the number of centre points you may want to use, and whether you want to place the emphasis on rotatability or on the spherical nature, where all the design points are equidistant from the centre except the centre points. That is another important factor. You may also want to look at the scaled prediction variance properties. An important thing here is that the scaled prediction variance depends only upon the X prime X inverse matrix and the coordinate location in the design space; it does not depend upon the experimental observation values y. So even before you start your experimental work you can easily estimate the distribution of the scaled prediction variance in your design space, and you may choose a design based upon that distribution. To emphasize, this does not depend upon the experimental observations y. Another important thing to summarize here is the centre points. The centre points are used for getting pure error estimates, they are helpful in enabling you to identify curvature, and they are also helpful in stabilizing the prediction variance. The axial points are included in the central composite design to enable you to identify the quadratic terms, the squared terms. So each and every point in the design space has its own role to play. The factorial points, of course, are useful to find the effects of the main factors and the interactions between the main factors. It is very important for you to decide upon the number of factors and the interactions you want to consider in your model. Look at the number of experimental data points, and then decide upon the size of the model. If you go for a model with too many parameters, a very ambitious or greedy model, then there is a risk of aliasing, and the X prime X matrix may become singular, so that its inverse is undefined. You must be aware of all these issues.
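Since the prediction variance needs no y values, a candidate design can be screened before any experiment is run by scanning the variance over a grid of points. A sketch, reusing the two-factor CCDs from earlier (the grid limits and resolution are an arbitrary illustration):

```python
import numpy as np

def expand(a, b):
    """Expand (a, b) into the model space [1, A, B, AB, A^2, B^2]."""
    return np.array([1.0, a, b, a * b, a * a, b * b])

def max_upv_on_grid(X, lim=1.2, n=25):
    """Worst unscaled prediction variance xm'(X'X)^-1 xm over a grid.
    No response data y is needed, so this can be done before any experiment.
    (Multiply by N = X.shape[0] to get the scaled prediction variance.)"""
    XtX_inv = np.linalg.inv(X.T @ X)
    grid = np.linspace(-lim, lim, n)
    return max(float(expand(a, b) @ XtX_inv @ expand(a, b))
               for a in grid for b in grid)

r2 = np.sqrt(2.0)
base = [(-1, -1), (1, -1), (-1, 1), (1, 1),
        (-r2, 0), (r2, 0), (0, -r2), (0, r2)]
X4 = np.array([expand(a, b) for a, b in base + [(0, 0)] * 4])  # 4 centre points
X1 = np.array([expand(a, b) for a, b in base + [(0, 0)] * 1])  # 1 centre point

# More centre points stabilize the variance everywhere in the region:
print(max_upv_on_grid(X1) > max_upv_on_grid(X4))  # True
```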
So I request you to go through the portions covered so far and get a clear picture of the different experimental designs. Thank you for your attention.