Okay, after the break discussing prediction variance, we will now dabble a little in linear algebra: the vector representation of a point in the design space, the distance of a particular point from the origin, and so on. It may look a bit difficult to some, but the concepts are very straightforward, and a brief recap of linear algebra would be very helpful at this stage. So I request you to take up any book on linear algebra and review vectors, the distance of a point in a three-dimensional coordinate system, multiplication of vectors, and inversion of matrices. Once we develop a regression model, we are going to use it for prediction, and how do we predict? We multiply the X matrix by the vector of estimated parameters, ŷ = Xβ̂ (so again the X matrix is very important here), and that gives us the column vector of predictions. This is nothing new to us, especially after our review of regression concepts. Now let us say that we want to predict the performance of the experiment at a point Q in the experimental design space, and this point Q in the r-dimensional design space is given in terms of its coordinate values x1, x2, x3, and so on up to xr. To predict the experimental response at this particular point Q, you substitute the coordinates of Q into the model equation, and that gives you the prediction ŷ_Q at this point. What we are going to do next is construct a vector, written x_m′. This vector is different from the set of x values giving the coordinates of Q, because the vector we are going to construct accounts for the model under consideration, whereas the coordinates merely locate Q in the experimental design space. So please do not confuse the coordinates of Q with the x_m′ vector which we are going to construct shortly. The x_m′ vector contains the regressor variable forms to the extent considered by the model. We have estimated the parameters based upon a certain model, and that model may contain the main factors (up to r of them), the binary interactions, the ternary interactions, and so on. Obviously you may not have considered all possible combinations of the regressor variables; you would have limited the model to a certain extent depending upon your requirement. That is the model you are going to work with, and that is the model which gives you the x_m′ vector. Say you had considered only the r main factors in your model: then x_m′ = (1, x1, x2, …, xr), and that is it; we do not have the binary interaction terms. Suppose instead your model had considered the main factors and also the binary interactions: then you go from 1, x1, x2, …, xr and continue until you exhaust all the binary interactions, and that is where the x_m′ vector stops. So you have the coordinates of the point Q, and the scalar ŷ_Q is obtained by substituting x_m′ into the equation ŷ_Q = x_m′ β̂.
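As a small illustration, here is a minimal sketch in Python with NumPy (my own construction, not from the lecture; the function name, the coefficients, and the point Q are all made up) of building x_m′ for a main-effects-only model versus a model with the binary interactions added, and then forming ŷ_Q = x_m′ β̂:

```python
import numpy as np
from itertools import combinations

def model_vector(coords, interactions=False):
    """Build the model vector x_m' for a point Q.

    coords       : coordinates (x1, ..., xr) of Q in the design space
    interactions : include all two-factor (binary) interaction terms
    """
    xm = [1.0] + list(coords)                  # intercept + main effects
    if interactions:
        # append xi*xj for every pair i < j
        xm += [coords[i] * coords[j]
               for i, j in combinations(range(len(coords)), 2)]
    return np.array(xm)

# Hypothetical fitted coefficients and a point Q = (x1, x2, x3)
beta_hat = np.array([10.0, 2.0, -1.5, 0.5])    # intercept + 3 main effects
q = (0.5, -0.25, 1.0)
y_hat_q = model_vector(q) @ beta_hat           # y_hat_Q = x_m' beta_hat
print(y_hat_q)
```

Passing interactions=True would append x1x2, x1x3, x2x3 to the vector, which is exactly the second situation described above.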
So we have the matrix form of the regression equations: you have y, which is the vector of responses, and then you have the matrix X, and you can see that the model form is reflected in each of the rows. Obviously you are going to have n rows, where n refers to the number of experimental settings, and if you go horizontally along each row you are dealing with the model terms. These are the regressor variables, and they are given in matrix notation by row number and column number: this one can be x1, this may be x2, this may be x3, and the last one may be the last binary interaction term, say x1x3 or x2x3. Then there is the vector of the parameters we want to estimate, and the error term. I am giving you the slide again from our regression lecture for you to remember how the X matrix looks, and most importantly, the x_m′ vector has the same form as a row of the X matrix: if you are considering only terms according to a given model, your x_m′ is dictated by the entries in a row of X. So y is an n × 1 vector of the experimental responses, and X is an n × p matrix of the levels of the independent variables, with n rows and p columns, where p = k + 1, k being the number of regressor terms and the 1 referring to the intercept β₀; β is a p × 1 vector of the regression coefficients, and ε is an n × 1 vector of the random errors. The prediction variance at the point Q is then given by Var(ŷ_Q) = σ² x_m′ (X′X)⁻¹ x_m. This is a very, very important equation. We want to see how good the prediction is at the point Q. If the point is very far out in the design space, what is the measure of its prediction? Is it a good prediction or a bad prediction? How does the prediction capability of the model change when you go further and further away from the center of the design space? We saw just now how x_m′ was constructed, and then you have (X′X)⁻¹ x_m multiplied by σ². The σ² is the variance of the error, and we assume that the errors are normally and independently distributed with zero mean and variance σ². Unfortunately we do not know the value of σ², so we use the residual mean square to get an estimate of σ², which we call σ̂². Again, we have seen these things in the regression lecture. So how do you find the estimated standard error of ŷ_Q? It is given by the square root of this expression with σ² replaced by the mean square error: s_ŷ(x) = s √(x_m′ (X′X)⁻¹ x_m), where s = √MS_E. How did we find the mean square error? It is nothing but the sum of squares of the residuals divided by the residual degrees of freedom, n − p, and the residual sum of squares is obtained as the total sum of squares minus the regression sum of squares. So the prediction variance is a very important concept in experimental design; let us see why. The book by Montgomery et al. (2009) refers to x_m as the vector x made out of the coordinates of the point Q expanded to the model space. So x_m is not equal to x.
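Here is a hedged sketch of the standard-error computation just described (Python with NumPy; the function name and arguments are my own choices, not the lecture's). It estimates σ̂² from the residual mean square and returns s √(x_m′ (X′X)⁻¹ x_m):

```python
import numpy as np

def prediction_std_error(X, y, xm):
    """Estimated standard error of y_hat at a point with model vector xm.

    X  : n x p model matrix (rows follow the model form, first column = 1s)
    y  : length-n vector of observed responses
    xm : model vector x_m at the prediction point
    """
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y           # least-squares estimates
    resid = y - X @ beta_hat
    s2 = resid @ resid / (n - p)           # residual mean square = sigma_hat^2
    return np.sqrt(s2 * (xm @ XtX_inv @ xm))
```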
It is not the collection of coordinates locating Q; it is comprised of elements which correspond to the model equation under consideration. We have already seen this a couple of slides back. Now, the prediction variance is a function of the spatial coordinates at which the prediction is made, and it is also a function of the model. Let us look at that. The x_m′ vector depends upon the coordinates, because the coordinates x1, x2, …, xr of Q determine its entries. So what is going to happen to the variance of the prediction when you move very far out into the experimental space? The coordinate values x1, x2, …, xr of the point Q will grow, so we can intuitively expect the variance to increase as we go further and further toward the extremes of the design space. But that is not the only aspect. The location of the point Q is not the only aspect: the variance also depends upon the (X′X)⁻¹ matrix, and this matrix is strongly determined by the design we have chosen. So that is also to be remembered: the experimental design we have taken into consideration also influences the variance of the predicted value at a point Q out in the design space. So there are two factors: how far out Q lies in the design space, and the nature of the experimental design, which dictates the (X′X)⁻¹ matrix. Let us take a model involving 3 variables and the binary interactions only. Then the model vector becomes x_m′ = (1, x1, x2, x3, x1x2, x1x3, x2x3): three main effects plus three binary interactions plus the intercept, giving 7 terms. So the prediction variance varies from point to point in the design space; it is also a function of (X′X)⁻¹ and hence of the experimental design; and it is a measure of how well one predicts with the model, often used as a criterion for comparing different design strategies. When you choose a particular design strategy, the expected question from your management or your supervisor would be: why did you choose this particular design, and why not some other design? You should be able to use the prediction variance as one of the criteria for justifying the choice of your design. So let us now define the scaled prediction variance at a location x as v(x) = n Var(ŷ(x)) / σ². When we want to compare different designs, we really do not know σ², and we may not even have conducted the experiment yet, so the mean square error is unavailable; dividing by σ² makes the quantity independent of the error variance. We also multiply by n because, when a large number of experiments is conducted, the unscaled prediction variance would decrease simply owing to the high value of n, the total number of runs performed; to prevent this artificial reduction, we multiply by n. So we get the scaled prediction variance v(x) = n x_m′ (X′X)⁻¹ x_m.
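Since v(x) = n x_m′ (X′X)⁻¹ x_m depends only on the design and on the location, not on any responses, it can be computed before a single run is performed, which is what makes it useful for comparing candidate designs. A minimal sketch in the same NumPy setting as before (the function name is hypothetical):

```python
import numpy as np

def scaled_prediction_variance(X, xm):
    """Scaled prediction variance v(x) = n * x_m'(X'X)^-1 x_m.

    Depends only on the design (through X) and the location (through xm);
    neither sigma^2 nor any experimental responses are needed, so candidate
    designs can be compared before running the experiment.
    """
    n = X.shape[0]
    return n * xm @ np.linalg.inv(X.T @ X) @ xm
```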
So division by σ² makes the scaled prediction variance independent of the error variance, while multiplication by n scales it according to the size of the run. For a first-order orthogonal design, n(X′X)⁻¹ = I_p, where I is the identity matrix of size p, and p = k + 1 is the total number of regression parameters. So you will have an identity matrix with p rows and p columns. Let us take an example. The first-order model for k = 2, that is, two factors, so that p = 3, is represented by the model vector x_m′ = (1, x1, x2), and we are going to estimate β̂₀, β̂₁, β̂₂: a model vector with 3 terms. Now look at the scaled prediction variance: we have v(x) = n x_m′ (X′X)⁻¹ x_m, and because n(X′X)⁻¹ is the identity matrix, this reduces to the direct product of the vectors, x_m′ x_m = 1 + x1² + x2² + … + xk², and that equals 1 + ρ_x², because x1² + x2² + … + xk² is nothing but the square of the distance of the particular point from the origin. This is applicable for a first-order model in k factors, with ρ_x² the square of the distance of the point Q from the design center. Hence the scaled prediction variance is unity when x1 = x2 = … = xk = 0, and it increases as the point moves away from the center. So let us look at a design. Here we have a full 2³ design with 8 rows. The X′X matrix becomes a diagonal matrix with the diagonal elements having the value 8, and when we take its inverse we get 1/8, or 0.125, throughout the diagonal. When we form n(X′X)⁻¹ we are multiplying everything by 8, and we get the identity matrix of dimension p = k + 1 = 4, that is, a 4 × 4 identity matrix. (If this is X, the transpose X′ is obtained by changing rows into columns and columns into rows.) So when we calculate the scaled prediction variance for the 2³ design, we again multiply (1, x1, x2, x3) by itself, which gives 1 + x1² + x2² + x3², again 1 plus the square of the distance from the design center. So for this 2³ design the scaled prediction variance is unity at the design center and then increases as we move away from it. This is for an orthogonal design. So we can see the scaled prediction variance at different locations in the design space: it is unity at the center, and as we go to the extremes of the design space we get a scaled prediction variance of 4. If you have a 2⁵ design, the boundaries are given by (±1, ±1, ±1, ±1, ±1), that is, ±1 five times, and the scaled prediction variance can easily be shown to equal 6. So compared to the design center, where the scaled prediction variance is 1, when you go to the design boundaries or design extremes the scaled prediction variance increases to 6.
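The 2³ claims above are easy to verify numerically. A short sketch (Python with NumPy; my construction of the full factorial, not taken from the lecture slides) that checks n(X′X)⁻¹ = I₄ and evaluates the scaled prediction variance at a corner of the design:

```python
import numpy as np
from itertools import product

# Full 2^3 factorial in coded units; first-order model: columns 1, x1, x2, x3
runs = np.array(list(product([-1, 1], repeat=3)), dtype=float)   # 8 runs
X = np.column_stack([np.ones(8), runs])
n = X.shape[0]

print(n * np.linalg.inv(X.T @ X))            # identity matrix I_4 (orthogonal design)

xm = np.array([1.0, 1.0, 1.0, 1.0])          # corner point (1, 1, 1)
print(n * xm @ np.linalg.inv(X.T @ X) @ xm)  # 1 + 1 + 1 + 1 = 4.0
```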
So it has increased 6 times. Now, when we have a nonoptimal design, let us see what happens to the scaled prediction variance. In this nonoptimal design the X matrix is as shown here: you also have replicates at the center, which is why there are two rows of 0s. It is a nonoptimal design in this sense because the design points are not located at the extremes of the design space alone; in addition, we have design points located at the design center. When we look at the (X′X)⁻¹ matrix, we get the diagonal entries 1/6, 1/4, 1/4, 1/4. In such a situation the scaled prediction variance at the boundaries is higher, and we can easily calculate it. The size of the run, as you can see, is 6, so n = 6, and we multiply by 6; then you have the model vector (1, x1, x2, x3), since we are considering a model involving the main factors alone, and the (X′X)⁻¹ matrix with 1/6 and then 1/4, 1/4, 1/4 as the remaining terms on the main diagonal. When we do the multiplication, 6/6 becomes 1 and 6/4 becomes 1.5, and hence we get the expression v(x) = 1 + 1.5(x1² + x2² + x3²). At the boundaries, putting x1 = x2 = x3 = 1, we get 1 + 1.5 × 3 = 5.5. So the scaled prediction variance at the boundaries of this design is 5.5. You may want to work out, as an exercise, what the scaled prediction variance would be if the center points had not been there. This concludes a rather brief discussion of orthogonality concepts. Usually this topic is not covered in factorial design of experiments (it is supposed to be implicitly understood), but I thought having a separate lecture on this concept would put things into perspective. It also explains why, in the regression analysis for an orthogonal design, the adjusted sums of squares and the sequential sums of squares are identical: it does not really matter in what sequence a particular factor enters the model, whether it comes at the beginning, at the end, or anywhere in between. But when you have a non-orthogonal design, the adjusted sums of squares and the sequential sums of squares are different, and the order in which the factor enters the designed experiment assumes importance. The order in which a regression parameter is introduced into the model matters in non-orthogonal designs; in orthogonal designs it does not, and therein lies the advantage. So it is always better to go for planned experiments such that your design matrix comprises orthogonal columns, coded uniformly, so that you have column vectors with entries such as −1, +1 and 0. The design looks neat. In some cases it may not be possible; you may have to work with available data to develop your regression, and in such cases you can adopt the general approach.
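To reproduce the 5.5 figure: one X matrix consistent with the diagonal (X′X)⁻¹ entries 1/6, 1/4, 1/4, 1/4 quoted above is a 2³⁻¹ half fraction plus two center runs. The exact rows on the lecture slide may differ, but any design with X′X = diag(6, 4, 4, 4) gives the same result:

```python
import numpy as np

# A 2^(3-1) half fraction (x3 = x1*x2) plus two center runs: n = 6.
runs = np.array([[-1, -1,  1],
                 [ 1, -1, -1],
                 [-1,  1, -1],
                 [ 1,  1,  1],
                 [ 0,  0,  0],
                 [ 0,  0,  0]], dtype=float)
X = np.column_stack([np.ones(6), runs])
n = X.shape[0]
XtX_inv = np.linalg.inv(X.T @ X)
print(np.diag(XtX_inv))              # [1/6, 1/4, 1/4, 1/4]

xm = np.array([1.0, 1.0, 1.0, 1.0])  # corner point (1, 1, 1)
print(n * xm @ XtX_inv @ xm)         # 1 + 1.5*3 = 5.5
```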
A very important advantage of factorial design of experiments is that it leads to an orthogonal design, and the parameters are estimated quite easily: the X′X matrix is a diagonal matrix in such cases, and the estimation of the parameters becomes very straightforward. And the role played by a particular factor in an orthogonal design may be assessed independently of the other factors (a small numerical sketch of this follows at the end). So this concludes our discussion on the orthogonality concepts involved in experimental design, or simply put, orthogonal designs. We will next move on to second-order designs such as the central composite design, which will lay the groundwork for response surface methodology. Thank you for your attention.
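As a short closing appendix (not part of the lecture itself): a minimal NumPy sketch of the independence just mentioned, showing that for an orthogonal design each coefficient can be computed from its own column alone, so the order in which factors enter the model cannot matter. The response values are invented purely for illustration:

```python
import numpy as np
from itertools import product

# For an orthogonal design, X'X is diagonal, so each estimate reduces to
# beta_hat_j = (column_j . y) / (column_j . column_j), independent of the
# other factors.
runs = np.array(list(product([-1, 1], repeat=3)), dtype=float)   # full 2^3
X = np.column_stack([np.ones(8), runs])
y = np.array([45., 52., 48., 60., 44., 50., 47., 59.])           # made-up data

beta_full = np.linalg.solve(X.T @ X, X.T @ y)     # usual least squares
beta_cols = np.array([(X[:, j] @ y) / (X[:, j] @ X[:, j]) for j in range(4)])
print(np.allclose(beta_full, beta_cols))          # True: columns act independently
```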