Okay, continuing with our discussion on blocking: this is a concept that is not used as often as it should be, but it has a lot of practical value. Sometimes it is not possible to carry out repeats on a single specimen, so you carry them out on different ones. For example, suppose you are studying the effect of different fertilizers on a plot of land. You take one plot, apply the different fertilizers, and monitor the growth of the crops. Obviously, if you want repeats in time, you have to wait until the first crop is ready before applying the fertilizers a second time. But if you want repeats in parallel, you have to apply the fertilizers to a second field, and that second field may well be different from the first. The first field and the second field are then called blocks. The experiments are conducted on different blocks, and we have to account for the blocking influence as well. When you account for blocking you lose some degrees of freedom, which is the negative side; on the other hand, the sensitivity of your tests increases. So in blocking we try to answer the question: how do we account for the differences between the specimens on which the treatments were carried out? We carry out the different tests on one specimen; once that specimen is used up, we do the same tests on a second specimen. The second specimen may occasionally be identical to the first, but usually it will be different. The first specimen is one block and the second specimen is another block, and by treating them as blocks we account for the variability due to the two different specimens: we block out the effect of the type of specimen on which the tests were carried out. There is some loss of information because of blocking, but blocking increases the sensitivity of the experiment to differences between the levels of the study variables. So please look at the ANOVA table given for blocking and note the degrees of freedom consumed by the blocking term, and, on the other hand, how the test becomes more sensitive due to the blocking effect.

Next we move on to factorial design. Here we are talking about not a single factor but more than one. There can be multiple factors, each set at two levels. In two-level factorial designs we can have any number of factors (3, 4, 5 and so on), but each factor is set at only two levels, a lower level and a higher level, which we call the −1 setting and the +1 setting. What are the advantages of factorial design? It helps you analyze and interpret your results in a scientific manner; you can carry out response surface methodology; and qualitative and quantitative factors may be analyzed together. For example, you can include temperature, pressure and the type of catalyst in the same analysis. Temperature and pressure have a continuous range of values, whereas the type of catalyst may be catalyst A, catalyst B and so on: a discrete, qualitative variable. Using design of experiments and factorial design, you can account for all three factors simultaneously. Factorial design has become essential for industrial competitiveness, and its advantages are manifold.
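Returning to the blocking discussion for a moment, here is a minimal sketch of a randomized complete block design ANOVA in Python using statsmodels. The fertilizer labels, field (block) labels and yield values are all hypothetical, invented purely for illustration; the point is only to show how the block term absorbs degrees of freedom that would otherwise inflate the error.

```python
# A minimal sketch of a randomized complete block design ANOVA.
# All fertilizer labels, field (block) labels and yield numbers below
# are hypothetical, invented purely for illustration.
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

data = pd.DataFrame({
    "fertilizer": ["F1", "F2", "F3"] * 4,
    "block": ["field1"] * 3 + ["field2"] * 3 + ["field3"] * 3 + ["field4"] * 3,
    "yield_": [28.1, 30.5, 26.9, 27.4, 31.2, 25.8,
               29.0, 32.1, 27.5, 28.6, 30.9, 26.2],
})

# The block enters the model as a factor but is not of interest itself:
# its sum of squares (and degrees of freedom) are removed from the error
# term, which is what sharpens the test on the fertilizer effect.
model = ols("yield_ ~ C(fertilizer) + C(block)", data=data).fit()
print(sm.stats.anova_lm(model, typ=2))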
The design is orthogonal: the different effects and their interactions contribute to the sum of squares independently. When you have an orthogonal design, each factor contributes to the response in its own way, so the variability in the response is contributed independently by the different factors. Factorial design of experiments also enables us to extract the required information from the experiments even in the face of the distractions created by unpreventable random variations. There are going to be random variations throughout the course of our experiments; despite those distractions, design of experiments, and factorial design in particular, will help us identify the main effects and their interactions. One more important point: design of experiments helps us extract rich, informative content from data using a limited number of experiments.

Another important concept in factorial design is the interaction between factors. What is really meant by interaction between factors? The role of factor A may depend on the level of factor B: at one level of factor B, A may behave in one manner, and at another level of factor B, A may behave in another manner, in which case the two factors are said to interact. Two factors that are not independent of one another are said to interact; when factors interact, the change in response due to a change in one factor depends on the level of the other factor. If a change in the level of the first factor causes a certain change in the output response at one level of the second factor, an identical change in the first factor at the second level of the second factor will produce a markedly different output response. For this, please look at the example I had given on cricket scores from a batsman depending on whether he had taken tea or beer before coming out to play.

Now look at the typical analysis of variance table for a factorial design of experiments. You have sources of variability due to the A treatments and the B treatments, then the interaction between A and B, and then the contribution from the error. So you have the sum of squares of factor A, of factor B, of the AB interaction, and of the error, with degrees of freedom a−1 for A, b−1 for B, (a−1)(b−1) for the interaction, and ab(n−1) for the error. We calculate the mean squares as usual by dividing each sum of squares by its respective degrees of freedom. To find F0 for A we divide the mean square of A by the mean square error; for B we divide the mean square of B by the mean square error; and for the AB interaction we divide the interaction mean square by the mean square error. Once you have these three F values, you compare each with F(alpha; numerator, denominator degrees of freedom), where the numerator degrees of freedom correspond to the particular factor and the denominator degrees of freedom correspond to the error. If the computed F0 for a factor or the interaction is higher than that critical value, the F value lies in the rejection region.
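Here is a minimal sketch of the two-factor ANOVA table just described, again with entirely hypothetical factor levels and response values. Each of the four (A, B) cells has two replicates, so the error carries ab(n−1) = 4 degrees of freedom, and each F0 is the factor mean square over the error mean square.

```python
# A minimal sketch of the two-factor factorial ANOVA with interaction.
# Factor levels and response values are hypothetical, for illustration only.
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

data = pd.DataFrame({
    "A": ["low", "low", "high", "high"] * 2,
    "B": ["low", "high"] * 4,
    "y": [12.1, 18.4, 15.0, 9.7, 12.8, 17.9, 14.6, 10.2],
})

# y ~ A + B + A:B yields SSA (a-1 df), SSB (b-1 df),
# SSAB ((a-1)(b-1) df) and SSE (ab(n-1) df); each F0 = MS / MSE.
model = ols("y ~ C(A) * C(B)", data=data).fit()
print(sm.stats.anova_lm(model, typ=2))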
So F(alpha; numerator, denominator degrees of freedom) defines the critical F value, and if that critical value is exceeded by one or more of these three statistics, the corresponding factors are said to lie in the rejection region and we can reject the appropriate hypotheses. The null hypothesis here would be that treatment A has no effect at all: μA = μ, or equivalently τA, the effect of factor A, is 0. Similarly for factor B: the null hypothesis says that μB equals the overall average μ, and since μB = μ + τB, this means τB = 0. Likewise for the interaction: if its F statistic lies in the rejection region, you reject the null hypothesis. So you conclude that factor A is important, or factor B is important, or the AB interaction is important, depending on which F values lie in the rejection region.

Then we move on to multiple regression, where we write the experimental response as y = β0 + β1x1 + β2x2 + ε, with ε the random experimental error. The response y is not strictly determined by the factors x1 and x2 alone but also by the random error component. β1 and β2 are called partial regression coefficient 1 and partial regression coefficient 2 respectively. There is a very interesting diagram here: we plot the response versus x and find the responses scattered in the two-dimensional plane, and we try to fit a line that passes in the best possible manner through the points; we try to balance the line through these points, and there can be many such points. What is important is that the regression line represents the true value, and the experiments show deviations from that true value because of random fluctuations. The distribution of the fluctuations about the true regression value is described by a normal distribution centered on the line, with variance σ². This σ² is also called the error variance: it is only because of random fluctuations, or random errors, that the data points deviate from the straight line. The means of these normal distributions lie on the regression line y = β0 + β1x, but there are deviations from it because of the random error contributions; when we repeat the experiment, the data point may land a little above or a little below the line, because it is a random phenomenon.

We can also handle the multiple regression model very swiftly with linear algebra. We write a general multiple regression model as y = β0 + β1x1 + β2x2 + ... + βkxk + ε; this is a multiple regression model with k regressor variables x1, x2, ..., xk, which are also called factors. The parameters β0, β1, β2, ..., βk are called partial regression coefficients. Now we can take a matrix approach to multiple linear regression. If there are k regressor variables x1, x2, ..., xk and n observations, the index i represents the run number; you can have n runs performed.
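As an aside, the picture just described, points scattered about a true line because of normally distributed errors with variance σ², is easy to reproduce numerically. A minimal sketch follows, with all "true" parameter values invented for illustration:

```python
# A minimal sketch: data scattered around a true line y = beta0 + beta1*x
# by N(0, sigma^2) errors, with a least-squares line fitted back.
# The "true" values below are hypothetical, chosen only for illustration.
import numpy as np

rng = np.random.default_rng(0)
beta0, beta1, sigma = 2.0, 0.5, 0.3          # assumed true parameters
x = np.linspace(0.0, 10.0, 20)
y = beta0 + beta1 * x + rng.normal(0.0, sigma, size=x.size)

# np.polyfit returns the highest power first: [beta1_hat, beta0_hat].
beta1_hat, beta0_hat = np.polyfit(x, y, deg=1)
print(f"fitted line: y = {beta0_hat:.3f} + {beta1_hat:.3f} x")
```

Rerunning with a different seed moves each point up or down at random, which is exactly why a repeated experiment lands somewhere else around the same true line.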
So xi1, xi2, xi3, ..., xik are the x values corresponding to the i-th run for factors 1, 2, 3, ..., k, and the model is yi = β0 + β1xi1 + β2xi2 + ... + βkxik + εi, where εi is the random error component; usually the number of experimental settings should be greater than the number of regression parameters. This may be represented in matrix notation as y = Xβ + ε. Here y is the column vector (y1, y2, ..., yn), and the first row of X is 1, x11, x12, x13, ..., x1k. The first column after the ones may perhaps be main factor A, the next main factor B, the third may have been the interaction between the two factors, and so on; you can even have quadratic terms like x1² or x2². You have such a row for each of the n experimental settings. Then you have the column vector β comprising β0, β1, ..., βk, and ε is the column vector of the random error components ε1, ε2, ..., εn. Once we set up the matrix form y = Xβ + ε, the fitted coefficients help us predict the response at various values of xi. We drop the error component because this is a prediction: ŷ = β̂0 + β̂1x1 + β̂2x2 + ... + β̂kxk. Note that we have k regression parameters β̂1, β̂2, ..., β̂k; β̂0 is the intercept in the multi-dimensional space. To find the least squares estimators for β, we solve the normal equations: β̂ = (X′X)⁻¹X′y.

Next we come to the variance-covariance matrix, which is a very important one: we want to look at the variances of the estimated parameters. If the variances are small, those parameters have been estimated quite precisely. The variance-covariance matrix is the (X′X)⁻¹ matrix multiplied by σ². The sum of squares of the error is SSE = y′y − β̂′X′y, which may be rearranged as SSE = [y′y − (Σᵢ yᵢ)²/n] − [β̂′X′y − (Σᵢ yᵢ)²/n]. In other words, the error sum of squares is the total sum of squares minus the regression sum of squares; the second bracketed term is the regression sum of squares. We subtract (Σᵢ yᵢ)²/n in each bracket to remove the sum of squares accounted for by the intercept β̂0. So we again have an analysis of variance table, with sources of variation due to regression and due to error (the residual), carrying k and n−p degrees of freedom respectively, while the total sum of squares has n−1 degrees of freedom. Again we form the ratio of mean squares: the regression sum of squares divided by k gives the mean square regression, and the error sum of squares divided by n−p gives the mean square error. The ratio of mean square regression to mean square error gives the F0 value, which you test to see whether it lies in the rejection region or the acceptance region. Then we also talk about R², which is the regression sum of squares divided by the total sum of squares.
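The matrix formulas above translate directly into a few lines of numpy. A minimal sketch, with a hypothetical design matrix and made-up responses:

```python
# A minimal sketch of the matrix formulas quoted above:
# beta_hat = (X'X)^-1 X'y, Cov(beta_hat) = sigma^2 (X'X)^-1,
# SSE = y'y - beta_hat' X' y. The design and responses are hypothetical.
import numpy as np

# n = 6 runs: an intercept column, two regressors, two center runs.
X = np.array([[1.0, -1, -1],
              [1.0, -1,  1],
              [1.0,  1, -1],
              [1.0,  1,  1],
              [1.0,  0,  0],
              [1.0,  0,  0]])
y = np.array([8.2, 10.1, 11.9, 14.3, 11.0, 11.4])

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y              # least-squares estimator
sse = y @ y - beta_hat @ X.T @ y          # residual sum of squares
n, p = X.shape
sigma2_hat = sse / (n - p)                # estimate of the error variance
cov_beta = sigma2_hat * XtX_inv           # variance-covariance matrix
print(beta_hat, sigma2_hat)
print(np.sqrt(np.diag(cov_beta)))         # standard errors of the estimates
```

Small entries on the diagonal of the variance-covariance matrix mean the corresponding parameters have been estimated precisely, exactly as stated above.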
R² represents the proportion of the total variability accounted for, or explained, by the linear regression model. For the adjusted R² we use the mean square of the error and the mean square of the total, and here we penalize the model for having too many parameters: R²_adj = 1 − [SSE/(n−p)] / [SST/(n−1)]. As the number of parameters p increases, n−p decreases and 1/(n−p) increases, so the SSE/(n−p) term tends to grow and the adjusted R² goes down unless the extra parameter reduces SSE enough to compensate. The suggested procedure is therefore to keep adding parameters to your model until the adjusted R² starts to decrease; the statistic penalizes the model for having too many parameters. If the adjusted R² increases along with R² when you add a parameter, then that parameter is making an effective contribution to the model.

Now, how do we analyze for lack of fit? We start from the residual sum of squares. What is a residual? It is the balance, or leftover, when you subtract the model prediction from the experimental response; the residual sum of squares is obtained by summing the squares of these residuals. The residual sum of squares is split into the lack-of-fit sum of squares and the pure-error sum of squares. If the residual sum of squares has nr degrees of freedom, the pure-error sum of squares has ne degrees of freedom and the lack-of-fit sum of squares has nr−ne. Dividing the lack-of-fit sum of squares by nr−ne gives the mean square lack of fit; dividing the pure-error sum of squares by ne gives the mean square pure error. We then compare the mean square lack of fit with the mean square pure error using an F test with nr−ne numerator and ne denominator degrees of freedom. If the F test says the two are comparable and the statistic does not lie in the rejection region, we can say the model has no significant lack of fit: since the lack-of-fit mean square is comparable to the pure-error mean square, there is no further incentive to develop the model. On the other hand, if the mean square lack of fit is considerably higher than the mean square pure error, the F statistic will lie in the rejection region, and you have to conclude that the lack of fit is significant and there is scope for model expansion: you should consider adding more terms to your equation. How do you get the pure error? The pure error is obtained by carrying out genuine repeats in your experimental runs. You fix all the factors at certain values and repeat the run at those same values more than once; then you choose some other set of factor values and again repeat the experiments more than once at those settings. If you do this, you will have genuine repeats, which give you the pure-error sum of squares and hence the mean square pure error. This lack-of-fit test is very important in linear regression analysis, and it helps you stop at the right stage of model development. The next important concept we discussed is orthogonal designs.
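Before moving on, here is a minimal sketch of the lack-of-fit decomposition, with hypothetical replicated runs and a straight-line fit. The pure error comes from the spread of the genuine repeats around their own cell means; whatever residual variation is left over is attributed to lack of fit.

```python
# A minimal sketch of SS_residual = SS_lack_of_fit + SS_pure_error,
# using hypothetical x settings with genuine repeats and a line fit.
import numpy as np
from scipy import stats

x = np.array([1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 4.0])
y = np.array([2.1, 2.3, 4.8, 5.1, 6.2, 6.0, 6.9, 7.1])

b1, b0 = np.polyfit(x, y, deg=1)
resid = y - (b0 + b1 * x)
ss_resid = np.sum(resid**2)

# Pure error: spread of the repeats around their own cell means.
ss_pe, n_e = 0.0, 0
for xv in np.unique(x):
    cell = y[x == xv]
    ss_pe += np.sum((cell - cell.mean())**2)
    n_e += cell.size - 1                  # pure-error degrees of freedom

n_r = x.size - 2                          # residual df (2 parameters fitted)
ss_lof, df_lof = ss_resid - ss_pe, n_r - n_e
F0 = (ss_lof / df_lof) / (ss_pe / n_e)
print("F0 =", F0, " F_crit =", stats.f.ppf(0.95, df_lof, n_e))
```

If F0 stays below the critical value, the lack of fit is not significant and there is no incentive to expand the model further.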
We touched upon the advantages of orthogonal designs a few slides back. If two columns of the design matrix are orthogonal, the levels of those two factors are linearly independent, and the important implication of linear independence is that the roles of the two variables on the process response are assessed independently of each other. Factorial designs are orthogonal designs: with factors A and B, the A factor is treated independently of the B factor, and the contribution brought in by A is the same whether or not you are considering B. We also talked about the AB interaction: if the way the response changes when A goes from its lower to its upper level depends on whether B was at its lower or higher level, then the two factors A and B are said to interact. But because we are dealing with orthogonal designs, the A effect is found independently of the B effect, and the AB interaction effect is likewise found independently of the A effect and the B effect. So if you develop a model and do not consider the interaction term between A and B, factor A will still have a particular value; let us say the effect brought in by factor A is 20. If you then also include the AB interaction, the effect of A will still be 20: it does not matter whether AB is present in the model or not, the effect of A is computed to be the same. Similarly, the effect of B may be 10, and it stays 10 regardless of whether A or AB is present in the model. Suppose you have the full model with A, B and AB, and factor B has an effect of, let us say, 10 units; drop A and AB, and B still has 10 units. This is very important, and the best way to understand it is to actually work a problem: develop a model for the orthogonal case and find the effects, first with only A, then with B, then with AB. You will find that the A effect, the B effect and the AB effect come out independent of each other.

In our model development we also have to be careful. Suppose we have only 4 runs, corresponding to a 2^2 factorial design. The X matrix then has a column vector of ones, a column for x1, a column for x2, and a column for x1x2. With these 4 independent runs you can estimate 4 independent parameters: β̂0, β̂1, β̂2, β̂12. However, suppose you want to estimate more parameters from your design, for example the quadratic terms β̂11 and β̂22 corresponding to x1² and x2². At the ±1 settings, the x1² column is all ones and the x2² column is also all ones, and if you look at the X matrix these coincide with the column vector of ones. The columns are no longer linearly independent: there are three columns that are linearly dependent, and that leads to all kinds of difficulties in estimating the parameters. So it is important to restrain yourself according to the size of the design: do not try to fit too many parameters in your model; do not go for the greedy model.
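The rank deficiency just described is easy to see numerically. A minimal sketch: build the 2^2 model matrix with and without the quadratic columns and check the rank.

```python
# A minimal sketch of the aliasing described above: in a 2^2 factorial,
# the x1^2 and x2^2 columns are all ones, identical to the intercept
# column, so the model matrix loses rank and (X'X) cannot be inverted.
import numpy as np

x1 = np.array([-1, -1,  1,  1])
x2 = np.array([-1,  1, -1,  1])
ones = np.ones(4)

X_ok  = np.column_stack([ones, x1, x2, x1 * x2])                 # 4 runs, 4 params
X_bad = np.column_stack([ones, x1, x2, x1 * x2, x1**2, x2**2])   # 6 params

print(np.linalg.matrix_rank(X_ok))   # 4: all four parameters estimable
print(np.linalg.matrix_rank(X_bad))  # still 4: the quadratic columns add nothing
```

Six parameters against a rank of four means β̂0, β̂11 and β̂22 cannot be separated from the four runs, which is exactly why the quadratic terms need extra design points.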
Look at the number of parameters you want to estimate and the number of experimental runs available. You might think the number of runs and the number of parameters can be the same, but then it is no longer a regression analysis; it is just a procedure for solving n equations in n unknowns, and you will get an exact fit to the experimental data. Normally we conduct a larger number of experimental runs and estimate only a few parameters, so that we have sufficient scope for accounting for experimental error and also for lack-of-fit tests. Certain experimental design strategies require center runs. You could do a factorial design and repeat the experiments at the factorial points, but that would probably lead to a large number of repeated runs, which may also be expensive. Rather than doing that, you may want to carry out the repeat experiments at the center of your experimental design space, midway between all the experimental settings. These center runs are very important because they give you an idea of the pure error; they are an important augmentation to the factorial design, they do not contribute to the linear effects or the interaction terms, and they help you see qualitatively whether curvature is present.

Another important quantity, which people do not really understand or use in their design of experiments, is the scaled prediction variance. There is a bit of linear algebra associated with it, and that may be why people do not really appreciate and utilize it. It concerns the predictive quality of your model: your model is going to predict values throughout the experimental design space, and we must ask how good those predictions are. If the model predictions have a wide variance associated with them, they cannot really be relied upon. So we want to keep the scaled prediction variance under control, and for that we have to focus on the experimental design itself. To find the scaled prediction variance we do not have to conduct any experiments: we choose a candidate experimental design strategy, write down its X matrix, calculate X′X and then (X′X)⁻¹, pick a set of coordinates in the design space, and compute the scaled prediction variance there. We then look at different points in the domain and check whether the scaled prediction variance is kept in check at most of them; if there are points in the design space where it shoots up, that particular design is not to be recommended. There are different definitions of the prediction variance. In the first definition you have the prediction variance itself, xm′(X′X)⁻¹xm multiplied by σ², where xm is the coordinate point expanded into the model space; please look at the appropriate slide for this definition. Then you have the unscaled prediction variance, where you divide by σ² to get xm′(X′X)⁻¹xm. And then you have the scaled prediction variance, where you multiply the unscaled prediction variance by the number of runs; after all, you could make the prediction variance artificially small simply by increasing the number of runs.
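Here is a minimal sketch of the scaled prediction variance for a 2^2 design fitted with an interaction model; the evaluation points are arbitrary illustrative coordinates. Note how the variance grows as the prediction point moves toward the edge of the design region.

```python
# A minimal sketch of SPV(x) = n * xm' (X'X)^-1 xm for a 2^2 factorial
# with the model (1, x1, x2, x1*x2). Evaluation points are arbitrary.
import numpy as np

x1 = np.array([-1, -1,  1,  1])
x2 = np.array([-1,  1, -1,  1])
X = np.column_stack([np.ones(4), x1, x2, x1 * x2])
XtX_inv = np.linalg.inv(X.T @ X)
n = X.shape[0]

def spv(p1, p2):
    xm = np.array([1.0, p1, p2, p1 * p2])   # point expanded into model space
    return n * xm @ XtX_inv @ xm

for point in [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]:
    print(point, spv(*point))               # 1.0 at the center, 4.0 at a corner
```

No experiments are needed for this calculation, which is exactly why SPV is useful for comparing candidate designs before any data are collected.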
If you want to compare different designs, you have to put them on a common basis, and to do that you multiply by n, the number of runs. So the scaled prediction variance, a very important quantity in statistical design comparisons, is SPV = n·xm′(X′X)⁻¹xm. Coming to the next slide, we have what is called the estimated prediction variance. Please note that we do not know the value of σ², the error variance. In such cases we replace σ² with a suitable error estimate, namely the mean square of the residuals: the residual sum of squares divided by its degrees of freedom. If you have n experimental points and p parameters including the intercept (β̂0, β̂1 and so on; note that p = k + 1, where k is the number of regression coefficients), then n−p is the degrees of freedom for the residual sum of squares, and dividing the residual sum of squares by n−p gives the mean square error, which can be used instead of σ². We call it σ̂² to denote that it is an estimate. Once you have the estimated prediction variance, you can take its square root to get the standard error.

We talk a lot about second-order models; many research papers also talk about them. Why should there be so much fuss about second-order models? Because the experimental design space may no longer be planar or have only simple interactions: it may be characterized by peaks and valleys, and second-order models are required to estimate such a response and enable the identification of an optimal solution, if any. Second-order models are of the form y = β0 + Σᵢ βᵢxᵢ + Σᵢ Σⱼ βᵢⱼxᵢxⱼ + Σᵢ βᵢᵢxᵢ² + ε. You are accounting for the main effects, the binary interactions, and the quadratic terms, and this requires the estimation of 1 + k + kC2 + k = 1 + 2k + k(k−1)/2 parameters in total. One important second-order design is the central composite design, where a regular factorial design is augmented by axial points. I am showing it for a design involving only 2 factors so that I can represent it on a two-dimensional diagram. You have the regular factorial points, then the center points, which help you find the pure error and also tell you qualitatively whether curvature effects are present. In addition to the center and factorial points you have the axial points, and these are important augmentations of the central composite design. The roles played by the center points in the central composite design are these: they help in the detection of second-order, or curvature, effects (β11 + β22), but not in their individual estimation, and the number of center points decides the distribution of the scaled prediction variance in the region of interest. What I am trying to say is that the prediction capability of your model may also depend on the number of center points considered.
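Here is a minimal sketch that lists the runs of a central composite design for k = 2 factors. The rotatable choice α = √2 and three center runs are assumptions made for the example, not prescriptions.

```python
# A minimal sketch of a central composite design for k = 2 factors:
# 2^2 factorial points, 2k axial points at distance alpha, and center
# runs. alpha = sqrt(2) (a rotatable choice) and 3 center runs assumed.
import itertools
import numpy as np

k, alpha, n_center = 2, np.sqrt(2.0), 3

factorial = np.array(list(itertools.product([-1.0, 1.0], repeat=k)))
axial = np.vstack([v for i in range(k)
                   for v in (alpha * np.eye(k)[i], -alpha * np.eye(k)[i])])
center = np.zeros((n_center, k))

ccd = np.vstack([factorial, axial, center])
print(ccd)            # 4 factorial + 4 axial + 3 center = 11 runs
```

The three groups of points play exactly the roles described above: the factorial points carry the main effects and interaction, the axial points separate the individual quadratic terms, and the center runs supply the pure error.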
The axial points help, rather, in the estimation of the individual pure quadratic effects; if the axial points were not present, only the combined significance of the quadratic terms, β11 + β22, could have been gauged using the center points. The axial points do not help in the estimation of the interaction effects; those are obtained from the factorial points. Together, the center points and the axial points contribute to the flexibility of the central composite design.

An important alternative to the CCD is the BBD, the Box-Behnken design. It is a creative approach to planned experimentation and involves a relatively smaller number of runs; it is built on a balanced incomplete block design. For 3 factors the Box-Behnken design looks like this: you run a regular 2^2 factorial in factors A and B while C is kept at its center point; in the next phase you leave B at the center point and construct a 2^2 factorial in A and C; in the next phase you take B and C and keep A at the center. After you have exhausted all 3 combinations, you then have a set of center points. When you have a larger number of factors, say 6, instead of a 2^2 design for each subset of factors you go for a 2^3 design. In the first phase you might consider x2, x3 and x5 and construct a 2^3 design in these factors, with all the remaining factors held at their center values. After carrying out those 8 settings, you move to another subset of three factors, say x1, x2 and x4, and carry out the 2^3 design there; again the remaining factors stay at their center values. You go around like this, taking 3 factors at a time and constructing 2^3 designs in them. The important thing to notice is that at any given time 3 of the 6 factors form a 2^3 factorial design while the remaining factors sit at their center values. Finally, after exhausting all the combinations, you come to the center runs, where all the factors are kept at their center values.

Once we know how to do a CCD and a BBD, we can carry out response surface methodology, where the objective is to find the optimum. The current level of operation may be very far from the optimum, and we cannot afford to wander around in the n-dimensional space wasting resources, manpower and time; we need a structured, well-thought-out procedure for progressing quickly towards the optimal solution. For this we use the method of steepest ascent. I am demonstrating the method for 2 variables: you have x1 and x2, the responses are obtained from a 2^2 factorial design, and they show no interactions; the contours are not curved but linear. To proceed along the direction of steepest ascent, you move perpendicular to the contour lines, as shown: these are lines of constant response, and we travel in the direction perpendicular to them. Please remember that we cannot use the developed model to predict the outcomes along the direction of steepest ascent outside the design space; we actually perform experiments along that direction, and we keep performing them until we find the response passing through an optimum.
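Here is a minimal sketch that enumerates the three-factor Box-Behnken design described above: a 2^2 factorial in each pair of factors while the third sits at its center value, followed by center runs (three are assumed here for illustration).

```python
# A minimal sketch of the three-factor Box-Behnken design: a 2^2
# factorial in each pair of factors with the remaining factor at its
# center value, plus center runs (3 assumed here).
import itertools
import numpy as np

k, n_center = 3, 3
runs = []
for i, j in itertools.combinations(range(k), 2):
    for a, b in itertools.product([-1.0, 1.0], repeat=2):
        point = np.zeros(k)        # the remaining factor stays at the center
        point[i], point[j] = a, b
        runs.append(point)
runs.extend([np.zeros(k)] * n_center)

bbd = np.vstack(runs)
print(bbd)                         # 12 edge points + 3 center points = 15 runs
```

Fifteen runs compare favorably with the corresponding three-factor CCD, which is one reason the BBD is attractive when runs are expensive.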
Here we construct a central composite design around this optimal point, evaluate all the parameters in the model, and also determine whether the optimum is a minimum, a maximum or a saddle point. The central composite design around the optimal point is shown here, for example: you have the factorial points, the axial points, and finally the repeats. Now you can fit a second-order model as given by the following equation, and once you have identified all the model parameters, you have to locate the stationary point. The stationary point is where all the partial derivatives ∂ŷ/∂xk, where xk may be x1, x2 and so on depending on the number of independent factors, are equal to 0. So from the identified model equation you set all the partial derivatives with respect to the x values to 0, solve the resulting set of perhaps nonlinear algebraic equations, and find the stationary conditions. There is another way of doing it: the matrix method. Here you identify what are called the small b vector and the capital B matrix. The small b is the column vector of all the main-factor coefficients. The capital B matrix has an interesting structure: along the diagonal you have the coefficients of the quadratic terms, and in the off-diagonal positions you have one half of the corresponding interaction coefficients. The matrix is symmetric: the (1,2) location equals the (2,1) location, and both contain β̂12/2, so Bij = Bji. We divide the interaction coefficient by 2 because it would otherwise be counted twice, while the diagonal terms are the quadratic coefficients β̂11, β̂22, ..., β̂kk. Once you have the capital B matrix and the small b vector, locating the stationary point is quite straightforward: it is obtained by solving the equation b + 2Bx = 0; remember we are dealing with matrices here. Solving this equation gives the stationary point coordinates xs = −½ B⁻¹b, and the predicted value of y at the stationary point is ŷs = β̂0 + ½ xs′b, where xs is given by the equation above. Looking at the response surface for this particular example, you can see that it passes through a minimum, so we have a minimum solution here. To identify whether the stationary point is a maximum, a minimum or a saddle point, we check the eigenvalues of B. If all the eigenvalues are positive, the stationary point corresponds to a minimum; if all the eigenvalues are negative, it is a maximum; and if some eigenvalues are positive and some negative, it corresponds to a saddle point and not a true optimal location.

So the CCD and BBD runs are augmented with center runs, and they are very popular among practitioners. A statistical experimental design is evaluated on factors such as these: it should give a good fit to the data, it should allow for testing lack of fit, and it should allow for the sequential construction of models of increasing order or complexity.
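Here is a minimal numerical sketch of the matrix method just described. The fitted coefficients for a two-factor second-order model, ŷ = β̂0 + β̂1x1 + β̂2x2 + β̂12x1x2 + β̂11x1² + β̂22x2², are hypothetical values invented purely for illustration.

```python
# A minimal sketch of the stationary-point calculation: xs = -1/2 B^-1 b
# and y_hat_s = b0 + 1/2 xs' b. All coefficients below are hypothetical.
import numpy as np

b0 = 79.9
b_vec = np.array([0.99, 0.52])            # main-effect coefficients b
B = np.array([[-1.38, 0.25 / 2],          # diagonal: quadratic coefficients
              [0.25 / 2, -1.00]])         # off-diagonal: half the interaction

xs = -0.5 * np.linalg.inv(B) @ b_vec      # stationary point coordinates
y_s = b0 + 0.5 * xs @ b_vec               # predicted response there

eigvals = np.linalg.eigvalsh(B)           # all > 0: minimum; all < 0: maximum;
print(xs, y_s, eigvals)                   # mixed signs: saddle point.
```

With these particular made-up coefficients both eigenvalues come out negative, so the stationary point would be a maximum.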
And you should have enough repeats at the center to obtain an estimate of the pure error; the design should be robust to the presence of outliers in the data; it should be cost-effective, meaning it involves a small number of runs; and it should keep the scaled prediction variance under good control. So this completes our discussion on experimental design strategies. We have covered quite a lot of ground, which should be enough for researchers who are planning designed experiments. We have covered sufficient theory as well as worked a significant number of problems. I request you to go through them, solve as many problems as you can on your own, and also become familiar with statistical software like Minitab, Design-Expert and so on. I wish you all the best in applying the statistical principles we have learnt to your experimental programs. Thanks for your attention.