We will continue with our regression analysis; we are currently working through example set 8 and we will continue with the problem we were discussing. The residual sum of squares is an important component of the error analysis because it is tied to the prediction capability of the developed regression model: the higher the residual sum of squares, the larger the deviation between the experimental data and the model predictions. For the present case the residual sum of squares is y'y - β̂1'x1'y, where x1 refers to the first regressor variable. The total sum of squares y'y, the transpose of y multiplied by y, is 494813.74, and β̂1'x1'y from the matrix calculation is 456196.4, so the difference between y'y and β̂1'x1'y is 38617.34. Whenever we calculate a sum of squares we also identify the scaling factor, in other words the degrees of freedom. For the present case the degrees of freedom are n - p; we have 13 runs and 2 parameters, so 13 - 2 = 11 degrees of freedom. The 2 parameters are β0 and β1, β0 corresponding to the intercept and β1 corresponding to the regressor variable x1. The mean square error is the error sum of squares divided by n - p, so we divide the residual sum of squares of 38617.34 by 11 and get 3510.67.

Now we will discuss the calculation of the pure error, and here we are trying something different. When we ignore the mass of the powder, it is as if the experiments are being repeated: the experimental settings corresponding to the mass of the powder are not taken into consideration, so it is as if we are doing repeats. I request you to look at the table and confirm this for yourself. For example, if you look at the runs where the mass column reads 3, 3, 3 and 6, 6, 6 and you ignore the mass of the powder, the temperature settings are simply 30, 40, 50 and again 30, 40, 50, so it is as if we are repeating the experiments at 30, 40 and 50. The same thing holds for the mass of powder of 9 grams. We also have runs at 45 and 35, with only 2 settings, and then the experiments at 45 and 35 are done again. So if you ignore the mass of the powder, it is as if certain experiments have been repeated twice and certain other experiments have been repeated 3 times: the set 30, 40, 50 is one set of conditions that appears three times, so we have 3 repeats, while the experiments at 45 and 35 have only 2 repeats each.

With this background let us go back to the regression analysis. The pure error has to be estimated under this artificial situation where we do not account for the mass of the powder. Please note that as of now we do not know whether the mass of the powder has a strong influence on the response of the process, but purely for demonstration purposes we will assume that it can be ignored, so that we have some repeats. We now have to calculate the pure error sum of squares, and there are a couple of ways of doing it. One way is to take the double sum over i = 1 to 5, the 5 independent settings, and over j = 1 to m_i, where m_i is the number of repeat observations at the i-th independent setting, of (y_ij - ȳ_i)², and divide by N - n, where capital N is the total number of runs, which is 13, and small n is the number of independent settings. After this calculation you get 4827.1; this is the pure error mean square, and the pure error sum of squares itself, before dividing by the 8 degrees of freedom, is 38616.75.
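To make the grouping concrete, here is a minimal Python sketch of this pure error calculation. The temperature and response arrays below are placeholders for illustration only, not the actual 13-run data table from the slide; only the structure of the calculation is meant to match the lecture.

```python
import numpy as np

def pure_error(settings, y):
    """Pure error from repeat runs: group the responses by identical settings,
    sum the squared deviations about each group mean (the pure error sum of
    squares), and divide by N - n to get the pure error mean square, where N is
    the total number of runs and n the number of distinct settings."""
    settings = np.asarray(settings)
    y = np.asarray(y, dtype=float)
    ss_pe = 0.0
    for s in np.unique(settings):
        group = y[settings == s]
        ss_pe += np.sum((group - group.mean()) ** 2)  # within-group deviations
    df_pe = len(y) - len(np.unique(settings))         # N - n degrees of freedom
    return ss_pe, ss_pe / df_pe, df_pe

# Placeholder data for illustration only -- the actual 13-run table is on the
# slide. In the lecture the numbers come out to a pure error sum of squares of
# 38616.75 and a mean square of 4827.1 with 8 degrees of freedom.
temps = [30, 40, 50, 30, 40, 50, 30, 40, 50, 45, 35, 45, 35]
y_obs = [150, 180, 210, 160, 175, 220, 155, 185, 215, 200, 170, 195, 165]
print(pure_error(temps, y_obs))
```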
The number of repeats, as we just saw, is either 2 or 3. There is another way of calculating the pure error, and that is given by the sum over i = 1 to 5 of ν_i s_i², divided by N - n: for every independent setting you calculate the variance s_i² and then weight it with the appropriate degrees of freedom ν_i. I request you to carry out this calculation for yourself and see whether your answers match mine. For the settings with 3 repeats you have 2 degrees of freedom each, and we had 3 such independent settings; then you have 2 cases with only 1 degree of freedom each, because there were only 2 runs in those independent sets. Let me say it again: i runs from 1 to 5, and we form Σ ν_i s_i² / (N - n). You calculate the variance s_i² for every independent data set, of which there are 5, and weight it with its degrees of freedom. For the first independent data set we had 3 runs and so 2 degrees of freedom; the second and third also have 2 degrees of freedom each; the fourth and fifth data sets had only 2 runs each and hence only 1 degree of freedom each. The total degrees of freedom is 2 + 2 + 2 + 1 + 1 = 8, and this method also leads to 4827.1, the same answer as with the other method.

We are now interested in finding the lack of fit sum of squares; we want to see whether our decision to exclude factor 2 from the modelling analysis was justified. The residual sum of squares is an additive combination of the lack of fit sum of squares and the pure error sum of squares. This is a very important decomposition, and we are going to compare the lack of fit sum of squares with the pure error sum of squares. The residual sum of squares is 38617.34 and the pure error sum of squares, which we just calculated, is 38616.75, so the difference between these two gives a lack of fit sum of squares of 0.59.

Next we go on to the regression sum of squares. The regression sum of squares after removing the influence of the intercept β0 is given by β̂1'x1'y minus the square of the sum of the responses divided by n, and this subtracted term is also equal to n ȳ²; we covered this in one of our previous lectures. This gives a regression sum of squares of 252.

So we set up the ANOVA table with rows for regression, residual, lack of fit and pure error, and enter the sums of squares we calculated: 252, 38617.34, 0.59 and 38616.75. The degrees of freedom are 1 for the regression, since we are interested in only one parameter, β1, and 11 for the residual; we saw this as n - p, where n is the total number of data points, 13, and p is the number of parameters we are considering, β0 and β1, so 13 - 2 = 11. Then you have the lack of fit and the pure error degrees of freedom. We saw in our calculation that the pure error degrees of freedom was 8: 2 + 2 + 2 + 1 + 1 = 8, and the lack of fit therefore carries the remaining 11 - 8 = 3 degrees of freedom.
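The weighted variance version of the pure error and the lack of fit split can be sketched in the same way; the numerical part below simply reuses the sums of squares quoted above rather than the raw data table.

```python
import numpy as np

def pure_error_from_variances(groups):
    """Alternative pure error calculation: weight each group's sample variance
    s_i^2 by its degrees of freedom nu_i = m_i - 1, then divide the total by
    N - n (which equals the sum of the nu_i)."""
    ss_pe, df_pe = 0.0, 0
    for g in groups:
        g = np.asarray(g, dtype=float)
        nu = len(g) - 1                 # degrees of freedom of this group
        if nu > 0:
            ss_pe += nu * g.var(ddof=1) # nu_i * s_i^2
            df_pe += nu
    return ss_pe, ss_pe / df_pe, df_pe

# Lack-of-fit split using the summary numbers quoted in the lecture.
ss_res = 38617.34            # residual sum of squares, 11 df
ss_pe, df_pe = 38616.75, 8   # pure error sum of squares and its df
ss_lof = ss_res - ss_pe      # lack-of-fit sum of squares = 0.59
df_lof = 11 - df_pe          # 3 degrees of freedom
print(ss_lof, ss_lof / df_lof, ss_pe / df_pe)
```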
Coming back to the degrees of freedom, we also had the formula N - n: 13 runs minus 5 independent settings gives 8. So you can calculate the mean squares: the regression mean square, the residual mean square, the lack of fit mean square and the pure error mean square. To find these, all you have to do is divide each sum of squares by its degrees of freedom; you then form the corresponding F values, and you can see that the p value for the regression is pretty high. It means that the parameter β1 is pretty much insignificant, in other words that x1 is actually contributing very little to the model.

Let us discuss this further. Since the p value is pretty high, the parameter associated with x1 is not at all significant and may be rejected; the regression sum of squares is very small and the residual sum of squares is pretty high. This is a very artificial situation in which we have removed the effect of the mass of the powder, which probably has an important say in the process. What we did was to artificially create the repeats, and hence we obtained a very high value of the pure error sum of squares. Since the pure error sum of squares was so high, it made the regression sum of squares corresponding to the parameter β1 look insignificant. We also note that this pure error sum of squares is not genuine. So rather than being lazy and not conducting repeats, it is important that we conduct genuine repeats; these are essential, and we cannot a priori, that is beforehand, assume the insignificance of certain variables.

The next step is to demonstrate the extra sum of squares approach. We have the truncated model, shown in blue, and the full model available to us. The truncated model is ĉ = β̂0 + β̂1 x1, and the full model as of now involves only the main variables x1 and x2, so it is written as ĉ = β̂0 + β̂1 x1 + β̂2 x2. We could of course extend the model by adding more terms, such as second order interaction terms, third order interaction terms, quadratic terms and so on, but to demonstrate the principle of extra sum of squares we will stick with the model having only the main effects. First we find the regression sum of squares due to β̂1, which is β̂1'x1'y - n ȳ², or equivalently β̂1'x1'y minus the square of the sum of the responses divided by n, and that comes out to be 252; this we saw earlier. Next we want the regression sum of squares due to β̂2, the second parameter, the one for the mass of the powder, given β̂1, and that is given by (β̂'X'y - n ȳ²) - (β̂1'x1'y - n ȳ²). You can write the subtracted correction term either as the square of the sum of the responses divided by n or as n ȳ²: small n is the number of data points, the summation adds up all the responses and is then squared, and this can be shown to equal n times the square of the average response. We subtract these terms in order to remove the influence of the intercept β0; this also we have seen earlier. So now we are interested in finding the regression sum of squares due to β̂2 given that β̂1 is already present in the model.
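As a small sketch of how these corrected regression sums of squares would be computed from a design matrix (the X matrices themselves come from the data table on the slide, which is not reproduced here), assuming numpy:

```python
import numpy as np

def corrected_regression_ss(X, y):
    """Regression sum of squares with the intercept's influence removed:
    beta_hat' X' y - (sum of y_i)^2 / n, which equals beta_hat' X' y - n*ybar^2.
    X must include the column of ones."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # least-squares estimates
    n = len(y)
    correction = np.sum(y) ** 2 / n               # identical to n * y.mean()**2
    return beta_hat @ (X.T @ y) - correction

# Applied to X1 (ones plus the x1 column) this would give SS_R(beta1_hat) = 252,
# and applied to the full main-effects X matrix it would give 35043.74.
```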
So we take the total regression sum of squares for the full model, including all its parameters, and subtract from it the regression sum of squares due to the model having only β̂1. In both cases we are removing the influence of the intercept: the full model has the intercept and the truncated model has the intercept, as we saw, so when we form the difference we essentially remove the effect of β̂0 from both the full model and the truncated model. This difference is the extra sum of squares brought in by the parameter β̂2, and it comes out to be quite significant: the full-model term is 35043.74, we already found the truncated-model term to be 252, and the difference is 34791.74. The regression sum of squares of β̂2 given β̂1 is also termed the extra sum of squares due to β̂2, that is, due to including the second variable. It is the increase in the regression sum of squares from including the variable x2 in the model, and it is also independent of the mean square error.

Now we can test the null hypothesis β2 = 0 using the following relationship: F0 equals the regression sum of squares of β̂2 given β̂1, divided by the degrees of freedom due to β2, which is simply 1, and then divided by the mean square error. That gives 34791.74 divided by 382.56; you may want to see how the mean square error of 382.56 is obtained, and we will come to that shortly. The value comes out to be 90.944. The p value associated with this F0 is very close to 0, hence the null hypothesis may be rejected, and the variable x2 makes a significant contribution to the process.

Now let us move on to sequential model building, where we build models sequentially and indicate the important effects. You can add many terms to your existing model, but please make sure that you have an adequate number of experimental data points; if you have only 5 data points then you are pretty much saturated with this particular model. What I am trying to say is: make sure you have adequate degrees of freedom when you are calculating the regression sum of squares. One option, even though we are not going to look at it, would be a larger model; instead we will look at one addition to the original model having only the main effects. What is that addition? It is simply adding the term β̂12 x1 x2, so we are now only going to consider the effect of the interaction between x1 and x2. Our aim is to build the model sequentially and indicate whether the additional term is important. There are many possible additional terms, and here we are considering the second order interaction between the regressor variables x1 and x2. To do this in matrix notation, we add a column vector for x1 x2: you have a column vector of settings of x1 and a column vector of settings of x2, so all you have to do is multiply the two settings elementwise and create a new column vector, which becomes an additional column in your overall X matrix. We then treat this as a new model involving x1 x2 and find its regression coefficient.
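The extra sum of squares test itself is just arithmetic on the quantities above; here is a minimal sketch that reuses the lecture's numbers and scipy only for the p value.

```python
from scipy import stats

# Extra sum of squares and partial F test, using the values quoted in the lecture.
ss_r_full = 35043.74             # corrected SS_R for the full main-effects model
ss_r_x1 = 252.0                  # corrected SS_R for the model with x1 only
ss_extra = ss_r_full - ss_r_x1   # SS_R(beta2_hat | beta1_hat, beta0_hat) = 34791.74

ms_e = 382.56                    # mean square error of the main-effects model, 10 df
f0 = (ss_extra / 1) / ms_e       # 1 df for the single tested parameter -> 90.944
p_value = stats.f.sf(f0, 1, 10)  # upper-tail F probability, essentially 0
print(ss_extra, f0, p_value)
```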
When you want to find the effect of adding β̂12 x1 x2, what you can do is forget the presence of the other terms; essentially you consider a model having only β̂12 x1 x2. You may ask what happened to the other terms, and whether this is correct: it is as if you are ignoring the presence and contributions of the other 3 terms in the present model and considering only β̂12 x1 x2. There is a rider, or a condition, to doing the regression problem this way: the design should be orthogonal. So the X matrix we are going to work with is simply the column vector of x1 x2 products; we do not have the column of ones, we do not have the column of x1 settings and we do not have the column of x2 settings, only the column corresponding to x1 x2. You can see the values, and you can see that the elements of this column add up to 0. The y column vector is simply the same as before, the column vector of responses to the experiments at the various settings.

We can now do the usual matrix manipulations. We find the X'X matrix, which in the present case is x12'x12, and that comes out to be 4.25; then X'y, which here is x12'y, is -127.5. We estimate β̂12 as (x12'x12)^-1 x12'y. Recall the general formula in the matrix approach to linear regression: the parameters are obtained from (X'X)^-1 X'y. In our present case the X matrix is just the x1 x2 column, so we take the inverse of x12'x12 and multiply it by x12'y, and when you carry out this calculation you will find that β̂12 is -30.

The regression sum of squares due to β̂12, given that β̂0, β̂1 and β̂2 are already present in the model, is simply β̂12 x12'y, which comes out to 3825; this is the contribution from x1 x2. We now have to add the contribution from the regression sum of squares due to the other regressors, including the intercept β̂0, to get the regression sum of squares for the full model. The degrees of freedom for the mean square error is 13 - 4 = 9, since there are 4 parameters in the full model. So the error sum of squares is y'y - (β̂12 x12'y + β̂'X'y): the first term inside the bracket is the regression sum of squares brought in by the regressor variable x1 x2, and the second is the regression sum of squares brought in by the original model involving only the main effects x1 and x2, which includes β0. In other words, the regression sum of squares associated with β̂'X'y corresponds to the model β0 + β1 x1 + β2 x2, the main effects only model, and we are adding the x1 x2 interaction term to it, so the bracket now represents the full model including the parameters β0, β1, β2 and β12, where β12 is the coefficient of the x1 x2 regressor variable. With y'y being the total sum of squares and the bracket the total regression sum of squares, the sum of squares of error comes out to be 0.59.
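Since all the pieces are scalars here, the whole sequential step can be checked in a few lines; this is a minimal sketch using the values quoted above, and it already produces the F statistic that is discussed next.

```python
# Sequential addition of the x1*x2 interaction column under an orthogonal design,
# using the scalar summaries quoted in the lecture; in general the x1*x2 column
# would be the elementwise product of the coded x1 and x2 columns.
x12_t_x12 = 4.25                      # x12' x12
x12_t_y = -127.5                      # x12' y
beta_12 = x12_t_y / x12_t_x12         # (x12'x12)^-1 x12'y = -30
ss_r_12 = beta_12 * x12_t_y           # SS_R(beta12 | beta0, beta1, beta2) = 3825

y_t_y = 494813.74                     # total (uncorrected) sum of squares
ss_r_main = 490988.15                 # beta_hat' X' y for the main-effects model
ss_e = y_t_y - (ss_r_12 + ss_r_main)  # error sum of squares = 0.59
df_e = 13 - 4                         # 9 df: four parameters in the full model
f0 = (ss_r_12 / 1) / (ss_e / df_e)    # about 58347 -> beta12 is significant
print(beta_12, ss_r_12, ss_e, f0)
```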
I request you to carry out these calculations independently and see if your answers match mine. The null hypothesis β12 = 0 may be tested with the statistic F0 equal to the regression sum of squares of β̂12 given β̂0, β̂1 and β̂2, divided by 1 (only one independent parameter is being tested, so 1 degree of freedom), and then divided by the mean square error, which is the error sum of squares divided by the error degrees of freedom. That comes out to be 58347.5. This is obviously a very high F value, and hence the parameter β12 is significant. Please find the p value for this particular case, corresponding to 1 and 9 degrees of freedom.

The next analysis looks at the data as if they had not been coded. To some of us it may seem that coding is a pain and that it is better to fit a model directly without coding the data; let us see what happens in such a scenario. This is the actual data: the response obtained by varying temperature and mass of powder. With this complete data set we will consider a linear regression model involving only the main effects. We are doing this just for demonstration purposes; we could of course do it for a larger model, or a model with more terms, but as of now let us consider only the main effects, which are temperature and mass of the powder. Let us call the model Ĉ = β̂0 + β̂1 T + β̂2 M, where the regressor variables, temperature and mass of the powder, are expressed as they are. You can see that the mass of the powder is in the range of 3 to 7.5 whereas the temperature is in the range of 30 to 50, so there is an order of magnitude difference between the two factors. In other experiments this difference between the factors can be even larger: some factor may range only between 0 and 1 while other factors run into the hundreds or thousands, so there can be a big difference in order of magnitude between the various factors. In such a situation it is better to scale, but in the present exercise we will not do the scaling, or coding, that makes the values range only between -1 and +1; we will take the values as they are and see what happens to the model.

We have the X matrix for the coded case, and you can see that all the entries are in the same range, between -0.5 and +1, so it looks pretty neat. When you look at the uncoded X matrix, it has the column vector of ones, the temperature column and the mass of powder column, and you can see that the columns differ by about a factor of 10; this is the response vector, which we do not code, we just take it as it is. What we are doing in this slide is comparing the coded and uncoded results. The coded X'X matrix is a nice diagonal matrix, easy to calculate and easy to invert, whereas in the uncoded case the X'X matrix is quite ugly and you cannot easily invert it. For the coded case I can find the inverse with my eyes closed: I just put 1/13, 1/7 and 1/7 on the diagonal. For the uncoded case it is not that easy, and if you do not have access to a computer you will have to do the inverse calculation by hand.
These numbers also look quite different: the off-diagonal terms have not vanished, and although the matrix is still symmetric, with a_ij equal to a_ji, there is a big difference in order of magnitude between the values. In the coded case the parameters were estimated to be 187.27, -6 and -70.5, whereas in the uncoded case they are 352.277, -0.6 and -23.5; these values are quite different. So when you look at a linear regression you have to see how the experimenter has treated the data, whether he has coded it and then estimated the parameters or whether he has used the data in raw form, because you cannot expect the same parameter values in the coded and uncoded cases. The X'y vectors also look quite different, and the parameters are obviously again different. We saw that β̂0 was 187.27 in the coded case, which was also the average of the responses, but this is not so in the uncoded case. The average of the responses is of course the same whether or not we code, there is no doubt about that; the point is that the parameter β̂0 matched the average of the responses in the coded case but not in the uncoded case.

Next we present the variance covariance matrix. This we have also seen before: the variances of the parameters are given by the diagonal terms multiplied by σ², whereas the covariances between the parameters are given by the off-diagonal terms multiplied by σ², where σ² is the error variance. If we call this matrix V, it is symmetric, so V12 equals V21, V13 equals V31 and so on; that is why C01 appears in two places, C02 appears in two places and C12 matches C12 on the slide. For the coded case the variance covariance matrix is, as I said earlier, a nice diagonal matrix: the off-diagonal terms vanish, so there are no covariances between the estimated parameters, which is a big plus, whereas in the uncoded case you can see that there are covariances between the estimated parameters.

Now we have to calculate the error sum of squares, because we use the resulting mean square error as a surrogate for σ². The error sum of squares is y'y - β̂_uc' X_uc' y, where the subscript uc denotes the uncoded quantities. The y'y is the same as in the coded case, since we are simply squaring the responses and adding them up, and β̂_uc' X_uc' y comes out to be 490988.15 for the model including x1 and x2 as main effects only; we do not consider x1 x2 in this uncoded case. This 490988.15 is the same as in the coded case, so whether you code or not, the regression sum of squares, the total sum of squares and the error sum of squares do not change. The sum of squares of error is 3825.593, the degrees of freedom is n - p = 13 - 3 = 10, and so the mean square error is 382.56. An estimate of the error variance is given by the mean square error, so we take σ̂² as 382.56, and we get σ̂, the estimate of the error standard deviation, as 19.56.
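Here is a minimal sketch of this coded versus uncoded comparison. The temperature, mass and response arrays are placeholders standing in for the slide's table, and the coding rule (midrange centring, half-range scaling) is one common convention; the point being illustrated is that the parameter values change under coding while the fitted values, and hence the error sum of squares, do not.

```python
import numpy as np

# Placeholder data, not the lecture's table.
T = np.array([30, 40, 50, 30, 40, 50, 30, 40, 50, 45, 35, 45, 35], dtype=float)
M = np.array([3, 3, 3, 6, 6, 6, 7.5, 7.5, 7.5, 4.5, 4.5, 6, 6])
y = np.array([150, 180, 210, 160, 175, 220, 155, 185, 215, 200, 170, 195, 165],
             dtype=float)

def fit(X, y):
    beta = np.linalg.solve(X.T @ X, X.T @ y)  # least-squares parameters
    resid = y - X @ beta
    return beta, resid @ resid                # parameters and SS_E

def code(v):
    """Map a factor onto the -1..+1 range about its midrange value."""
    mid = (v.max() + v.min()) / 2
    half = (v.max() - v.min()) / 2
    return (v - mid) / half

X_uncoded = np.column_stack([np.ones_like(y), T, M])
X_coded = np.column_stack([np.ones_like(y), code(T), code(M)])

b_uc, ss_uc = fit(X_uncoded, y)
b_c, ss_c = fit(X_coded, y)
print(b_uc, b_c)    # different parameter estimates
print(ss_uc, ss_c)  # identical error sums of squares
```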
This completes our discussion of example set 8 for now. The important things to understand here are the extra sum of squares concept and the matrix approach to linear regression. You can see that the matrix approach to linear regression made everything elegant, compact and easy to execute, especially for large data sets. I recommend that you do the calculations pertaining to this example problem on your own and compare your answers with mine; this will give you good training and also make you confident in analyzing problems that may crop up in your own industrial experience or when doing research. Thank you for your attention.