Hello, in today's class we will be looking at regression concepts through the working of an example problem. The references are the book by Draper and Smith titled Applied Regression Analysis; Kutner et al., Applied Linear Regression Models, fourth edition, McGraw-Hill; the prescribed textbook, Montgomery and Runger, Applied Statistics and Probability for Engineers, fifth edition; and Montgomery's Design and Analysis of Experiments.

The problem statement goes like this. A new washing machine prototype is developed by a company. Its unique design enables efficient removal of dirt, but at the same time it is felt that the colour of the cloth also gets excessively removed during washing. So the washing machine is subjected to trials in which the colour in the wash liquid, C, is analyzed using a photometer. The variables considered while washing with good quality water are the temperature of the water, x1, and the amount of detergent powder used, x2; the washing time is set to a standard 40-minute cycle. The data collected are given in the form of a table. If you look at the table, you can see the temperature of the water is kept at either 30, 40 or 50 degrees, and in the first 9 readings the amount of powder used varies between 3 grams and 9 grams. The table continues with temperatures of 45 degrees centigrade and 35 degrees centigrade and powder masses of 4.5 grams and 7.5 grams. The concentration in ppm is given in the last column. We have to develop the model; the model is not given to us.

Before we look at the questions: since the temperature varies between 30 and 50 degrees centigrade while the mass of powder is between 3 and 9 grams, there is roughly an order of magnitude difference between the two variables, so it is better to code the variables. The variables are coded to lie in the range minus 1 to 1. The questions are these: consider a linear regression model involving the main effects only and write down this model; show how the parameters are obtained; present the variance-covariance matrix; construct the ANOVA table, explaining the different calculations; explain how you obtain R squared and adjusted R squared; check whether there is any lack of fit in the model; demonstrate the extra sum of squares approach, building the model sequentially and indicating whether the additional terms are important; and show the results for the final model if the coding of the variables had not been done.

So we saw the actual data already, and when we express the data in coded form we get a new table. The lowest setting of temperature is coded as minus 1, and the lowest weight of powder used is also coded as minus 1; the maximum temperature is coded as plus 1, and the maximum amount of powder used, which is 9 grams, is coded as plus 1. The intermediate temperature setting is coded 0, and similarly the intermediate powder loading of 6 grams is coded 0. We do not touch the colour in the liquid; we keep that data as it is. Since we are also doing 4 additional runs at intermediate settings, 45 degrees gets a coding of plus 0.5 and 35 degrees a coding of minus 0.5. For the powder, 4.5 grams comes out as minus 0.5 in the coded format and 7.5 grams as plus 0.5: 7.5 minus 6 is 1.5 and 9 minus 6 is 3, so 1.5 divided by 3 is 0.5.
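As a quick aside before the analysis, the coding rule is just a linear map of each raw setting onto the interval minus 1 to 1. Here is a minimal Python sketch of that rule (the function name code_variable is my own illustration, not something from the lecture):

```python
import numpy as np

def code_variable(x, low, high):
    """Map a raw setting x onto the coded range [-1, 1].

    The centre of the range codes to 0, the extremes to -1 and +1.
    """
    center = (low + high) / 2.0
    half_range = (high - low) / 2.0
    return (x - center) / half_range

# Temperature runs from 30 to 50 deg C, detergent mass from 3 to 9 g.
print(code_variable(45.0, 30.0, 50.0))  # 0.5
print(code_variable(4.5, 3.0, 9.0))     # -0.5
print(code_variable(7.5, 3.0, 9.0))     # 0.5
```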
So this is how we have coded the different variables, or factors. Why do we do the coding? The raw data vary by about an order of magnitude, so the regression coefficient associated with the larger-valued variable would be correspondingly low and the coefficient associated with the smaller-valued variable correspondingly high. In other problems this difference may be even larger: one variable could be of the order of thousands and the other of the order of fractions, and the regression coefficients would adjust correspondingly. This can lead to wide variation in the estimated regression coefficients; the coefficient attached to the variable with large values may be of the order of 10 to the power minus 2 or minus 3, whereas the coefficient attached to the variable with very low values may be of the order of 10 to the power 2 or 3, and the result is an unwieldy model. The problem is aggravated if the units are changed by mistake: the uncoded model is a dimensional equation, and with a dimensional equation the units are very important. Suppose somebody by mistake puts in Kelvin instead of centigrade; then the regression would give wrong answers. Similarly, if the powder mass is entered in milligrams by mistake, again wrong results would be obtained. But once you do the coding properly and the user is warned that the variables are coded, this problem is solved. That is another advantage of coding: the regression variables become independent of units in the coded format.

So let us look at the regression analysis involving the main factors only. We write down the model: the predicted colour is C hat = beta hat 0 + beta hat 1 x1 + beta hat 2 x2 (just correct the typo on the slide here). Beta hat 0 is a very important term in the regression model; it gives the value of the concentration when both x1 and x2 are 0, and there is more to this beta hat 0, which we will see in the coming slides.

The next part of the question is to show how the parameters are obtained. We first express the data in matrix form, which I will show in the next few slides. Once the X'X matrix is set up, we take the inverse of X'X, form the X'y vector, and pre-multiply X'y by the X'X inverse to get the vector of estimated parameters.

So we have the X matrix: the first column is a column of ones, the second column contains the coded temperatures (minus 1, 0 and 1 for the first nine runs and plus or minus 0.5 for the last four), and the third column likewise contains the coded powder masses. All of them are coded, and the y vector represents the concentrations recorded at each experimental setting. When you add up the elements of the first column you will get the number 13, because there are 13 settings in the experimental program, and we also have 13 observations in the y column vector; this shows that 13 experiments have been carried out. Taking X prime corresponds to changing columns into rows and rows into columns; that is the transpose of X. When you pre-multiply X by its transpose you get X'X, and that comes out very nicely as the diagonal matrix with entries 13, 7 and 7, with every off-diagonal element equal to 0. This is a very nice matrix because there are no off-diagonal elements.
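As a sketch of this recipe, the following Python fragment solves the normal equations; since the full response vector from the lecture's table is not reproduced here, the usage lines at the bottom are indicative placeholders only:

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares estimates via the normal equations: beta_hat = (X'X)^{-1} X'y."""
    XtX = X.T @ X
    Xty = X.T @ y
    beta_hat = np.linalg.solve(XtX, Xty)  # numerically safer than an explicit inverse
    return beta_hat, XtX, Xty

# Indicative usage (x1_coded, x2_coded and c would be the 13 coded settings
# and measured colours from the lecture's table):
# X = np.column_stack([np.ones(13), x1_coded, x2_coded])
# beta_hat, XtX, Xty = fit_ols(X, c)
# For this design, XtX comes out as diag(13, 7, 7).
```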
The off-diagonal terms are all 0, and inverting this matrix is a piece of cake. A matrix inverse is defined such that when you multiply a given matrix by its inverse you get the identity matrix. So we obtain the inverse of this X'X matrix as the diagonal matrix with entries 1/13, 1/7 and 1/7. You also have the X'y vector, which consists of three elements, and we can find beta hat, the vector of parameters for the regression model, as X'X inverse times X'y; we get beta hat as the numbers shown. I request you to do these calculations on your own and make sure that the answers I am getting match yours.

And what is the significance of beta naught hat when the experimental settings are put in the coded format? Note that we are not putting the response in the coded format; only the experimental settings are coded. We carry out the regression parameter estimation and we get beta naught hat as 187.27. This means that the average of the responses is 187.27. This is not true for the uncoded case; please note this important distinction. When we code the variables the way I have shown, beta naught hat comes out as the average response.

Now we can present the variance-covariance matrix V in the following form. We saw this in one of the previous lectures: the variance of a regression parameter, or regression coefficient, is given by the corresponding diagonal term of the matrix multiplied by sigma squared, so the variance of beta hat j equals C_jj sigma squared, and the covariance between two regression coefficients beta hat i and beta hat j is given by C_ij sigma squared. For beta hat 0 and beta hat 2, i equals 0 and j equals 2, so we are looking at C_02, and C_02 times sigma squared gives the covariance between the two parameters. If you number the rows and columns from 0, 1 and 2, strictly one entry should be written as C_02 and the other as C_20, but the variance-covariance matrix is symmetric, so C_02 will match C_20; similarly C_01 will match C_10, and so they are pretty much written as C_02 and C_01 and so on. The important thing to note here is that the diagonal elements of this matrix give the variances of the corresponding regression parameters, whereas the off-diagonal terms give the covariances between pairs of regression parameters.

So now we can present the variance-covariance matrix in terms of numbers. We found X'X inverse as the diagonal matrix with entries 1/13, 1/7 and 1/7, to be multiplied by sigma squared. But we hit a roadblock here: we are unable to get a value for sigma squared because sigma squared is not provided. So we have to make do with the best estimate of sigma squared; we will use an estimated sigma squared. Sigma squared, of course, refers to the error variance, which is assumed to be constant: the errors are assumed to be normally distributed with 0 mean and constant variance sigma squared. That sigma squared is not known to us in many situations, so we have to have an estimate of it. First let us find the residual sum of squares; we use the mean square residuals as a surrogate for sigma squared.
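In code form, the variance-covariance matrix is just the X'X inverse scaled by sigma squared; a small sketch, assuming sigma squared (or an estimate of it) is available:

```python
import numpy as np

def var_cov_matrix(XtX_inv, sigma2):
    """V = (X'X)^{-1} * sigma^2: diagonal entries are the parameter variances,
    off-diagonal entries the covariances between pairs of parameters."""
    return XtX_inv * sigma2

# For this coded design the inverse is diagonal, so all covariances vanish:
C = np.diag([1 / 13, 1 / 7, 1 / 7])
# V = var_cov_matrix(C, sigma2_hat)  # sigma2_hat is estimated in the next step
```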
So to find the mean square residuals, we need the residual sum of squares; we divide the residual sum of squares by its degrees of freedom and we get the mean square residuals. We have y'y = 494813.74. How did we get y'y? We go back to the y vector and take its transpose; you can visualize all the column elements becoming row elements when you take the transpose of y. Then you multiply y' by y. What happens is you are essentially finding 234 squared plus 257.5 squared and so on: all the elements in the response vector are squared and added to get y'y. You may verify this by doing the calculations yourself.

So that takes care of y'y. The residual sum of squares, we saw, is given by y'y minus beta hat' X'y. Sometimes the residual sum of squares may also be called the error sum of squares, because it represents the deviation between the experimental values and the model predictions, and that deviation represents the error. We also distinguish this residual sum of squares from the pure error sum of squares that we saw in the previous lecture; pure error is obtained from repeated measurements. So we have beta hat' X'y = 490988.15. What is beta hat' X'y? We estimated the parameters beta hat; we take the transpose of this vector and multiply it by the X'y vector, and that gives us the regression sum of squares. It is important to note that this regression sum of squares includes the contribution from the beta naught regression coefficient. Subtracting beta hat' X'y from y'y, we get the sum of squares of the residuals as 3825.59. The degrees of freedom for the mean square error, or mean square residuals, is n minus p, which is 13 minus 3: you have 3 parameters estimated from the regression and 13 experimental observations, so we have 10 degrees of freedom for the mean square error. The residual sum of squares from the previous slide, 3825.59, divided by its 10 degrees of freedom gives 382.56.

So an estimate of the error variance, sigma hat squared, is the mean square of the residuals, which is 382.56, or sigma hat = 19.56, and now we can estimate the variances of the different regression parameters, because instead of sigma squared we have plugged in sigma hat squared, 382.56. There is no problem with the covariance terms because they are all identically 0. So 382.56 divided by 13 is approximately 29.4 (as a rough check, 13 times 30 is 390, close to 382.56 times a bit over unity), the variance of beta hat 1 is 382.56 divided by 7, which is 54.65, and the variance of beta hat 2 is also 54.65 because these two diagonal elements are identical; the covariance between any pair of regression parameters is 0. Once we have the variances of the different parameters we can find their standard errors by taking square roots, and we get 5.425, 7.393 and again 7.393. We can compare these estimated standard errors with beta hat 0, beta hat 1 and beta hat 2. We can see that these values are quite okay, except for beta hat 1: the parameter is minus 6 while its standard error is 7.393, and that is a cause for worry.
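Putting the sigma squared estimate and the standard errors into code, again as a sketch using the lecture's numbers:

```python
import numpy as np

def sigma2_estimate(yty, beta_hat_Xty, n, p):
    """Estimate sigma^2 by the mean square residual: (y'y - beta_hat' X'y) / (n - p)."""
    return (yty - beta_hat_Xty) / (n - p)

sigma2_hat = sigma2_estimate(494813.74, 490988.15, n=13, p=3)  # 382.56

# Standard errors: square roots of the diagonal of (X'X)^{-1} * sigma2_hat.
se = np.sqrt(np.array([1 / 13, 1 / 7, 1 / 7]) * sigma2_hat)
print(sigma2_hat, se)  # 382.56 and roughly [5.425, 7.393, 7.393]
```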
As far as the first regression coefficient is concerned, the standard error looks okay, and the same holds for beta hat 2, whose standard error is about one tenth of the estimated parameter value; it is beta hat 1 that is the concern.

So we can construct the ANOVA table, explaining the different calculations. For doing this we need the total sum of squares, the regression sum of squares and the error sum of squares. The total sum of squares by now should be very easy for you to find: it is simply the deviation of each Y_i from the average response Y bar; we take every deviation, square it, and add them up. So we get the sum over i = 1 to n of (Y_i - Y bar) squared, where n is the number of experiments performed, which is 13, Y_i is the experimental response and Y bar is the average of the experimental responses, in this case 187.277. We can also write the sum of (Y_i - Y bar) squared as y'y minus (the sum of the Y_i) squared divided by n; upon expansion, this can be shown to equal the sum of Y_i squared minus n Y bar squared. Let me just use the board to expand it and show why we get this particular term. The sum over i = 1 to n of (Y_i - Y bar) squared expands to the sum of (Y_i squared minus 2 Y_i Y bar plus Y bar squared), which becomes the sum of Y_i squared minus 2 Y bar times the sum of the Y_i plus n Y bar squared; the last term arises because we are adding a constant term n times. Since the sum of the Y_i is n Y bar, the middle term is minus 2 n Y bar squared, and the whole expression collapses to the sum of Y_i squared minus n Y bar squared. So that is what we get here.

So the total sum of squares becomes 38869.34. Sometimes you may see a different value for the total sum of squares: in some places you may see 494813.7 reported, and in other places 38869.34. The reason is that the raw total sum of squares is y'y, and after we adjust the total sum of squares for beta naught hat, the sum of squares becomes the lower value. To account for the beta naught parameter we subtract n Y bar squared from the raw total sum of squares: we know that if beta naught is the only regression coefficient considered, it will take on the value of the average of the responses, and the sum of squares contribution from beta naught would then be n Y bar squared, where n is the number of experiments. Subtracting n Y bar squared from the raw total sum of squares y'y is why we get 38869.34: we are removing the effect of beta naught from our regression analysis. Similarly, for the regression, the total regression sum of squares is given by beta hat' X'y, and to remove the contribution from beta naught hat we again subtract n Y bar squared; the sum of squares of regression due to the parameters beta hat 1 and beta hat 2 comes to 35043.74. So this is where you have to be careful: you have to see in which quantities beta naught hat is present. Previously we were looking at the total sum of squares, from which we removed n Y bar squared to take out the effect of beta naught hat; now we come to the regression sum of squares.
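Before moving on, the board identity for the total sum of squares is easy to verify numerically; a minimal sketch:

```python
import numpy as np

def total_sum_of_squares(y):
    """SS_T = sum (y_i - ybar)^2, equivalently y'y - n * ybar^2."""
    y = np.asarray(y, dtype=float)
    n, ybar = len(y), y.mean()
    ss_direct = np.sum((y - ybar) ** 2)
    ss_identity = y @ y - n * ybar ** 2
    assert np.isclose(ss_direct, ss_identity)  # the two forms always agree
    return ss_direct

# Applied to the lecture's 13 responses this gives SS_T = 38869.34,
# while the raw (uncorrected) total y'y is 494813.74.
```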
Similarly, we also have to remove the influence of beta naught hat from the regression sum of squares. We know that the regression sum of squares is given by beta hat' X'y; to remove the influence of beta naught hat we simply subtract n Y bar squared from beta hat' X'y, and the sum of squares of regression becomes 35043.74.

Now we can write down the different contributions in the ANOVA table. We have the regression source of variation, excluding the effect of beta naught, as 35043.74. The degrees of freedom correspond to beta hat 1 and beta hat 2, both of which are independent, so we have 2 degrees of freedom here, and the mean square is obtained by dividing the sum of squares by the degrees of freedom. The residual sum of squares is given in the second row; it is the difference between the total sum of squares and the regression sum of squares. If you compute the total sum of squares without removing the contribution from beta naught hat, then you should also compute the regression sum of squares without removing it; on the other hand, if you compute the total sum of squares excluding the effect of beta naught hat, then you should remove its effect from the regression sum of squares as well. So it is a question of whether you subtract n Y bar squared from both the total and the regression sums of squares, or from neither. The difference between the total sum of squares and the regression sum of squares gives you the residual sum of squares, which in this case is 3825.59. The degrees of freedom for the residual sum of squares is n - p, where n is the number of experimental observations, 13, and p is the total number of parameters, beta hat 0, beta hat 1 and beta hat 2, so p = 3 and n - p = 13 - 3 = 10. The mean square is obtained as the residual sum of squares divided by 10, which gives 382.56.

So we have the total sum of squares excluding beta naught, obtained by adding these two, and the total degrees of freedom, which is 12: we had 13 data points, and since we excluded the effect of the beta naught regression coefficient we have 12 degrees of freedom. The residual sum of squares can also be computed directly as the difference between the experimental observations and the predicted values: every experimental observation is subtracted from the corresponding predicted value, that deviation is squared, and all such deviations are added to give the residual sum of squares, and that may be shown to equal y'y minus beta hat' X'y. If we subtract n Y bar squared from both of these terms, that is another way of finding the residual sum of squares in which you are correcting for the beta naught parameter. For the residual sum of squares it does not matter; we get the same answer whether you add and subtract this n Y bar squared or do it directly. But one thing you have to be very careful about: when you are calculating the residual sum of squares you cannot use the total sum of squares uncorrected for beta naught together with the regression sum of squares corrected for beta naught; then you will get a wrong residual sum of squares.
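This both-or-neither rule is easy to sanity-check in code; a short sketch using the lecture's numbers:

```python
import numpy as np

def ss_residual(yty, ss_reg_raw, n=None, ybar=None, corrected=False):
    """SS_res from total and regression sums of squares; the beta_0
    correction n * ybar^2, if applied, must be applied to both terms."""
    if corrected:
        c = n * ybar ** 2
        return (yty - c) - (ss_reg_raw - c)  # the two corrections cancel
    return yty - ss_reg_raw

# Lecture values: y'y = 494813.74 and beta_hat' X'y = 490988.15,
# so SS_res = 3825.59 whichever route is taken.
print(ss_residual(494813.74, 490988.15))
```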
So you correct the total sum of squares for beta naught and you also correct the regression sum of squares for beta naught, and the difference between the two gives you the correct residual sum of squares; or you take the raw total sum of squares and the raw regression sum of squares, and the difference between those two also gives you the residual sum of squares. Correcting one term for beta naught and not the other will lead to a wrong estimate of the residual sum of squares, so this is to be carefully kept track of.

So now we have the numbers in the ANOVA table: the sources of variation, regression and residual, with the sums of squares, the degrees of freedom and the mean square values. The regression sum of squares is 35043.74, which we saw previously, and the residual sum of squares is 3825.59. Again I warn you here that the regression sum of squares excludes the contribution from the beta naught parameter; that is what we did when we calculated it. We have 2 parameters, beta 1 and beta 2, since beta naught is excluded, so you have 2 degrees of freedom for the regression, and you have n - p degrees of freedom for the residual, which is 13 - 3 = 10. The mean squares are obtained by dividing the sums of squares by the degrees of freedom, so you get 17521.87 and 382.56, and the ratio of these two gives F naught = 45.8.

Without any further testing we can be reasonably sure that this statistic is lying in the rejection region, because the mean square for regression is about 50 times higher than the mean square for the residuals; we cannot really say that the two contributions are similar. The residual mean square is around 400, and if the regression mean square had also come out as 500 or 600 the two would have been quite comparable, but this is about 50 times more. So the contribution to the total variation from the regression is far larger than that from the residuals, and we have reason to believe that the regression is definitely contributing towards explaining the variation in the observed responses. The critical value is only 4.103, and this F naught statistic is much higher than that, so it is lying in the rejection region, and we reject the null hypothesis, which says that the regression contribution is 0.

It is easy to first find R squared, the coefficient of determination. R squared is the fraction of the total sum of squares accounted for by the regression sum of squares. The regression sum of squares is 35043.74, and the total sum of squares is the sum of 35043.74 and the residual 3825.59, which we already have as 38869.34. So we divide the regression sum of squares by the total sum of squares and that gives us R squared, which can be verified to be 0.9016. Then we have to find the adjusted R squared, which is obtained using the mean square error and the mean square total.
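To tie these numbers together, here is a small Python sketch reproducing the ANOVA arithmetic, the F test, and, anticipating the calculation explained next, both goodness-of-fit measures; scipy is used only for the F critical value:

```python
from scipy import stats

# Lecture values: SS_reg = 35043.74 (2 df), SS_res = 3825.59 (10 df).
ss_reg, df_reg = 35043.74, 2
ss_res, df_res = 3825.59, 10
ss_total = ss_reg + ss_res                  # 38869.34, corrected for beta_0

ms_reg = ss_reg / df_reg                    # 17521.87
ms_res = ss_res / df_res                    # 382.56
f0 = ms_reg / ms_res                        # about 45.8
f_crit = stats.f.ppf(0.95, df_reg, df_res)  # about 4.103 at alpha = 0.05

r2 = ss_reg / ss_total                               # about 0.9016
r2_adj = 1 - ms_res / (ss_total / (df_reg + df_res)) # about 0.882

print(f0 > f_crit)  # True: reject H0 that the regression contribution is zero
```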
We get the mean square error by dividing the residual sum of squares by n - p, and the mean square total by dividing the total sum of squares by n - 1. So we are penalizing R squared for adding more parameters: you can see that when more parameters are added, the degrees of freedom for the error decrease, the term in the numerator increases, and that reduces the adjusted R squared value. We are adjusting the R squared value for the addition of parameters, and that is why it is called the adjusted R squared. So we have 1 minus 3825.59, the residual sum of squares, divided by n - p = 13 - 3 = 10, all over the total sum of squares divided by its degrees of freedom, which is not 13 but 12, because one degree of freedom is used up in finding the mean of the observations. We take this ratio and get 0.882. These are the values tabulated in this slide, and you can see that the adjusted R squared is less spectacular than the coefficient of determination: it is 0.882, which is lower than 0.9016. Sometimes the discrepancy between these two may be quite high.

So now we have to do the lack of fit analysis. We do not have a prior estimate of pure error, or repeated measurements from which an independent estimate of pure error could be obtained and the lack of fit sum of squares computed. For computing the lack of fit we have to partition the residual sum of squares into the lack of fit sum of squares and the pure error sum of squares, and for doing this partitioning exercise we need an idea of the pure error. Unfortunately, in this experimental sequence repeats were not carried out; maybe the people running the experiments thought they would get identical values if they repeated the experiments, so they did not carry out the repeats, and an independent estimate of the pure error could not be obtained. As I was explaining in the previous lecture, we split the residual sum of squares into the lack of fit sum of squares and the pure error sum of squares, compute the mean square lack of fit and the mean square pure error, compare the two, and draw the appropriate conclusions: if the mean square lack of fit is much higher than the mean square pure error, the model developed is inadequate, but if the two are comparable, the model is adequate and both of them are independent estimates of sigma squared.

So what we may do is artificially create repeats, and let us see the trouble created by artificially creating the repeats. We will consider modelling only the intercept and x1; we are ignoring the effect of x2, the amount of powder used in the machine, and let us see what happens if we ignore it. The model we are testing is C hat = beta hat 0 + beta hat 1 x1. This entire procedure looks very suspicious, and that suspicion is going to be justified in the next set of calculations. But what it does show is this: suppose you had started with this model, what would the lack of fit sum of squares have been? The procedure is based on the assumption that x2 is not significant, which we do not know yet; we are seeing what happens if we assume a priori that x2 is not going to have an effect, and we test only this reduced model. Then it is as if we are hiding the x2 column entirely, and it appears that we have repeats: for example, the temperature column reads 30, 40, 50, 30, 40, 50, 30, 40, 50.
So it appears as if we have repeated the experiment not once but twice; that is, we have 3 repeated observations at each of those temperatures. It looks very nice, and if you go further you have 2 repeats at 45 degrees and 2 repeats at 35 degrees. So it looks as if we have solved the problem by ignoring the effect of the mass of powder used. But we do not know whether that is the correct procedure, because the mass of powder used may have an important influence on the process.

So let us just take a look at the repeated sets. Compare the two runs at 45 degrees: on one hand the reading is 226.9, on the other it is 141.4. So when you do these "repeats" you are getting 141 and then 226 or 227; there is almost a factor of 2 variation between these two readings. And if you compare the two runs at 35 degrees, 162 appears to be very far from 218. If the pair had been 140 and 160, that would be okay; if it had been 160 and 170 or 150, that would be okay; that could be put down to random fluctuations. But this is too large a fluctuation to be ascribed to random phenomena. Let us go back to the table: for the condition of 30 degrees you have 234, you have 194, and another run at 30 gives 153. So for the same experimental condition of 30 degrees centigrade the colour varies from 153 to 234; that is a huge difference. Just by looking at the data itself we can see that ignoring the mass of powder was not such a good idea: what it is doing is exaggerating the role played by experimental error.

There is also a moral in this story. If you are doing experiments and you find large variations when you repeat them, then there is probably another factor influencing the process response which you have not identified. Rather than doing more and more repeats and getting more and more variability, your focus should be on finding the factor in the experiment that is actually causing this variability.

So we put the data in the coded format. You can see on the slide that the entries marked in black, minus 1, minus 1, minus 1, represent one set of conditions; 0, 0, 0 represent another set; and 1, 1, 1, coded in red, correspond to another set of pseudo-repeated conditions. Similarly you have the coded temperature at 0.5 and 0.5, and at minus 0.5 and minus 0.5, so you have 2 more repeat conditions there. Those are 2 sets of repeated data, and the earlier rows give 3 sets of repeated data, so in total we have 5 sets of pseudo-repeated data.

So we have the reduced model C hat = beta hat 0 + beta hat 1 x1, and we can again calculate the parameters, the regression sum of squares, the residual sum of squares and so on. We have the new X matrix, where we have a contribution only from the x1 regressor variable, that is, only from temperature expressed in the coded format; we have removed the column containing the coded values for the mass of detergent powder used. When we look at the column vector y it can be seen that the responses are the same as before; we have made no modification to the response vector. The only change is that we have removed the second regressor column: from the original X matrix we get the new X1 matrix, which contains the vector of ones and the coded values for temperature.
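To make the pseudo-replicate bookkeeping concrete, here is a sketch of how one might compute a pure-error sum of squares from groups of runs sharing the same setting; the function name and grouping scheme are my own illustration, not code from the lecture:

```python
import numpy as np
from collections import defaultdict

def pure_error_ss(settings, y):
    """Pure-error SS from groups of runs at identical settings: within each
    group, squared deviations from the group mean are summed; the degrees
    of freedom are the sum over groups of (group size - 1)."""
    groups = defaultdict(list)
    for s, yi in zip(settings, y):
        groups[s].append(yi)
    ss_pe, df_pe = 0.0, 0
    for ys in groups.values():
        ys = np.asarray(ys, dtype=float)
        ss_pe += float(np.sum((ys - ys.mean()) ** 2))
        df_pe += len(ys) - 1
    return ss_pe, df_pe

# Treating only the coded temperature as the setting gives the 5
# pseudo-replicate groups discussed above: x1 = -1, 0, +1 (three runs each)
# and x1 = +0.5, -0.5 (two runs each).
```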
So we can carry out the exercise, and that is not going to be very difficult. The X1'X1 matrix comes out as the diagonal matrix with entries 13 and 7, X1'y is the vector with entries 2434.6 and minus 42, the inverse of X1'X1 is the diagonal matrix with entries 1/13 and 1/7, and the parameters come out as 187.27 and minus 6. It is very interesting to note that even though we have removed the effect of the mass of detergent, we are getting otherwise the same X1'X1 matrix and X1'y vector; the inverse again has 1/13 and 1/7 on the diagonal, with the additional 1/7 diagonal element no longer present, and the parameters are the same as before, 187.27 and minus 6. If we go back a few slides we will see that these values are exactly the same as the first 2 entries of the full model: in the X1'X1 matrix we simply do not have the last 7, in the X1'y vector we do not have the last entry, leaving 2434.6 and minus 42, in the X'X inverse matrix we have 1/13 and 1/7, and in beta hat we have 187.27 and minus 6; we just do not have the minus 70.5. So the values are exactly the same as before, and this shows the advantage of doing the analysis in the coded format: we have an orthogonal design, which I will talk about in my next lecture. So we have the data as I had presented, and we will do further analysis of these results in the coming lecture. Thank you for your attention.
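A sketch of the reduced-model fit, with the response vector again left as a placeholder since the full table is not reproduced in the text:

```python
import numpy as np

def fit_reduced(x1_coded, y):
    """Fit c_hat = beta_0 + beta_1 * x1 and return the estimates and residual SS."""
    X1 = np.column_stack([np.ones(len(y)), x1_coded])
    beta_hat = np.linalg.solve(X1.T @ X1, X1.T @ y)
    ss_reg_raw = beta_hat @ (X1.T @ y)  # includes the beta_0 contribution
    ss_res = y @ y - ss_reg_raw
    return beta_hat, ss_res

# Because the coded design is orthogonal, X1'X1 = diag(13, 7) and the
# estimates match the full-model values: 2434.6 / 13 = 187.27 and -42 / 7 = -6.
```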