Welcome back. In today's lecture we will continue with regression analysis. We were discussing the extra sum of squares method, where we test the null hypothesis beta 1 = 0 using the statistic F0 = [SS_R(beta hat 1 | beta hat 2) / r] / MS_E, that is, the regression sum of squares contributed by beta hat 1 given that beta hat 2 is already in the model, divided by its r degrees of freedom, and then divided by the mean square error. Recall that beta 1 and beta 2 are not necessarily single parameters; they represent blocks of parameters, i.e. column vectors, as discussed in the previous class. If the computed value of the test statistic F0 exceeds F_alpha with r numerator and n − p denominator degrees of freedom, the null hypothesis is rejected: at least one of the parameters in beta 1 is not 0, and at least one of the variables x1, x2, ..., xr associated with beta 1 contributes significantly to the regression model. This is called the partial F test; beta 1 is a vector of parameters, which is why it is written in bold. So when we reject the null hypothesis, at least one of the newly added variables brings value to the regression. The extra sum of squares method is a very useful technique, and in particular we can use it to measure the contribution of each individual regressor variable xj as if it were the last variable added to the model. We arbitrarily pick a regressor variable xj, first develop a model without xj, and then see the impact of adding xj to the regression model equation.
We do this by testing SS_R(beta hat j | beta hat 1, beta hat 2, ..., beta hat j−1, beta hat j+1, ..., beta hat k), the regression sum of squares brought in by beta hat j given that all the other parameters were already present in the model. This is again an extra sum of squares technique, but instead of adding a block of parameters we are considering only one parameter, beta hat j. Here j can be any value from 1 to k; it need not be the first or the last parameter. You first develop a model without the parameter beta j, in other words without accounting for the regressor variable xj; you include all the other variables, develop the model, and then see the impact of bringing beta j into the regression model equation. SS_R(beta hat j | beta hat 1, ..., beta hat j−1, beta hat j+1, ..., beta hat k) is therefore the increase in the regression sum of squares due to adding xj to a model that already includes x1, ..., xj−1, xj+1, ..., xk. This sum of squares is always non-negative: when you bring in a new regressor variable xj, the sum of squares associated with it adds on to the existing regression sum of squares due to the other parameters. So it measures the value addition brought in by the single parameter beta hat j, and by repeating this exercise you can see the impact of each and every parameter. The partial F test is thus a general procedure, since the effect of a single variable or of a set of variables may be measured. It is used in model building, where a best set of regressors is chosen for use in the model.
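To make the extra sum of squares idea concrete, here is a small sketch in Python. The data are simulated and the function names are my own, so treat this as an illustration rather than the lecture's notation: we fit the model with and without a candidate regressor and form F0 from the extra regression sum of squares.

```python
import numpy as np

def regression_ss(X, y):
    """Regression sum of squares for the least squares fit y ~ X.
    X must include a column of ones for the intercept."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((X @ b - y.mean()) ** 2)

def partial_f(X_full, X_reduced, y):
    """Partial F statistic for the terms present in X_full but not in X_reduced."""
    n, p = X_full.shape
    r = p - X_reduced.shape[1]                      # number of parameters tested
    ss_extra = regression_ss(X_full, y) - regression_ss(X_reduced, y)
    b, *_ = np.linalg.lstsq(X_full, y, rcond=None)
    mse = np.sum((y - X_full @ b) ** 2) / (n - p)   # mean square error
    return (ss_extra / r) / mse

# Simulated data: y truly depends on x1, while x2 is pure noise.
rng = np.random.default_rng(0)
x1 = rng.uniform(0, 10, 30)
x2 = rng.uniform(0, 10, 30)
y = 2.0 + 1.5 * x1 + rng.normal(0, 1, 30)

X_full = np.column_stack([np.ones(30), x1, x2])
X_red = np.column_stack([np.ones(30), x2])   # model without x1
f0 = partial_f(X_full, X_red, y)             # compare with F_{alpha, 1, n-p}
```

A large F0 relative to the tabulated F value with 1 and n − p degrees of freedom indicates that x1 brings significant value to a model that already contains x2.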
By doing this analysis you can identify the set of variables having the maximum impact on the response, so that you build an economical, compact and efficient model in which only the regressor variables actually influencing the process are included, and the other model terms are excluded. Now let us look at the errors. The errors, you may recall, were defined in the regression model: the response equals the true value of the response plus epsilon, where epsilon is the error. The true value of the response was denoted eta and the actual response y. If there were no errors in the experiment, then all the responses would equal the true value, and repeating the experiment n times would give the same value eta i every time. We get different values because of the errors, and we also noted that the errors are normally distributed with mean 0 and constant variance sigma squared. The observations yi, as shown previously, are then also independently and normally distributed, with mean beta 0 + sum over j = 1 to k of beta j xij and variance sigma squared. So we have talked about the errors; what about the response? The response is simply a constant value eta i plus the error, and when you add a constant to a normally distributed error, the response is also normally distributed. The true value being added to the error of mean 0 is exactly the model beta 0 + sum over j = 1 to k of beta j xij. In case these verbal statements are hard to follow, I will show a diagram in the next slide, which we have also seen previously.
For easier representation, consider only one regressor variable, which we can simply call x, and plot the response y versus x. There can be several data points, but I am showing just two for illustration. The solid line is the true model, given by y = beta 0 + beta 1 x. Note that this is not beta hat 0 + beta hat 1 x; it is the true model, which is why it is called the true line. Please note the distinction: beta 0 and beta 1 are the exact, true parameters representing the process, whereas beta hat 0 and beta hat 1 are the estimated parameters for beta 0 and beta 1. With that out of the way, we see the data points scattered around the true line. If the experiments were perfect and uninfluenced by errors, the two dots would fall exactly on the solid line, but they are scattered: one response lies above the line and the other below it. The responses are normally distributed, with mean value given by the true line beta 0 + beta 1 x and variance sigma squared. What this means is that, because of random effects, a point may fall anywhere in the region around the line; it may even fall beyond it, but the probability of that would be very small. The spread of this normal distribution depends on the value of sigma squared.
If sigma squared is high, the distribution is broad and a point may fall further from the line; if sigma squared is very small, the distribution is narrow and the points lie closer to the line. Say the first setting of the regressor variable is xA; the true value of the response there is yA = beta 0 + beta 1 xA. Similarly, at the setting xB the true, or mean, response is yB = beta 0 + beta 1 xB, but the actual observed value may lie some distance from this mean. So far we have discussed the extra sum of squares method, the t tests and so on; now let us look at confidence intervals on the regression coefficients. The vector beta hat may be shown to be normally distributed with mean vector beta and covariance matrix (X′X)⁻¹ sigma squared. We are no longer dealing with individual entities but with the collection of regression coefficients, given as a column vector, which therefore has a mean vector beta and a covariance matrix (X′X)⁻¹ sigma squared; we have already seen what a covariance matrix is in one of the earlier lectures. We can then define a t statistic, t = (beta hat j − beta j) / sqrt(sigma hat squared Cjj), where beta hat j is the estimated value of the parameter and beta j the true value. Since sigma squared is not known, we use its estimate sigma hat squared, which, if you recollect, is the residual sum of squares divided by n − p, where n is the number of data points and p is the number of parameters. Cjj is the diagonal element of the (X′X)⁻¹ matrix corresponding to j; the off-diagonal terms of (X′X)⁻¹ may be zero or non-zero.
But we are not interested in the off-diagonal terms; we pick up only the diagonal element corresponding to j. For example, for beta 1 we look at C11, the first-row, first-column element; for beta 2, j = 2 and we look at C22, the second-row, second-column element. So we read the variance–covariance matrix along its diagonal. Since sigma hat squared was based on the n − p degrees of freedom associated with the residual sum of squares, the t distribution also has n − p degrees of freedom, where n is the number of data points and p the number of parameters. We can now define the 100(1 − alpha) percent confidence interval for the regression coefficient beta j, j = 0, 1, 2, ..., k, in the multiple linear regression model: beta hat j − t_{alpha/2, n−p} se(beta hat j) ≤ beta j ≤ beta hat j + t_{alpha/2, n−p} se(beta hat j). This should look very familiar. In phase 1 of the lectures, where we discussed t distributions, hypothesis testing and confidence intervals, we had x bar − t_{alpha/2} s/√n ≤ mu ≤ x bar + t_{alpha/2} s/√n, where s is the standard deviation of the sample. In some cases we had sigma/√n and in others s/√n, depending on whether we used the z distribution or the t distribution: if the population variance was known, and the population was normally distributed or the sample size was large (greater than 30) so that the central limit theorem comes into play, then we could use z_{alpha/2} sigma/√n.
In the cases where the parent distribution is normal but the variance sigma squared is not known, which is usually the case, we have to make do with the sample standard deviation s, and so we use x bar − t_{alpha/2} s/√n. So whatever we studied earlier makes perfect sense now. Here we are developing confidence intervals for the regression coefficient beta j: we take the estimated value beta hat j, the t value corresponding to the chosen level of significance alpha and n − p degrees of freedom (which you can read from t tables or compute in a spreadsheet), and the standard error of beta hat j. Together these give the confidence interval. And what do you do with this confidence interval? If the interval has a negative lower limit and a positive upper limit, then beta j is pretty much worthless. On the other hand, if the lower and upper limits are very close to each other, whether both negative or both positive, then the parameter beta j has been precisely identified. But if the lower limit is negative and the upper limit is positive, what do you make of that beta j? Is it acting to increase the response when the regressor variable xj increases, or to decrease it? Under such a scenario we cannot make any definitive conclusion about beta j, and we pretty much say it is insignificant.
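The confidence interval formula can be sketched in Python as follows. Again the data are simulated and the names are illustrative; the critical value t_{alpha/2, n−p} is supplied by hand, just as it would be read from a t table.

```python
import numpy as np

def beta_confint(X, y, t_crit):
    """Confidence intervals for the regression coefficients.
    t_crit is t_{alpha/2, n-p}, e.g. 2.048 for alpha = 0.05 and 28 df."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y                            # least squares estimates
    sigma2_hat = np.sum((y - X @ b) ** 2) / (n - p)  # residual mean square
    se = np.sqrt(sigma2_hat * np.diag(XtX_inv))      # sqrt(sigma_hat^2 * C_jj)
    return b - t_crit * se, b + t_crit * se

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 30)
y = 3.0 + 2.0 * x + rng.normal(0, 1, 30)
X = np.column_stack([np.ones(30), x])
lo, hi = beta_confint(X, y, t_crit=2.048)   # n - p = 30 - 2 = 28
```

Here both limits for the slope should come out positive, so the slope is judged significant; an interval straddling zero would leave the sign of the coefficient inconclusive, as discussed above.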
So the moral of the story is that the lower and upper limits of the interval for beta j should bear the same sign. If both are negative, beta j acts to decrease the model response when xj increases; if both are positive, beta j takes a positive value and the effect of increasing xj is to increase the process response. Now, whenever you do regression analysis, either manually (which is very rare) or with the help of software or a spreadsheet, the program throws out a lot of results, and sometimes we do not really know what those results mean. The most popular of these is R squared. If the R squared value is, say, 0.99, we feel very happy and feel we have achieved an excellent fit to the given data. But there are a few pitfalls in this feeling of a very high R squared; let us see what they are. The quantity R squared is called the coefficient of determination, so now you have a name for it instead of just a symbol. The coefficient of determination is simply the ratio of the regression sum of squares to the total sum of squares. The regression sum of squares is a very valuable entity; it gives you the effective worth of the regression model. When we looked at the extra sum of squares and the partial F test, we were always talking about the regression sum of squares brought in by a particular parameter or set of parameters; collectively these make up the regression sum of squares, and we compare it with the total sum of squares to see what fraction of the total is contributed by the regression.
If you have a situation where the regression contributes the entire total sum of squares, then R squared equals 1, so looking at this equation you would like R squared to be as close to 1 as possible. But I have seen papers, especially in the biological sciences, where people report R squared values of 0.68, 0.7 and so on; what counts as acceptable depends on the application. So what exactly is R squared? Beyond being the regression sum of squares divided by the total sum of squares, which does not mean much to someone unfamiliar with the subject, R squared represents the proportion of the total variability accounted for, or explained, by the linear regression model. Your process has a certain amount of variability, and R squared is the fraction of that variability explained by the regression model you have developed. If the variability is predominantly explained by the model, R squared will be quite close to 1 and you may have the satisfaction of having developed a reasonably good model. But a word of caution is in order. For example, if you have 5 data points and you fit a model with 5 parameters, R squared will equal 1: all the variability will have been explained by the model. Here you are not really doing regression or curve fitting at all; you are solving 5 equations in 5 unknowns, and the 5 estimated unknowns satisfy all 5 equations exactly. All the variability is accounted for, R squared equals 1, and yet the result is not acceptable.
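The 5-points, 5-parameters situation is easy to demonstrate. In this sketch (simulated data, illustrative names) a straight line leaves some variability unexplained, while a degree-4 polynomial, which has as many parameters as data points, gives R squared exactly 1 even though the data are noisy:

```python
import numpy as np

def r_squared(X, y):
    """Coefficient of determination R^2 = 1 - SS_E / SS_T."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    ss_e = np.sum((y - X @ b) ** 2)
    ss_t = np.sum((y - y.mean()) ** 2)
    return 1 - ss_e / ss_t

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 5)
y = 1.0 + 0.5 * x + rng.normal(0, 2, 5)    # 5 noisy data points

X_line = np.column_stack([np.ones(5), x])  # 2 parameters: genuine regression
X_poly = np.vander(x, 5)                   # 5 parameters: exact interpolation
r2_line = r_squared(X_line, y)
r2_poly = r_squared(X_poly, y)             # equals 1 up to round-off
```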
We normally work with a larger data set, say 40 or 50 data points, and fit only a few parameters, 3 or 4. Unless the model is miraculously exact and the data were generated with no error whatsoever, which is very unlikely, a handful of parameters will not account exactly for the responses of 40 experimental runs; there will always be some discrepancy. So how can the value of R squared be increased? By increasing the number of terms in the model, and thereby the number of adjustable coefficients, the model can be made to fit the data ever more closely. In the extreme case, fitting a model with 40 parameters to 40 experimental points gives an exact fit, but imagine a model with 40 parameters: it will run to half a page or a full page and look really ugly. So we have to find the best set of parameters that gives a reasonably high value of R squared, and we need a way to quantify what a reasonably high value means. To summarize what I have said so far: a complex model running to half a page or a full page becomes cumbersome to handle, more empirical in nature, and difficult to explain physically; why is this model able to fit the data, and what is its physical meaning? For example, if you have temperature to the power of 3 or 4, what is the physical reason for the response to depend on the fourth power of temperature? Is it radiation? If it is not radiation but a simple reaction problem, why should T to the fourth power appear, to take a simple illustration? Also, if your data are very noisy, subject to a lot of error, a model fitted with many parameters may not hold up when you slightly change the values of the regressor variables.
For example, suppose a model was developed with a certain set of values of xj. Why is a model developed in the first place? So that you do not have to keep doing experiments time after time: once you have a model, you can use it to represent the process in future design calculations, simulations and so on. But when the data are noisy and you have fitted a model with too many parameters, then when you change the values of x slightly, or even use the same values, you will find to your surprise that the model which worked well with the original data set may not do a good job with the new set. This is a problem you may encounter often, because experiments are variable, and you cannot fit a new regression model every time just to explain one particular set of experimental data. That is why you should do the experiments as carefully as possible, minimizing the errors; unavoidable errors you have to live with, but you should not deliberately introduce any systematic error. Collect the data properly and fit a satisfactory regression model; do not aim for a coefficient of determination of 1 all the time. This brings us to the concept of adjusted R squared. The starting point is much the same: R squared is the regression sum of squares divided by the total sum of squares, and since the regression sum of squares equals the total sum of squares minus the error sum of squares, R squared = 1 − (error sum of squares)/(total sum of squares).
Going back to that equation: the regression sum of squares can be written as the total sum of squares minus the error sum of squares, and dividing through by the total sum of squares gives 1 minus the ratio of the error sum of squares to the total sum of squares, which is what we have written here. For the adjusted R squared, however, we scale each sum of squares by its associated degrees of freedom: n − p for the error sum of squares and n − 1 for the total sum of squares. So rather than sums of squares we are using mean squares: R squared adjusted = 1 − [SS_E/(n − p)] / [SS_T/(n − 1)]. There is a strong justification for this scaling. If you want the adjusted R squared to be as close to 1 as possible, the term being subtracted must be as small as possible, which happens when either the error sum of squares is very small or n − p is large. But as you keep adding more and more parameters, n − p becomes smaller, the subtracted term starts to increase, and the adjusted R squared starts to decrease. That is the penalty for adding more and more parameters. If, on the other hand, you add a parameter that has a strong influence on the process, the error sum of squares will drop drastically.
In that case, even though n − p has decreased by 1, the error sum of squares has decreased even more, so the overall effect is to reduce the subtracted term, and the adjusted R squared will be quite high. But if the error sum of squares decreases only slightly while you add many parameters, n − p will shrink quickly, the subtracted term will grow, and the adjusted R squared will fall. That is why we should not be in a hurry to keep adding regressor variables just to push R squared towards 1. It is good practice to look at the adjusted R squared as well and see whether it is satisfactory. I have seen cases where the R squared value is 0.97 or 0.98 while the adjusted R squared is only 0.84 or 0.85. Suppose adding a parameter raises R squared to 0.975 or so but reduces the adjusted R squared to 0.83: then the additional parameter is pretty much worthless. So please watch the variation of the adjusted R squared as you add parameters, not just R squared itself. As I said earlier, unless the error sum of squares is considerably reduced by adding the extra term to the model equation, the adjusted R squared will decrease, not increase, upon increasing the number of parameters (there was a typo on the slide here, which I have just corrected). Now we come to another quantity, the prediction error sum of squares, abbreviated PRESS.
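The penalty built into the adjusted R squared can be seen in a short sketch. Here (simulated data, illustrative names) five pure-noise regressors are appended to a good one-variable model: the ordinary R squared can only go up, while the adjusted version scales each sum of squares by its degrees of freedom and so may go down:

```python
import numpy as np

def r2_and_adjusted(X, y):
    """Return (R^2, adjusted R^2) for the least squares fit y ~ X."""
    n, p = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    ss_e = np.sum((y - X @ b) ** 2)
    ss_t = np.sum((y - y.mean()) ** 2)
    r2 = 1 - ss_e / ss_t
    r2_adj = 1 - (ss_e / (n - p)) / (ss_t / (n - 1))   # mean squares, not SS
    return r2, r2_adj

rng = np.random.default_rng(3)
x1 = rng.uniform(0, 10, 40)
noise_terms = rng.uniform(0, 10, (40, 5))   # five worthless regressors
y = 1.0 + 2.0 * x1 + rng.normal(0, 1, 40)

X_small = np.column_stack([np.ones(40), x1])
X_big = np.column_stack([X_small, noise_terms])
r2_s, adj_s = r2_and_adjusted(X_small, y)
r2_b, adj_b = r2_and_adjusted(X_big, y)   # r2_b >= r2_s is guaranteed
```

The adjusted R squared is always at most R squared, and it improves only when an added term reduces the error sum of squares enough to outweigh the lost degrees of freedom.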
Some computer outputs also report this value, and the quantity looks similar to the residual sum of squares: we sum the squared deviations between the actual responses and the corresponding model-predicted values. So what is new in PRESS? The main difference is that the prediction for the ith data point is based on a model equation that excluded that particular data point and was developed using only the remaining data points. When you compute the ordinary residual sum of squares for the ith data point, you subtract the model prediction from the response and square it, with the model fitted to all the data. Here, instead, if I am calculating the contribution for the first data point, I first develop a model with the remaining data points, so that the model did not use the first data point at all; I then use that model to predict the response at the first experimental condition, and the difference between the experimental value and the predicted value is squared to give the contribution for the first data point. Moving on to the second data point, I first develop a model without the second data point, so again I have a model equation.
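The leave-one-out loop being described can be sketched directly in Python (simulated data, illustrative names); this is the brute-force version of PRESS, refitting the model n times:

```python
import numpy as np

def press(X, y):
    """Prediction error sum of squares: each point is predicted
    from a model fitted to all the other points."""
    n = len(y)
    total = 0.0
    for i in range(n):
        keep = np.arange(n) != i                 # drop the ith point
        b, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        total += (y[i] - X[i] @ b) ** 2          # predict the held-out point
    return total

rng = np.random.default_rng(4)
x = rng.uniform(0, 10, 25)
y = 1.0 + 0.8 * x + rng.normal(0, 1, 25)
X = np.column_stack([np.ones(25), x])

b, *_ = np.linalg.lstsq(X, y, rcond=None)
ss_res = np.sum((y - X @ b) ** 2)   # ordinary residual sum of squares
press_val = press(X, y)             # PRESS is never smaller than ss_res
```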
Then I subtract, from the experimental response for the second data point, the model prediction based on all the data points except the second, and square that difference. I do the same for all the remaining data points in the set and sum the squares. This may look a bit tedious, but there are ways in which it can be done much faster; that is beyond the scope of this course. We want the PRESS value to be small as well. To summarize: the main difference to watch out for is that the prediction for the ith data point is based on a regression model equation that excluded that particular data point but used the remaining data points to develop the model; the same treatment is meted out to every data point in turn when it is compared with the corresponding model prediction. So far we have looked at the prediction error sum of squares, or PRESS; now we will look at the sequential sum of squares. As the name implies, this corresponds to gradual model development, focusing first on the main effects, then on the second order effects, in other words the products of factors taken two at a time. Once we are done with the main effects, we consider the effect of adding the two-factor terms to a model already containing the main effects, so that we then have a model with the main effects and the second order interactions. Suppose you have main effects A, B and C: first you develop a model with only the main effects A, B and C; then you bring in the second order interactions AB, BC and AC; and after having developed that model, you consider the effect of the third order interaction ABC. So you develop the model as main effects, then second order interactions, then the third order interaction.
The sequential sum of squares thus represents, first, the contribution to the total sum of squares from the main effects; then the additional contribution from the second order interactions to the model already containing the main effects; and next the sum of squares brought in by the third order interaction to the model already containing all the remaining terms. Generally we see less and less impact on the total sum of squares from the higher order terms. In some cases the interactions may contribute more to the total sum of squares than the main effects, but beyond the second order, or perhaps the third order, the higher order interactions usually contribute negligibly and add little value. This is another way of saying that you should not develop a model beyond the third order interaction. Repeating what I said: when you add the sum of squares due to the two-way interactions, the sum of squares contribution from the main factors is already present; and when the third order interaction ABC sum of squares is added, the sums of squares of all the remaining effects, meaning the main effects and the second order interactions, have already been accounted for. Now we come to another term, the adjusted sum of squares. This represents the increase in the sum of squares when a term is added to a model which already contains all the other terms. The adjusted sum of squares is different from the sequential sum of squares. With the sequential sum of squares, as the name implies, we proceed sequentially, systematically, in an organized fashion. For the adjusted sum of squares of the main effect A, say, we develop a model with B and C, then also AB, BC and AC, and then ABC as well; finally we add the factor, or regressor variable, A at the very end and see the regression sum of squares brought in by it.
So this is the increase in the sum of squares when the term is added to a model already containing all the other terms. In an orthogonal design with an equal number of repeats per cell, the sequential sum of squares and the adjusted sum of squares are identical. This is another beauty of orthogonal designs: statistically designed experiments, such as factorial designs, are usually orthogonal, and so you have the advantage that the adjusted sum of squares equals the sequential sum of squares. It really does not matter whether a factor is added at the beginning or at the end; it contributes to the sum of squares in an identical fashion. In non-orthogonal designs, however, you will note that the sequential and adjusted sums of squares are not the same. Now let us look at bias in the model. We fit a model to the experimental data yi and obtain the model predictions. The residual, as we know by now, is yi − y hat i, and we hope that this residual is caused by random error only. If so, the residual sum of squares helps us estimate the error variance. This is a very important concept: whatever we are unable to explain with the model, we hope is due to random effects alone. But if you think a bit deeper, the difference between the experimental value and the model prediction may not always be due to experimental error alone; maybe you have not developed a sufficiently acceptable model. Maybe the person doing the model development was lazy: with two factors and their interactions influencing the process, he might have taken the easy way out and developed a regression equation with only one regressor variable, even though two regressor variables and their interactions physically influence the process.
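The claim that sequential and adjusted sums of squares coincide for an orthogonal design can be checked numerically. The sketch below (simulated responses, illustrative names) uses a 2^2 factorial with two repeats per cell, adding the factors in both possible orders:

```python
import numpy as np

def sequential_ss(cols, y, order):
    """Sequential regression sums of squares: add the columns in `order`,
    recording the increase in SS_R at each step (intercept always present)."""
    n = len(y)
    increments, prev = [], 0.0
    X = np.ones((n, 1))
    for j in order:
        X = np.column_stack([X, cols[:, j]])
        b, *_ = np.linalg.lstsq(X, y, rcond=None)
        ss_reg = np.sum((X @ b - y.mean()) ** 2)
        increments.append(ss_reg - prev)
        prev = ss_reg
    return increments

# 2^2 factorial, two repeats per cell: the columns A and B are orthogonal.
A = np.array([-1, -1, 1, 1] * 2, dtype=float)
B = np.array([-1, 1, -1, 1] * 2, dtype=float)
rng = np.random.default_rng(6)
y = 5.0 + 3.0 * A - 2.0 * B + rng.normal(0, 0.5, 8)

cols = np.column_stack([A, B])
ss_A_first = sequential_ss(cols, y, order=[0, 1])   # A, then B
ss_B_first = sequential_ss(cols, y, order=[1, 0])   # B, then A
# The SS attributed to A (and to B) is the same in either order.
```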
In such a situation you cannot argue that the discrepancy between the experimental value and the model prediction is only because of random error. It can also be because of an inadequate model. This is what we are going to discuss from these slides on. If, however, the model is inadequate, then the difference above is inflated not only by experimental error but also by model error. This is very important. So you have the experimental, random error, and you also have the model error. How do you split or delineate the two? The residual sum of squares contains both the modeling error and the random error; how do you want to split them? For that purpose we define a bias bi as the difference between the expected value of the experimental response and the expected value of the model prediction for the ith experimental condition. If the expected value of the experimental response matches the expected value of the model prediction, the bias is equal to 0. On the other hand, if the expected value of the experimental response differs from the expected value of the model prediction, you have a non-zero bias. Now, let us say p is equal to 2 for convenience, so you have only 2 parameters; then the mean square residual is the sum from i equal to 1 to n of (yi minus yi hat) squared, divided by n minus p, here n minus 2. This is the mean square residual. If this sum of squares arises from an adequate model, then the residuals arise from random variations only, and hence it is an estimate of the error variance sigma squared. We do not know the error variance sigma squared, so we are hoping that the residual sum of squares will give us an idea, an estimate, of the error variance.
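As a small numerical sketch of the mean square residual just defined, the following fits a straight line (p = 2 parameters) to made-up data and computes MS_Res = Σ(yi − ŷi)²/(n − p). The x and y values are hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical data: roughly linear, n = 6 runs, p = 2 parameters (slope, intercept)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8, 12.1])
n, p = len(y), 2

# Least-squares fit of y = b0 + b1*x, then the residual mean square
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
ms_res = np.sum((y - X @ beta) ** 2) / (n - p)
print(round(ms_res, 5))
```

If the straight-line model is adequate, this MS_Res is an estimate of the error variance sigma squared; if not, it is inflated by the bias terms discussed next.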
But if the residual sum of squares also contains variation due to an inadequate model, then we cannot use it to get a good estimate of the experimental error. The residual mean square will then be higher than the experimental error contribution, so we have to be careful. If the model is inadequate, the residual sum of squares also has, in addition, the contribution from systematic components, i.e. due to bias. So its expected value is sigma squared, the error variance, plus the sum of bi squared divided by n minus 2. The residual sum of squares thus has a contribution from sigma squared and a contribution from the bias. Now, how do we find out whether we have an adequate model or an inadequate model? Suppose first that we know sigma squared from prior knowledge, from experience, or from previous data sets, so we have a fair idea about sigma squared. Then we compare the residual mean square, the sum of (yi minus yi hat) squared divided by n minus 2, or in general n minus p, with the prior variance using an F test, to see if the residual mean square is significantly larger than sigma squared. If the difference is statistically significant, the residual mean square cannot be regarded as statistically equal to sigma squared, and the model is said to have a lack of fit. We should then reconsider the model, as it is inadequate in its present form. On the other hand, if you do not have information on sigma squared, which is usually the case, repeat measurements on yi may be available. This is another reason why you should perform repeats in your experiments.
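The first case, where sigma squared is known from prior data, might be sketched as below. All numbers here are hypothetical: a residual sum of squares from some fit, and a prior variance estimate that we assume came from earlier data with its own degrees of freedom (which is what makes the F test, rather than a chi-square test, the natural comparison).

```python
from scipy import stats

# Hypothetical numbers, for illustration only
n, p = 20, 2
ss_res = 45.0                      # residual SS from the fitted model (made up)
ms_res = ss_res / (n - p)          # residual mean square, df = n - p = 18
sigma2_prior, df_prior = 1.5, 30   # assumed prior variance estimate and its df

# F test: is MS_Res significantly larger than the prior sigma^2?
f0 = ms_res / sigma2_prior
f_crit = stats.f.ppf(0.95, n - p, df_prior)   # upper 5% point
print(round(f0, 3), round(f_crit, 3))
if f0 > f_crit:
    print("lack of fit: MS_Res is significantly larger than sigma^2")
else:
    print("no evidence of lack of fit at this level")
```

With these illustrative numbers the statistic falls below the critical value, so the model would not be declared inadequate.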
When you have repeat measurements, these reflect the pure error, because when you repeat experiments you are not going to get identical responses; you will get different values of the response for the repeated runs. This you can use to get an idea of the random fluctuations, the sum of squares caused by true random variations. We can call it the sum of squares due to pure error, because the repeats represent pure error, and we are hoping that when you repeat the experiments you make sure that all the variables are kept at their assigned values in all the runs. Even if one value of a variable changes slightly, it cannot be called a genuine repeat. So, when repeat measurements on yi are available, they are a reflection of pure, unadulterated error: if for a given xi two or more repeat measurements of yi are taken, then the observed differences in the measured values of yi may be attributed only to random effects. Continuing with the second case, where we do not know sigma squared, which is more often the case, it is essential to have repeat experiments in our program or plan. This is brought out very nicely by Draper and Smith in their 1998 book. Now we look at the pure error sum of squares. What we do is repeat experiments across the different experimental settings: at the first combination of settings we do repeats, then at the next combination we do repeats, and so on. We assume that the errors influencing the repeats at the first experimental setting are analogous to the errors influencing the repeats at the second experimental setting, in the sense that the errors at every setting are random and come from the same distribution.
That means all the errors influencing the first experimental setting and those influencing the second experimental setting come from the same family: the errors are distributed normally with 0 mean and variance sigma squared, so the error variance at the first experimental setting is identical with the error variance at the second. With this assumption we can pool the sums of squares of the pure error so that we get a better overall estimate. So the sum of squares of pure error is the double sum, i equal to 1 to n and j equal to 1 to m, of (yij minus yi bar) squared. Here you have m repeats; the number of repeats may differ between experimental settings, but let us assume that we conduct the same number of repeats, m, at every setting. So j indexes the repeat and i indexes the experimental setting, and we sum (yij minus yi bar) squared over all repeats and all settings. The degrees of freedom, by now you should be able to show, is n times (m minus 1): every set of repeated experiments has m minus 1 degrees of freedom, assuming m to be the same for all settings. And n(m minus 1) equals nm minus n, which is capital N minus small n, where capital N equal to nm is the total number of runs and small n is the number of independent experimental settings. Now the lack of fit sum of squares, a very important quantity, can be defined as follows: the residual sum of squares equals the lack of fit sum of squares plus the pure error sum of squares. The residual sum of squares is thus split into a lack of fit sum of squares and a pure error sum of squares. Just now you found the pure error sum of squares using the relationship above, so you can get the lack of fit sum of squares by simply subtracting the pure error sum of squares from the residual sum of squares.
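The split just described can be sketched numerically as follows. The data are hypothetical: n = 4 distinct settings of a single regressor x, with m = 3 repeats each, so N = 12 runs in total, fitted with a straight line (p = 2).

```python
import numpy as np

# Hypothetical repeats: n = 4 settings, m = 3 repeats each (N = 12 runs)
x_levels = np.array([1.0, 2.0, 3.0, 4.0])
y_reps = np.array([[2.1, 1.9, 2.0],    # repeats at x = 1
                   [3.9, 4.1, 4.0],    # repeats at x = 2
                   [6.2, 5.8, 6.0],    # repeats at x = 3
                   [8.1, 7.9, 8.3]])   # repeats at x = 4

# Pure-error SS: spread of the repeats about their own cell means, df = N - n
cell_means = y_reps.mean(axis=1, keepdims=True)
ss_pe = np.sum((y_reps - cell_means) ** 2)

# Fit a straight line to all N runs and get the residual SS, df = N - p
x_all = np.repeat(x_levels, 3)
y_all = y_reps.ravel()
X = np.column_stack([np.ones_like(x_all), x_all])
beta, *_ = np.linalg.lstsq(X, y_all, rcond=None)
ss_res = np.sum((y_all - X @ beta) ** 2)

# Lack-of-fit SS by subtraction, df = (N - p) - (N - n) = n - p
ss_lof = ss_res - ss_pe
print(round(ss_pe, 4), round(ss_lof, 4))
```

Notice that the pure-error part uses only the repeats and never the model, while the lack-of-fit part captures how far the cell means sit from the fitted line.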
So that will give you the sum of squares for lack of fit. Now you can compare the lack of fit mean square with the pure error mean square through an F test at the 100(1 minus alpha) percent confidence level. You can easily find the degrees of freedom for the lack of fit sum of squares. You know that the pure error sum of squares has N minus n degrees of freedom; please note, capital N minus small n. The residual sum of squares has N minus p degrees of freedom, where capital N is the total number of runs and p is the number of parameters. So you have N minus p and N minus n, and the difference between the degrees of freedom for the residual sum of squares and the pure error sum of squares gives you the degrees of freedom for the lack of fit. I request you to work it out yourself. So the lack of fit mean square may be compared with the pure error mean square through an F test at the 100(1 minus alpha) percent confidence level. The numerator and denominator degrees of freedom are n minus 2 (or generally n minus p) and N minus n respectively. So the lack of fit sum of squares has n minus p degrees of freedom, where small n is the number of independent settings and p is the number of parameters, and the pure error sum of squares has capital N minus small n degrees of freedom, where capital N is the total number of runs and small n is the number of independent settings. If the F statistic falls in the rejection region, then the lack of fit mean square is significantly larger than the pure error mean square and the chosen model has to be reevaluated. If the F statistic lies in the acceptance region, both the lack of fit and pure error sums of squares may be used as independent estimates of sigma squared; in fact the residual mean square itself may then be used as a pooled estimate of sigma squared. And if the estimate of sigma squared is based upon more data points, so much the better.
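The lack-of-fit F test might look like the following sketch, assuming scipy is available. The sums of squares and counts are hypothetical numbers inserted only to make the arithmetic concrete.

```python
from scipy import stats

# Hypothetical counts and sums of squares, for illustration only
N, n, p = 12, 4, 2            # total runs, distinct settings, model parameters
ss_res, ss_pe = 0.209, 0.200  # residual SS and pure-error SS (made up)
ss_lof = ss_res - ss_pe       # lack-of-fit SS by subtraction

ms_lof = ss_lof / (n - p)     # lack-of-fit mean square, df = n - p
ms_pe = ss_pe / (N - n)       # pure-error mean square,  df = N - n

# F test at the 95% confidence level, df = (n - p, N - n)
f0 = ms_lof / ms_pe
f_crit = stats.f.ppf(0.95, n - p, N - n)
print(round(f0, 3), round(f_crit, 3))
if f0 <= f_crit:
    # No significant lack of fit: the residual mean square may then serve
    # as a pooled estimate of sigma^2, with the larger df N - p
    print("pooled sigma^2 estimate:", round(ss_res / (N - p), 4))
```

With these numbers the statistic is far below the critical value, so the model would be judged adequate and the pooled residual mean square would be preferred as the estimate of sigma squared.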
In fact, if the degrees of freedom associated with an error estimate are higher, then that error estimate is more valuable. So sigma squared based on the residual sum of squares, provided the model is adequate, is recommended as the estimate of sigma squared. What I am trying to say is that the residual mean square can be used as an estimate of sigma squared because it has higher degrees of freedom; but this is only to be done when the fitted model is adequate. If the model is not adequate, you cannot use the residual mean square as an estimate of sigma squared. This may be nicely represented by a flow diagram, as given by Draper and Smith. The residual sum of squares, with nr residual degrees of freedom, splits into the lack of fit sum of squares and the pure error sum of squares. Let us call the degrees of freedom associated with the pure error sum of squares ne; then the lack of fit sum of squares has nr minus ne degrees of freedom. When you divide each sum of squares by its respective degrees of freedom, you get the mean square lack of fit and the mean square pure error, and then you compare the two. The mean square lack of fit is an estimate of sigma squared if the model is correct, and of sigma squared plus a bias term if the model is insufficient, while the mean square pure error is always a reliable, true estimate of sigma squared. This completes our discussion on the various aspects of linear regression and the various terms you may often encounter in a regression output. Hopefully after this lecture you will be able to appreciate the value of the different terms in the regression output rather than basing your judgments solely on the r squared value.
In fact, when you are explaining your results to your thesis supervisor, to your boss in the company, or to your R&D manager, it will make a good impression if you are able to provide more insight into the developed regression model equation. Having said that, please note that these regression model equations are not really based on first principles; a regression model is only an empirical equation. But still, if it can represent the effects of the various variables and the interactions between the variables in a reliable manner, the developed regression model is very useful, because many times in real life we cannot model processes from first principles. So this completes our discussion, and thanks for your attention.