Hello, welcome back. In today's class we will be looking at hypothesis testing in linear regression. What is the motivation for this test? When we develop a linear regression model, we want to see which of the variables we considered in the experiment are really important. At the beginning, when we have no prior knowledge or experience, we do not know which variables are important and which are not, so we include as many variables as possible in the experimental program, perform the experiment, and collect the data. Now we want to analyze the data and identify which of the variables are really significant and strongly influence the response. How do we go about it? That is what we are going to see in today's lecture.

As the slide indicates, the test checks whether there is a linear relationship between the response y and a subset of the regressor variables x1, x2, ..., xk. These regressor variables are the variables we are investigating in the experiment. So we carry out hypothesis testing in multiple linear regression. Our null hypothesis is beta_1 = beta_2 = ... = beta_k = 0. What this really means is that none of the regression coefficients take values significantly different from 0. The coefficients beta_1, beta_2, ..., beta_k may take either negative or positive values. If a coefficient is positive, that variable affects the response positively; for example, the yield of a chemical reaction may increase with increasing temperature. On the other hand, if beta_j is negative, then an increase in that variable has a negative effect on the response; for example, when pressure increases, the volume may decrease. It depends on the experiment we are looking at. When a regression coefficient beta_j becomes 0, the term beta_j x_j is 0, which means that whatever value x_j takes, the effect of that particular variable on the response is insignificant.

So this is what we are trying to test. The null hypothesis says that all the regression coefficients are 0, that is, none of the variables really affect the process. This is the most skeptical point of view a person can take at the beginning of an experiment, and as experimenters we should be skeptical and not carry preconceived notions. The alternate hypothesis says that beta_j is not equal to 0 for at least one j. This means that among all the regression coefficients, at least one is non-zero; in other words, there is at least one variable in the experiment that actually affects the process response. If we accept (fail to reject) the null hypothesis H0: beta_1 = beta_2 = ... = beta_k = 0, we agree that none of the regression coefficients take a value other than 0, and so none of the variables really affect the response. The alternate hypothesis says that at least one of beta_1, beta_2, ..., beta_k is non-zero; it may be negative or positive. Rejection of H0 therefore implies that at least one of the regressor variables x1, x2, ..., xk contributes significantly to the linear regression model.
So, here we have k independent variables x1, x2, ..., xk, and these are the regression coefficients attached to these regressor variables. If, say, beta_j takes the value 0, then beta_j x_j equals 0 and that particular regressor variable x_j has no effect on the process response.

How do we carry out this hypothesis test? We have the experimental data with us; we first find the total sum of squares and then split it into the regression sum of squares and the residual sum of squares, as indicated briefly here: SS_T = SS_R + SS_E. Whenever we compute sums of squares we also have to keep track of the degrees of freedom. Recall that whenever we compute a variance, we not only find the deviations from the mean but also divide by n - 1, where n is the total number of observations; a sum of squares is always scaled by a quantity related to the data size. In linear regression it is the same: each sum of squares is scaled by its associated degrees of freedom. The total sum of squares has n - 1 degrees of freedom, where n is the total number of observations. The regression sum of squares has k degrees of freedom, where k is related to the total number of parameters by p = k + 1. We have already come across this earlier; to reiterate, p is the total number of parameters, and it includes beta_0, the so-called intercept of the regression model. The residual sum of squares then has n - p degrees of freedom, where n is the total number of responses.

Now we can set up the analysis of variance table. The ANOVA table has the usual entries: the source of variation, the sum of squares associated with that source, the degrees of freedom, and the mean square, obtained by dividing each sum of squares by its associated degrees of freedom. So k = p - 1, the total number of parameters minus one; here we are not counting the intercept, only the regression coefficients beta_1, beta_2, ..., beta_k. Dividing the regression sum of squares by its k degrees of freedom gives the mean square regression. Then we have the residual sum of squares. This is a very important quantity in regression analysis, because only by examining the residuals and their pattern can we really judge the quality of the fit. The residual sum of squares is written SS_E; rather than a subscript for "residual" I use the subscript E, because the residual is also associated with the error: it is the difference between the experimental value and the model prediction. So the residual is defined as the difference between the experimental value and the model prediction, the sum of squares associated with the residuals is SS_E, and its degrees of freedom are n - p. The mean square residual is therefore SS_E divided by n - p. We take the ratio of the mean square regression to the mean square residual to get the statistic F0. We also have the total sum of squares SS_T with n - 1 degrees of freedom; its degrees of freedom are the sum (n - p) + k, and k is nothing but p - 1.
So (n - p) + (p - 1) gives us n - 1 degrees of freedom: when you add up these two, you get the total degrees of freedom, n - 1. That is what this slide also tells us. Repeating, n is the total number of observations and p is the total number of parameters including the intercept beta_0, so k = (n - 1) - (n - p) = p - 1. Earlier we saw that k and n - p add up to give n - 1; this says the same thing in a different way, by subtracting n - p from n - 1 to get k = p - 1. We can take whichever route we want.

Now we have to make a proper judgment about the observed mean squares. We have the mean square regression and the mean square residual; we consider their ratio, and we have to test it against a suitable distribution. What is that distribution? Under the null hypothesis, the regression sum of squares scaled by the error variance sigma^2 follows a chi-square distribution with k degrees of freedom, and the residual sum of squares scaled by sigma^2 follows another chi-square distribution with n - p degrees of freedom. The regression and residual sums of squares are also independent, and the ratio of two independent chi-square variables, each divided by its degrees of freedom, follows an F distribution. So F0 = [SS_R / k] / [SS_E / (n - p)], with the k degrees of freedom in the numerator and the n - p degrees of freedom in the denominator: the k degrees of freedom are associated with the regression sum of squares in the numerator, and the n - p degrees of freedom with the residual sum of squares in the denominator. In other words, F0 = MS_R / MS_E; the sigma^2 cancels out, so we can simply take F0 as MS_R / MS_E and do the usual F test with k numerator and n - p denominator degrees of freedom. By now you should be familiar with the implementation of the F test; we have done practice problems and example sets earlier, and I request you to go through those problems and refresh your memory. We reject the null hypothesis H0 if the computed test statistic is greater than f_{alpha, k, n-p}, where alpha is the significance level, usually taken as 0.05. If F0 lies in the critical region, that is the rejection region, then we reject the null hypothesis.

Continuing with our discussion on the resolution of the total sum of squares: the total sum of squares is given by the sum over i = 1 to n of (y_i - y_bar)^2, where y_i is the ith experimental observation recorded by the experimenter and y_bar is the average of all n observations. This relationship can be expanded and simplified; the derivation is fairly straightforward, and you get SS_T = (sum of y_i^2) - (sum of y_i)^2 / n. The first term is the sum of the squares of all the responses, and in the second term the sum of the observations is squared.
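Before moving further into the matrix forms, here is a minimal sketch of the overall F test just described. It assumes we already have the observed responses y and the fitted values y_hat from a multiple linear regression with k regressors; the function and variable names are my own, for illustration only.

```python
import numpy as np
from scipy import stats

def overall_f_test(y, y_hat, k, alpha=0.05):
    """Test H0: beta_1 = ... = beta_k = 0 using the ANOVA decomposition."""
    n = len(y)
    p = k + 1                               # parameters including the intercept
    sst = np.sum((y - y.mean()) ** 2)       # total sum of squares, n - 1 df
    sse = np.sum((y - y_hat) ** 2)          # residual sum of squares, n - p df
    ssr = sst - sse                         # regression sum of squares, k df
    msr = ssr / k                           # mean square regression
    mse = sse / (n - p)                     # mean square residual
    f0 = msr / mse                          # follows F(k, n - p) under H0
    f_crit = stats.f.ppf(1 - alpha, k, n - p)
    return f0, f_crit, f0 > f_crit          # reject H0 when f0 > f_crit
```

The last returned value is exactly the rejection rule stated above: reject H0 when F0 exceeds f_{alpha, k, n-p}.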
Returning to the two terms in that last expression, please do not confuse them. In the first term each individual observation is squared, similarly for all the other observations, and then the sum is taken; in the second term the sum is taken first and then it is squared. The first term can be represented as y'y: the sum of y_i^2 from i = 1 to n is nothing but y'y. You take the column vector of responses, take its transpose, and multiply it with the response column vector itself; when you do that you get the sum of the squares of all the observations. The second term is the square of the sum of the observations divided by n, where n is the total number of responses. So that is the total sum of squares, and as you can see, we are gradually moving to the representation of the various sums of squares in matrix notation. The matrix method is quite convenient and helps us do calculations that are otherwise tedious in a very efficient manner.

The residual sum of squares is SS_E = y'y - beta_hat' X' y. It may be written as SS_E = [y'y - (sum of y_i)^2 / n] - [beta_hat' X' y - (sum of y_i)^2 / n]. What I am doing here is subtracting and adding the term (sum of y_i)^2 / n; the first bracket is, by definition, the total sum of squares, and the second bracket is the regression sum of squares. We started off by saying SS_T = SS_R + SS_E; we have the expression for the residual sum of squares and the expression for the total sum of squares, so when we subtract the residual sum of squares from the total sum of squares we get the regression sum of squares. The term shown in blue, SS_R = beta_hat' X' y - (sum of y_i)^2 / n, represents the regression sum of squares. So once you have the estimated linear regression parameters, the X matrix, and the y column vector, you can get the regression sum of squares from beta_hat' X' y - (sum of y_i)^2 / n. Here beta_hat is the vector of estimated regression parameters including the intercept beta_hat_0, X is the model matrix (we have already seen how to set up the X matrix in an earlier lecture), X' is its transpose, and y is the vector of observations. So we have the residual sum of squares as the total sum of squares minus the regression sum of squares, with the residual sum of squares given by the first relation and the regression sum of squares by the second.

Now, coming back to hypothesis tests on individual regression coefficients: we want to see whether a particular regression coefficient beta_j takes a particular value beta_j0 or not. So now we are concentrating on the individual regression coefficients and whether each takes a hypothesized value. You can put 0 for beta_j0 and test whether that regression coefficient is insignificant, that is, whether it has no effect on the model or, in fact, on the response. The alternate hypothesis is that the coefficient is not equal to the hypothesized value; it may be less than it or greater than it.
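Before taking up the individual tests in detail, here is a small matrix-notation sketch of the sums of squares we have just resolved. It assumes X is the n by p model matrix with a first column of ones for the intercept and y is the response vector; the function name is mine, for illustration only.

```python
import numpy as np

def regression_sums_of_squares(X, y):
    """Return beta_hat, SS_T, SS_R, SS_E using the matrix expressions."""
    n = len(y)
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # least-squares estimates
    yy = y @ y                                     # y'y, uncorrected total SS
    correction = y.sum() ** 2 / n                  # (sum y_i)^2 / n = n * ybar^2
    sst = yy - correction                          # corrected total SS, n - 1 df
    sse = yy - beta_hat @ X.T @ y                  # residual SS, n - p df
    ssr = beta_hat @ X.T @ y - correction          # regression SS (intercept excluded), k df
    return beta_hat, sst, ssr, sse
```

You can verify numerically that sst equals ssr + sse, which is the decomposition we started from.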
Returning to the individual coefficients: to be more general, instead of fixing the hypothesized value to be 0 all the time, instead of fixing beta_j0 to be 0, we can fix it at some other value, 100 for example. So it need not always be 0; you can also hypothesize about a particular value taken by a regression parameter. Instead of looking at the whole collection of regression parameters, we are now concentrating on a single one. It may be a good idea not to proceed with the lecture just yet: pause for a bit and think for yourself how you would carry out the test for this particular case. We have already come across this earlier, and I would like you to think about it and write down in the notebook you must be carrying how you would proceed.

I hope you have at least made an attempt; let us see how to do it. For the test on an individual regression coefficient, H0 is beta_j = beta_j0 and H1 is beta_j not equal to beta_j0, and as you can see here, we carry out a t test. You should recollect the t test now; if you are unable to remember, I request you to go back and refresh your memory. The statistic is t0 = (beta_hat_j - beta_j0) / sqrt(sigma^2 C_jj), that is, (beta_hat_j - beta_j0) divided by the standard error of beta_hat_j. Now, the t test is associated with a certain number of degrees of freedom; which degrees of freedom should we use here? A very useful result: use the same degrees of freedom that you used for the residual sum of squares, n - p. By now you should also be familiar with what is meant by the standard error of beta_hat_j: it is sqrt(sigma^2 C_jj), where C_jj is the jth diagonal element of (X'X)^{-1} and sigma^2 is the error variance, so that sigma^2 (X'X)^{-1} is the variance-covariance matrix of beta_hat. Unfortunately, we do not know the true value of the error variance, so in practice we replace sigma^2 by its estimate, the mean square residual, when forming the standard error.

Now let us look at the regression sum of squares due to the intercept beta_hat_0. This is a very interesting point, and in some places it is skipped, which can lead to some loss of clarity in understanding linear regression. In some ANOVA tables you will find the sum of squares corrected for beta_hat_0, and in others you will find the uncorrected total sum of squares. So what is the correction all about? It depends on whether we account for the intercept or not. We know the corrected total sum of squares is y'y - (sum of y_i)^2 / n. The actual, uncorrected total sum of squares based on the responses is y'y: you simply square each response and total them up. So you are deducting a portion from the uncorrected total sum of squares, namely (sum of y_i)^2 / n. This is the correction applied to the total sum of squares, and the corresponding regression sum of squares does not include the contribution of the intercept beta_hat_0; it has contributions only from beta_hat_1, beta_hat_2, ..., beta_hat_k. That is why, since there are k independent regression parameters from 1 to k, it has k degrees of freedom.
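Before continuing with the correction for the intercept, here is a minimal sketch of the individual-coefficient t test just described. X and y are as in the earlier sketch, sigma^2 is replaced by the mean square residual as noted above, and the function and variable names are illustrative only.

```python
import numpy as np
from scipy import stats

def coefficient_t_test(X, y, j, beta_j0=0.0, alpha=0.05):
    """Two-sided t test of H0: beta_j = beta_j0 with n - p degrees of freedom."""
    n, p = X.shape
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    sse = y @ y - beta_hat @ X.T @ y               # residual sum of squares
    mse = sse / (n - p)                            # estimate of sigma^2
    C = np.linalg.inv(X.T @ X)                     # (X'X)^-1
    se = np.sqrt(mse * C[j, j])                    # standard error of beta_hat_j
    t0 = (beta_hat[j] - beta_j0) / se
    t_crit = stats.t.ppf(1 - alpha / 2, n - p)     # two-sided critical value
    return t0, t_crit, abs(t0) > t_crit            # reject H0 when |t0| > t_crit
```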
Continuing with the correction: the contribution to the sum of squares due to beta_hat_0 is (sum of y_i)^2 / n. We are removing this contribution from the total sum of squares y'y by subtracting n y_bar^2, because (sum of y_i)^2 / n is exactly n y_bar^2. We call this the contribution of the intercept parameter beta_hat_0. Why should it be n y_bar^2? There is a very simple explanation for this, and it is quite nice actually. If you consider a regression model with no parameter other than beta_hat_0, then the estimated beta_hat_0 is simply the average of the experimental data points. What does that mean? Suppose we are too lazy to fit a regression model that considers the variables, and we say that the predicted y equals beta_hat_0 only. Then, if we carry out the regression analysis, we will find that the estimated beta_0 parameter is just y_bar, the average of all the responses. So when you have scattered data, say with only one regressor variable x1, and you plot y as a function of x1, you will see scattered points, and fitting only this simple model gives y_hat = y_bar, a single horizontal line passing through the data points.

Let me illustrate this on the board. Here we have the experimental data, plotting y as a function of x1. Obviously there is an effect of x1 on the response; that is why the data increase as x1 increases. But if we take the regression model y_hat = beta_0, our very simple regression model, then all we are doing is fitting a straight line which is nothing but the average of all the responses: a horizontal line, parallel to the x axis, at the value y_bar. Since beta_hat_0 equals y_bar, the sum of squares contribution from beta_hat_0 is y_bar^2 + y_bar^2 + ... over the n experimental data points, which gives n y_bar^2. So this is the contribution to the sum of squares from the parameter beta_hat_0, and if you want to correct the total sum of squares and the regression sum of squares for the contribution of beta_hat_0, you subtract n y_bar^2 from them, and that is what we are doing.

Now let us go back to the resolution of the residual sum of squares, and even before that, the resolution of the total sum of squares: you can see that in the total sum of squares we have subtracted the contribution of the parameter beta_hat_0, writing y'y - n y_bar^2, where n y_bar^2 represents the contribution from the intercept beta_hat_0. And when you subtract n y_bar^2 from the total sum of squares, you should also subtract n y_bar^2 on the other side of the equality, so that the balance is maintained.
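As a quick numerical check of this claim, with made-up data for illustration: fitting the intercept-only model by least squares indeed returns y_bar, and its sum of squares contribution beta_hat' X' y works out to n y_bar^2.

```python
import numpy as np

y = np.array([3.1, 4.0, 5.2, 6.1, 7.3])           # made-up responses
X0 = np.ones((len(y), 1))                          # model y_hat = beta_0 only
beta0_hat = np.linalg.solve(X0.T @ X0, X0.T @ y)[0]
print(beta0_hat, y.mean())                         # both equal y_bar
print(beta0_hat * y.sum(), len(y) * y.mean() ** 2) # beta_hat' X' y equals n * y_bar^2
```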
So on the slide, y'y is the actual (uncorrected) total sum of squares and beta_hat' X' y is the regression sum of squares including all the regression coefficients; we subtract n y_bar^2 from the one and also n y_bar^2 from the other. The result is the total sum of squares corrected for beta_hat_0 and the regression sum of squares excluding the parameter beta_hat_0. I hope you have now understood why we subtract n y_bar^2 from the total sum of squares and from the regression sum of squares. We looked at the contribution from beta_hat_0, and the regression sum of squares, once n y_bar^2 is subtracted, no longer includes that contribution; the number of degrees of freedom is therefore reduced by one, because we are removing beta_hat_0 from the list of p parameters, so p - 1 equals k.

Now let us look at the extra sum of squares method. This is a very interesting issue. What we saw earlier was testing individual regression coefficients, and we could keep doing that for all the regression coefficients: start with beta_hat_1, then beta_hat_2, and so on up to beta_hat_k. That is a somewhat tedious process. Sometimes you may also have an existing model, and when you report it to your supervisor, they may say: you have considered only a model with 2 variables, why don't you consider or build a model with 5 variables? What I am trying to say is that we can use matrix and linear algebra concepts to do this quite efficiently, rather than testing one regressor variable or one regression coefficient at a time, which is tedious. We can first analyze a model with a certain group of variables — that would be the existing model — and then examine the impact of adding another group of variables to it, and decide whether adding the additional group has any impact or adds any value to the regression model. Normally, the simpler the model, the fewer variables it has, the more elegant, convenient, and efficient it is. But suppose you have done a lot of work and described a complicated process through its dependence on only a few selected variables, and you present this model to, say, the management; the people there may be a bit disappointed: we thought this was such a complicated process, why do you have only a few variables describing the response? It looks like other regressor variables might also influence the experiment, so why don't you go back and check your model? So what we can do, instead of adding one regressor variable and considering the effect of one regression coefficient at a time, is take a whole group of regressor variables with their associated regression coefficients and use a method called the extra sum of squares approach to see their impact on the process response.
What we are going to do is conduct a hypothesis test to see whether the newly added group of regression coefficients is indeed valuable. If the test says that none of the newly added regression coefficients is significant — all of them may as well be taken to be 0 — then you can go to the management and say: look, my original model was in fact adequate; considering the additional variables had a negligible impact.

What we do here may look a bit difficult for people who are not familiar with linear algebra, but it is actually very simple. Look at the beta column vector as being composed of two sub-vectors, beta_1 and beta_2: beta_2 is a column vector and beta_1 is another column vector, and when you stack them one below the other you get the complete column vector beta. Here beta_2 corresponds to the pre-existing model, the model that already exists, and beta_1 is the set of regression coefficients you want to add to it. Let us say that beta_1 comprises r regression coefficients and beta_2 comprises the remaining p - r regression coefficients. The hypotheses are H0: beta_1 = 0 and H1: beta_1 not equal to 0. There is a small difference from what we did earlier: earlier we were looking at scalars, single values, but now I am writing beta_1 in bold, meaning it is a vector of r regression coefficients. We are saying that the entire set of entries in the beta_1 column vector is 0, and the alternate hypothesis says that beta_1 is not 0, that is, at least one of its entries is non-zero. So the new terms are represented by beta_1 and the already existing model by beta_2: the regression coefficient vector beta is split into what was already present in the model equation, beta_2, and what is currently being added to it, beta_1. We want to see the impact of adding the new terms in the beta_1 vector to an already existing model.

To do this we first look at the full model. We know the regression sum of squares including the parameter beta_hat_0 as beta_hat' X' y. Let me revise this point: when you include the intercept as well, SS_R = beta_hat' X' y; if you want to exclude the intercept beta_hat_0, you have to subtract n y_bar^2, but we are not doing that here. We are considering all the parameters including the intercept beta_hat_0, which is why the regression sum of squares is beta_hat' X' y. The mean square residual is then y'y - beta_hat' X' y scaled by the n - p degrees of freedom. Another thing to notice: whether or not you account for beta_hat_0, the mean square residual does not change, because the n y_bar^2 you subtract from y'y is also subtracted from beta_hat' X' y, so it cancels out. Whether you use the n y_bar^2 correction or not does not matter to the mean square residual, because you subtract it consistently from both terms and it cancels.
In other words, the mean square residual does not really care whether you consider the model with the intercept or without the intercept. Let me also make a small correction here, to be consistent with what I wrote earlier: I will read this mean square error as the mean square residual. Since mean square residual and mean square regression both begin with the letter R, we keep the abbreviation MS_E, but when we use the full form we say mean square residual. So I would like to conclude this point by saying that the mean square residual does not depend on whether you have used the actual (uncorrected) sums of squares or the corrected ones, be it the total sum of squares or the regression sum of squares. If you use the corrected forms, the n y_bar^2 cancels out consistently on both sides; if you do not, well and good, no problem — you are simply keeping the parameter beta_hat_0, and the mean square residual value is unchanged either way.

Here, the regression sum of squares shown is the one corresponding to the full model, including all the partial regression coefficients beta_hat_0, beta_hat_1, ..., beta_hat_k. Since beta_hat_0 is included, the term n y_bar^2 is not subtracted. The full model — the one we have been considering until now — is now split into a model that already exists, with a subset of the coefficients, and a new part with the additional regression coefficients. So look at the full model: this is the vector of responses y, this is the X matrix, this is the beta column vector with the full set of regression coefficients, and this is the error column vector. For whatever reason, you might not have considered beta_0 and beta_1; you might have started your model with beta_2, beta_3, ..., beta_k only — that is your existing model. But then your boss says: what happened to the intercept, what happened to variable 1? They also look important to me from an intuitive point of view, so why don't you include them? A new model would then add beta_0 and beta_1.

So what we do is split the full model as y = X_1 beta_1 + X_2 beta_2 + epsilon. This is not simple algebraic addition; it involves matrices. y is the overall response vector; beta_1 is the new vector comprising the new regression coefficients and beta_2 is the column vector comprising the old regression coefficients; X_1 is a sub-matrix of X containing the regressor columns corresponding to beta_1, and X_2 contains the columns of X associated with beta_2. Going back to the example: the new terms, based on the boss's recommendation, were beta_0 and beta_1, so the X_1 matrix is the sub-matrix obtained by taking the first two columns of X. The intercept is associated with the column of ones, and beta_1 is associated with the column x_11, x_21, ..., x_n1. For example, the first observation contributes beta_0 + beta_1 x_11 and the second contributes beta_0 + beta_1 x_21, so you are considering the effect of the intercept and also the effect of the first regressor variable x_1. So this is how we do it.
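Here is a minimal sketch of how this extra sum of squares test can be carried out. It anticipates the expression derived just below — the extra sum of squares SS_R(beta_1 | beta_2) is the full-model regression sum of squares minus the reduced-model one — and uses the standard partial F statistic, which scales that difference by its r degrees of freedom and compares it against the full-model mean square residual. X, y, new_cols, and the function name are my own illustrative choices.

```python
import numpy as np
from scipy import stats

def extra_sum_of_squares_test(X, y, new_cols, alpha=0.05):
    """Partial F test of H0: beta_1 = 0, where beta_1 holds the coefficients
    of the columns listed in new_cols and beta_2 holds the rest."""
    n, p = X.shape
    r = len(new_cols)
    old_cols = [c for c in range(p) if c not in new_cols]
    X2 = X[:, old_cols]                               # columns of the existing model

    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)      # full-model estimates
    beta2_hat = np.linalg.solve(X2.T @ X2, X2.T @ y)  # reduced-model estimates

    ssr_full = beta_hat @ X.T @ y                     # SSR(beta), p df (uncorrected)
    ssr_reduced = beta2_hat @ X2.T @ y                # SSR(beta_2), p - r df
    ssr_extra = ssr_full - ssr_reduced                # SSR(beta_1 | beta_2), r df

    mse = (y @ y - ssr_full) / (n - p)                # mean square residual, full model
    f0 = (ssr_extra / r) / mse                        # follows F(r, n - p) under H0
    f_crit = stats.f.ppf(1 - alpha, r, n - p)
    return f0, f_crit, f0 > f_crit                    # reject H0 when f0 > f_crit
```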
The old model already had the regressor variables x_2, x_3, ..., x_k: x_2 is associated with beta_2, x_3 with beta_3, and x_k with beta_k. So we have to set up a hypothesis test to check whether beta_1 is really significant. If the null hypothesis is true, the reduced model becomes y = X_2 beta_2 + epsilon: the null hypothesis means there is no value added by the elements of beta_1, so you are content with the model y = X_2 beta_2 + epsilon. The coefficients of the reduced model can be found in the usual way, beta_hat_2 = (X_2' X_2)^{-1} X_2' y.

So we have the extra sum of squares method. This is the full model, and this is the model split into the contributions from beta_2 and then from beta_1. The regression sum of squares due to beta_hat_2 alone is pretty straightforward: it is beta_hat_2' X_2' y. The regression sum of squares due to beta_hat_1, given that beta_hat_2 is already present in the model, is then SS_R(beta_1 | beta_2) = beta_hat' X' y - beta_hat_2' X_2' y. In order to find the regression contribution of the new terms, we take the regression sum of squares from the full model and subtract from it the regression sum of squares from the already existing model; the difference gives the contribution of the new terms to the regression sum of squares. The degrees of freedom for the original, full-model regression sum of squares are p, since it includes all the parameters, while the degrees of freedom for SS_R(beta_1 | beta_2) are r because, if you recollect, we split the beta column vector into two column vectors: the first of size r and the second of size p - r. So the full model has p degrees of freedom and the new part has r degrees of freedom. SS_R(beta_1 | beta_2) is also termed the extra sum of squares due to beta_1: it is the extra regression sum of squares brought in by the new set of regression coefficients, that is, the increase in the regression sum of squares due to including the variables x_1, x_2, ..., x_r in the model, and it is also independent of the mean square error.

So this concludes our lecture on hypothesis testing in linear regression. It is quite elegant, and you can see that what we did in the earlier phase of the design of experiments, namely hypothesis testing, is playing a very valuable role here as well. Thanks for your attention.