the selection of the best regression model. In the previous lecture we learned how every possible model involving at most k − 1 regressors is evaluated according to some criterion and the best model is selected. Today we will discuss sequential selection. Here, instead of evaluating all possible subset regression models, the best model is found by adding or removing one regressor at each step. There are three basic algorithms of this type: backward elimination, forward selection, and stepwise selection.

Before I start talking about sequential selection, I want to recall the partial test which we discussed in the multiple linear regression setup. We are considering the multiple linear regression model with k − 1 regressors, y = beta0 + beta1 x1 + beta2 x2 + ... + beta(k−1) x(k−1) + epsilon. By testing the hypothesis H0: beta_i = 0 against the alternative hypothesis H1: beta_i ≠ 0, we test the significance of a given regressor x_i in the presence of the other regressors in the model. We know how to test this hypothesis: we can use the t statistic, or we can use the extra sum of squares technique. Let me recall both. First, we can test it using the statistic

t = beta_i hat / sqrt(MS_Res · [(X'X)^(−1)]_ii),

where [(X'X)^(−1)]_ii is the i-th diagonal element of (X'X)^(−1). This statistic follows a t distribution with n − k degrees of freedom, and we reject H0 if |t| > t(alpha/2; n − k). That is the test using the t statistic. The same hypothesis can also be tested using an F statistic, that is, the extra sum of squares approach. We will call it the partial F test, and we will be using this partial F test repeatedly today. The test statistic for H0: beta_i = 0 against H1: beta_i ≠ 0 is

F = [SS_R(full model) − SS_R(full model except x_i)] / MS_Res(full model).

Here the full model has all the regressors x1 to x(k−1), and the reduced model has all the regressors except x_i, so the difference in the numerator is the extra regression sum of squares due to the regressor x_i. It has 1 degree of freedom, so we divide it by 1 and then by the MS residual of the full model. This statistic follows an F distribution with 1 and n − k degrees of freedom, and we reject H0 if the F value is large, that is, if F > F(alpha; 1, n − k). Rejecting H0 means accepting H1, which means that x_i has a significant contribution to explaining the variability in y in the presence of the other regressors in the model.
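To make the mechanics concrete, here is a minimal sketch of the partial F test via the extra sum of squares approach, assuming numpy and scipy are available; the helper names ss_regression and partial_F are my own, not from the lecture.

```python
import numpy as np
from scipy import stats

def ss_regression(X, y):
    """SS_Regression for a least squares fit of y on X
    (X must already contain the intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((X @ beta - y.mean()) ** 2)

def partial_F(X_full, X_reduced, y, alpha=0.05):
    """Partial F statistic for the single regressor dropped from
    X_full to get X_reduced; MS_Res comes from the full model."""
    n, k = X_full.shape                        # k = number of parameters
    beta, *_ = np.linalg.lstsq(X_full, y, rcond=None)
    ms_res = np.sum((y - X_full @ beta) ** 2) / (n - k)
    extra_ss = ss_regression(X_full, y) - ss_regression(X_reduced, y)
    F = extra_ss / ms_res                      # 1 and n - k degrees of freedom
    reject = F > stats.f.ppf(1 - alpha, 1, n - k)   # reject H0 if F is large
    return F, reject
```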
So, if this difference is large, the i-th regressor has a significant contribution: a large difference gives a large F, and then we reject the null hypothesis and accept the alternative hypothesis. All of this we discussed before as well.

Let me now talk about the first technique, backward elimination. The basic idea is this: since we are looking for the best subset regression model, we start from the full model, and in each step we remove the regressor that is least significant. So if it is a problem with 4 regressor variables, we start with the model having all 4 regressors, and in the first step we remove the regressor that is least significant for explaining the variability in the response variable. That is the basic idea behind backward elimination; there is a stopping criterion of course, and we will come to that. Let me just write down the algorithm first. Start with the full model, and compute the partial F statistic for each regressor in the presence of the other regressors. If the F value is very small for some particular regressor, that regressor is not significant, so we look for the smallest partial F value. The regressor with the smallest partial F value is removed from the model if that value is less than F_out, where F_out is either pre-specified or taken as the tabulated value F(0.05; 1, error degrees of freedom). Then the partial F statistics are computed for the new model, by which I mean the model obtained by removing that first regressor, and the process repeats. And here is the stopping criterion: the backward elimination algorithm terminates when the smallest partial F value is greater than F_out.

Let me explain a little and then illustrate this algorithm using one example. We start with the full model and compute the partial F statistics. The partial F statistic associated with the regressor x_i gives you the significance of x_i in the presence of the other regressors; a lower F value indicates that the associated regressor is less significant. So we look at the smallest partial F value, and if it is less than F_out, we remove that regressor from the model, and we repeat the same process until we reach a stage where the smallest partial F statistic is greater than F_out. That means even the least significant remaining regressor is significant, so we cannot remove any more regressors from the model. Well, let me illustrate backward elimination using one example. I will consider the same data as before, the Hald cement data.
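Before the example, note that the whole procedure fits in a short loop. This is a sketch under the assumption that X is a numpy design matrix whose column 0 is the intercept; the function name backward_elimination is chosen here for illustration.

```python
import numpy as np
from scipy import stats

def backward_elimination(X, y, alpha=0.05):
    """Remove one regressor per step while the smallest partial F < F_out."""
    n = len(y)
    active = list(range(1, X.shape[1]))      # regressor columns; 0 = intercept
    while active:
        cols = [0] + active
        beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
        ms_res = np.sum((y - X[:, cols] @ beta) ** 2) / (n - len(cols))
        ssr_full = np.sum((X[:, cols] @ beta - y.mean()) ** 2)
        partial = {}                          # partial F for each active regressor
        for j in active:
            red = [0] + [c for c in active if c != j]
            b, *_ = np.linalg.lstsq(X[:, red], y, rcond=None)
            partial[j] = (ssr_full - np.sum((X[:, red] @ b - y.mean()) ** 2)) / ms_res
        j_min = min(partial, key=partial.get)  # least significant regressor
        if partial[j_min] > stats.f.ppf(1 - alpha, 1, n - len(cols)):
            break                              # smallest partial F > F_out: stop
        active.remove(j_min)                   # otherwise drop it and repeat
    return active
```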
So we have four regressors and one response variable, and we will see which subset model is best according to the backward elimination technique. We are considering the Hald cement data. We start with the full model: we first fit y as a function of all four regressor variables x1, x2, x3, x4, that is, we fit the full multiple linear regression model, and the fitted model is y hat = 62.41 + 1.551 x1 + 0.510 x2 + 0.102 x3 − 0.144 x4. Now what we want is to look for the least significant regressor in this model. For that we compute the partial F statistic associated with x1, the partial F statistic associated with x2, and similarly for x3 and x4. The notation F(1|234) denotes the partial F statistic associated with the regressor x1 in the presence of x2, x3, x4; similarly we compute F(2|134) for x2 in the presence of x1, x3, x4, and also F(3|124) and F(4|123). Now let me explain how to compute this partial F statistic value. I will use the notation 1, 2, 3, 4 for the full model, because there are only four regressors in the problem, and by 1, 2, 3, 4 I mean that all four regressors x1, x2, x3, x4 are in the model. Then

F(1|234) = [SS_R(x1, x2, x3, x4) − SS_R(x2, x3, x4)] / MS_Res(x1, x2, x3, x4),

where the numerator is the extra regression sum of squares due to x1 and the denominator is the MS residual when all four regressors are present in the model. So we need the ANOVA table for the model involving all four regressor variables. From the ANOVA table of the full model, SS_R(x1, x2, x3, x4) = 2667.90. Next I need to fit the model involving x2, x3, x4; from its ANOVA table, SS_R(x2, x3, x4) = 2641.95, so I will use this value. This difference is the extra sum of squares due to x1. Now, for the MS residual of the full model, recall that its SS residual is 47.86, and look at the degrees of freedom: we have 13 observations, so the total degrees of freedom is 12, and the residual has 8 degrees of freedom because there are 5 unknowns, which give 5 normal equations, that is, 5 constraints on the residuals, so 13 − 5 = 8. Hence the regression has 4 degrees of freedom.
Now, what I want to say is that I will be using this MS residual for the full model: MS_Res(x1, x2, x3, x4) = 47.86/8 = 5.98. So F(1|234) = (2667.90 − 2641.95)/5.98 = 4.34. We need to compute the other values also; maybe I will explain this one too. F(2|134) is the partial F statistic for x2 in the presence of x1, x3, x4 in the model:

F(2|134) = [SS_R(x1, x2, x3, x4) − SS_R(x1, x3, x4)] / MS_Res(x1, x2, x3, x4).

Let me check whether I have the ANOVA table for this one. Yes, I have: we know SS_R(x1, x2, x3, x4) = 2667.90, and fitting the model involving x1, x3, x4 and reading its ANOVA table gives SS_R(x1, x3, x4) = 2664.93. So F(2|134) = (2667.90 − 2664.93)/5.98, which you can check is close to 0.50. Similarly, without going into the detail of the calculation, you can check that F(3|124) = 0.02 and F(4|123) = 0.04. So we have all four partial F values; the first one was F(1|234) = 4.34. Now we have to see which regressor is least significant. We look for the smallest partial F value, because we want to remove the least significant regressor from the model, and the smallest is F(3|124) = 0.02, whose associated regressor is x3. We can remove the regressor x3 from the model provided this observed partial F value is less than the tabulated value. The tabulated value, our F_out here, is F(0.05; 1, 8): we took alpha = 0.05, and the partial F statistic follows an F distribution with 1 degree of freedom and the error degrees of freedom of the full model, which is 8. We find F(0.05; 1, 8) = 5.32. Now the smallest partial F value, F(3|124), is less than this tabulated value, so x3 is insignificant and we remove x3 from the model. Once x3 is removed we are left with x1, x2 and x4, so we fit the least squares model between the response variable and x1, x2, x4. You know how to fit a multiple linear regression between y and x1, x2, x4, and the fitted model is y hat = 71.65 + 1.452 x1 + 0.416 x2 − 0.237 x4. So this is the model after removing x3 from the full model. Now, again, we try to see whether there is a regressor which is less significant in the presence of the other regressors. So we compute three partial F values: F(1|24), to check whether x1 is significant in the presence of x2 and x4; F(2|14); and F(4|12).
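Before moving on, here is a sketch that reproduces the step-1 numbers just computed. The 13 observations hard-coded below are the standard published values of the Hald cement data, and ssr and step are illustrative helper names of mine.

```python
import numpy as np
from scipy import stats

# Columns: x1, x2, x3, x4 (standard Hald cement data, 13 mixes).
X = np.array([[ 7, 26,  6, 60], [ 1, 29, 15, 52], [11, 56,  8, 20],
              [11, 31,  8, 47], [ 7, 52,  6, 33], [11, 55,  9, 22],
              [ 3, 71, 17,  6], [ 1, 31, 22, 44], [ 2, 54, 18, 22],
              [21, 47,  4, 26], [ 1, 40, 23, 34], [11, 66,  9, 12],
              [10, 68,  8, 12]], dtype=float)
y = np.array([78.5, 74.3, 104.3, 87.6, 95.9, 109.2, 102.7,
              72.5, 93.1, 115.9, 83.8, 113.3, 109.4])
n = len(y)

def ssr(cols):
    """SS_Regression for the model using the given regressor columns."""
    Xd = np.column_stack([np.ones(n), X[:, cols]])
    b, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return np.sum((Xd @ b - y.mean()) ** 2)

def step(active):
    """Print each partial F and the F_out cutoff for the current model."""
    df_res = n - len(active) - 1       # one parameter per regressor + intercept
    ms_res = (np.sum((y - y.mean()) ** 2) - ssr(active)) / df_res
    for j in active:
        F = (ssr(active) - ssr([c for c in active if c != j])) / ms_res
        print(f"F_x{j + 1} = {F:.2f}")
    print(f"F_out = {stats.f.ppf(0.95, 1, df_res):.2f}")

step([0, 1, 2, 3])   # ~4.34, 0.50, 0.02, 0.04; F_out ~5.32 -> drop x3
```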
And again we look for the smallest partial F value, and if it is again less than F_out we remove that regressor from the model; we will keep on doing that. So what is this value? I would like to explain this one in detail, because there is a little difference here. We compute

F(1|24) = [SS_R(x1, x2, x4) − SS_R(x2, x4)] / MS_Res(x1, x2, x4).

The numerator is the extra regression sum of squares due to x1, and the denominator is the MS residual for the model involving x1, x2, x4. See, this MS residual is not anymore the one for the original full model: at this moment our full model for this purpose is the one involving the three regressors x1, x2, x4. Yes, I have the ANOVA table for that model. You can check that the total degrees of freedom is 12, the same as before, but the residual now has 9 degrees of freedom, because we have 4 unknown parameters and 13 − 4 = 9. So SS_R(x1, x2, x4) = 2667.79 and MS_Res(x1, x2, x4) = 5.33; we are going to use this MS residual now, not the MS residual 5.98 of the model involving all four regressors. Now you fit a model between y and x2, x4, and you can check that SS_R(x2, x4) = 1846.88, so F(1|24) = (2667.79 − 1846.88)/5.33 = 154.01. Similarly, you can check that F(2|14) = 5.03 and F(4|12) = 1.86. So the smallest partial F value is F(4|12) = 1.86, which is associated with x4; that means if this F is less than the tabulated value, we are going to remove x4 from the model. The tabulated value is F(0.05; 1, 9): the error degrees of freedom is now 9, not 8, because this three-regressor model is our full model at this moment and its residual degrees of freedom is 9. This F_out equals 5.12. The partial F value associated with x4 is less than it, so we remove x4 from the model. So, in summary: initially we started with x1, x2, x3, x4; in the first step we removed x3; in the second step we removed x4; now we are left with x1 and x2. So we now fit the least squares model involving x1 and x2 only, and the fitted model is y hat = 52.58 + 1.468 x1 + 0.662 x2.
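Continuing the sketch above (it reuses X, y and step() from the previous block), step 2 is the model with columns 0, 1, 3, that is, x1, x2, x4:

```python
# Reuses X, y and step() defined in the previous sketch.
step([0, 1, 3])   # ~154.01, 5.03, 1.86; F_out ~5.12 -> drop x4
```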
So you know how to find the unknown parameters. Now we have a model involving x1 and x2, and we need to check whether we can remove any more regressors from this model. That means we have to check the significance of x1 in the presence of x2, and also the significance of x2 in the presence of x1, so we need to compute two partial F statistics, F(1|2) and F(2|1). The first one is

F(1|2) = [SS_R(x1, x2) − SS_R(x2)] / MS_Res(x1, x2);

the numerator, the extra sum of squares due to x1, has 1 degree of freedom, so we divide it by 1 and then by the MS residual for the model involving x1 and x2. I do not think I have the ANOVA table for this one, but once you have the fitted model, which we fitted just now, you can compute the SS residual, the SS total is the same, and then you get the SS regression: you can check that SS_R(x1, x2) = 2657.9. Fitting a model between y and x2 alone gives y hat = 57.4 + 0.789 x2, and you can check that SS_R(x2) = 1809.4 and that the MS residual for the model involving x1 and x2 is 5.79. So the partial F value is F(1|2) = 146.52, and similarly you can check that the partial F statistic associated with x2 in the presence of x1 is F(2|1) = 208.58. Now these partial F values are large, but still we check the smallest one, which is F(1|2) = 146.52, against the tabulated value F(0.05; 1, 10): if there are two regressors in the model, then the number of unknown parameters is 3, so the residual degrees of freedom is 13 − 3 = 10, and the tabulated value is 4.96. Now 146.52 is not smaller than 4.96. That means in the model we have at this moment, both regressors are significant: x1 is significant in the presence of x2, and x2 is significant in the presence of x1; that is why their partial F statistic values are large, and we cannot remove either of them from the model. So the backward elimination algorithm terminates here and yields the final equation y hat = 52.58 + 1.468 x1 + 0.662 x2; that means the model involving x1 and x2 is the best subset regression model. So the output of the backward elimination algorithm for the Hald cement data is that the subset model involving x1 and x2 is the best.
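Continuing the same sketch, step 3 confirms that nothing more can be removed, and the final coefficients match the fitted equation above:

```python
# Reuses X, y, n and step() from the Hald-data sketch above.
step([0, 1])      # ~146.52, 208.58; F_out ~4.96 -> both stay, terminate
Xd = np.column_stack([np.ones(n), X[:, [0, 1]]])
b, *_ = np.linalg.lstsq(Xd, y, rcond=None)
print(b)          # ~[52.58, 1.468, 0.662]
```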
Next we will be talking about forward selection. The basic motivation is just the opposite of backward elimination: there we started from the full model and in each step tried to eliminate the least significant regressor, whereas in forward selection we start from a model having no regressor variables. So initially there is no regressor in the model, and then in every step we try to find the regressor that is most significant for the response variable, and we add one regressor to the model in each step. Let me just write down the beginning of the algorithm. The first step is the no-regressor model. Then all possible one-regressor models are considered, and an F statistic for each regressor is computed. That means, if you have four regressor variables, we consider the one-regressor model involving x1 only, then the model involving x2 only, then x3 only, then x4 only; we compute the F statistic value for each of them, and the regressor having the highest F statistic value is added to the model. Anyway, I do not have time today, so I will talk about forward selection in the next class. Thank you for your attention.
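As a preview of that first step, here is a hedged sketch, reusing X, y, n and ssr from the Hald-data sketch above: for a one-regressor model, the F statistic is just its SS regression divided by its MS residual, with 1 and n − 2 degrees of freedom.

```python
# Reuses X, y, n and ssr() from the Hald-data sketch above.
ss_total = np.sum((y - y.mean()) ** 2)
for j in range(4):
    ms_res = (ss_total - ssr([j])) / (n - 2)   # one regressor + intercept
    print(f"F for x{j + 1} alone: {ssr([j]) / ms_res:.2f}")
# On this data, x4 gives the largest F, so it would be the first to enter.
```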