This is the fourth lecture on selecting the best regression model, and it is basically a continuation of the previous class. In the last class we talked about backward elimination; today we are going to talk about forward selection and stepwise selection. First I will talk about forward selection. The basic idea behind forward selection is that we start with no regressor in the model and, at every step, add the most relevant remaining regressor. So we keep adding one regressor at a time until a stopping criterion is met. Let us go through the steps. Step 1: start with no regressor in the model. Step 2: all possible models with one regressor are considered, and the F statistic for each regressor is computed. If there are k - 1 candidate regressors, you compute k - 1 F statistics, and the regressor with the highest F statistic is added to the model, provided that this highest F value is greater than F(α; 1, n - k). Here n - k is the residual degrees of freedom. In the last class I called it the error degrees of freedom, but error degrees of freedom and residual degrees of freedom are the same thing. I will illustrate this with an example later on.
Let me write down the rest of the algorithm first. Step 3: the partial F statistic is computed for each of the remaining regressors in the presence of the previously selected regressors, and the one yielding the highest partial F is added to the model, provided that this F is greater than the tabulated value. Rather than F(α; 1, n - k), I should write F(α; 1, residual degrees of freedom) here, and in step 2 as well. Step 4, the stopping criterion: forward selection terminates when the highest partial F statistic at a particular stage does not exceed F_in, or when the last candidate regressor has been added to the model. Here F_in means the tabulated value of F, that is, F(α; 1, residual degrees of freedom). That is the algorithm, and now I want to illustrate it with an example. Let me first give an overview. Initially there is no regressor in the model, and then all possible models with one regressor are considered. Suppose the problem has four regressors x1, x2, x3, x4 and one response variable y. In the first step I compute the F statistic for each of the models y = β0 + β1 x1 + ε, y = β0 + β2 x2 + ε, y = β0 + β3 x3 + ε and y = β0 + β4 x4 + ε. These are the regression models involving a single regressor. Call their F statistics F1, F2, F3 and F4 respectively.
If F1 is the highest, the associated regressor x1 is the most significant of the four in explaining the variability in y. Once x1 is selected, we keep it in the model, y = β0 + β1 x1 + ε, and look for the next best regressor. So we try the models y = β0 + β1 x1 + β2 x2 + ε, y = β0 + β1 x1 + β3 x3 + ε and y = β0 + β1 x1 + β4 x4 + ε, with x1 fixed in each. That is, we check whether x2, x3 or x4 is the best in the presence of x1. This is evaluated by computing partial F values, and the regressor with the highest partial F value is included in the model. We keep doing this until the stopping criterion is met: forward selection terminates when the highest partial F statistic at a particular stage does not exceed the threshold value. For example, if at the second stage the highest partial F value is not greater than the threshold, then x1 alone is enough, we do not include any other regressor, and the final model is y = β0 + β1 x1 + ε. That is the outline of the algorithm; now let me explain it in detail. I will again use the Hald cement data, which has four regressors and one response variable y, to illustrate the forward selection technique.
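The four forward-selection steps can be sketched in Python. This is a minimal sketch, not the lecture's own code. It assumes the standard 13-observation Hald cement data as printed in Draper and Smith (the lecture does not list the raw data), fits each model with a small modified Gram-Schmidt least-squares routine, and hard-codes the tabulated values F(0.05; 1, df) for the residual degrees of freedom that can occur. The names `rss`, `partial_f`, `F_TABLE` and `forward_selection` are illustrative choices.

```python
# Hald cement data (n = 13), as tabulated in Draper and Smith (assumed; not listed in the lecture).
Y = [78.5, 74.3, 104.3, 87.6, 95.9, 109.2, 102.7, 72.5, 93.1, 115.9, 83.8, 113.3, 109.4]
X = {
    "x1": [7, 1, 11, 11, 7, 11, 3, 1, 2, 21, 1, 11, 10],
    "x2": [26, 29, 56, 31, 52, 55, 71, 31, 54, 47, 40, 66, 68],
    "x3": [6, 15, 8, 8, 6, 9, 17, 22, 18, 4, 23, 9, 8],
    "x4": [60, 52, 20, 47, 33, 22, 6, 44, 22, 26, 34, 12, 12],
}
# Tabulated F(0.05; 1, df) for the residual df that can occur with four candidates.
F_TABLE = {8: 5.32, 9: 5.12, 10: 4.96, 11: 4.84}


def rss(names):
    """Residual SS of Y on an intercept plus the regressors in `names` (modified Gram-Schmidt)."""
    basis = []
    for col in [[1.0] * len(Y)] + [X[name] for name in names]:
        u = [float(a) for a in col]
        for q in basis:  # remove the components along earlier orthonormal columns
            d = sum(a * b for a, b in zip(q, u))
            u = [a - d * b for a, b in zip(u, q)]
        norm = sum(a * a for a in u) ** 0.5
        basis.append([a / norm for a in u])
    r = [float(a) for a in Y]
    for q in basis:  # project Y off the column space; what remains are the residuals
        d = sum(a * b for a, b in zip(q, r))
        r = [a - d * b for a, b in zip(r, q)]
    return sum(a * a for a in r)


def partial_f(var, given):
    """Partial F for `var` in the presence of `given`, plus the residual df of the full model."""
    full = list(given) + [var]
    df_res = len(Y) - len(full) - 1  # n minus the number of unknown betas
    rss_full = rss(full)
    return (rss(given) - rss_full) / (rss_full / df_res), df_res


def forward_selection():
    """Add one regressor per step; stop when the best partial F does not exceed F_in."""
    model = []
    while len(model) < len(X):  # also stops once the last candidate has entered
        scores = {v: partial_f(v, model) for v in X if v not in model}
        best = max(scores, key=lambda v: scores[v][0])
        f_best, df_res = scores[best]
        if f_best <= F_TABLE[df_res]:  # step 4: best partial F does not exceed F_in
            break
        model.append(best)
    return model


print(forward_selection())  # ['x4', 'x1']
```

On this data the sketch should reproduce the lecture's result: x4 enters first, then x1, and the procedure stops because the best remaining partial F does not exceed F(0.05; 1, 9).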
Initially there is no regressor in the model. For this data there are four regressors x1, x2, x3 and x4. First I compute the F value for the model y = β0 + β1 x1 + ε. I will call this value F(x1 | -). It is the ordinary F statistic of the model, but in the language of partial F values it is the partial F value for x1 in the presence of no other regressor in the model, which is why there is a dash. Similarly, for the second simple linear regression model y = β0 + β2 x2 + ε the F value is F(x2 | -), for y = β0 + β3 x3 + ε it is F(x3 | -), and for y = β0 + β4 x4 + ε it is F(x4 | -). How do we get these values? You fit the model to the Hald cement data, compute the fitted values, and from the ANOVA table you get SS_T, SS_Regression and SS_Residual and then the F statistic, F = MS_Regression / MS_Residual. This is the global F statistic of the model, but in the forward selection algorithm we denote it F(x1 | -) and call it the partial F of x1 in the presence of no other regressor. The values are F(x1 | -) = 12.6, F(x2 | -) = 21.96, F(x3 | -) = 4.40 and F(x4 | -) = 22.8.
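The four single-regressor F statistics can be checked with the usual simple-regression formulas, SS_Regression = Sxy²/Sxx and F = MS_Regression / MS_Residual with 1 and n - 2 degrees of freedom. This is a sketch assuming the standard Hald cement data from Draper and Smith; `simple_f` is a hypothetical helper name.

```python
# Hald cement data (n = 13), as tabulated in Draper and Smith (assumed; not listed in the lecture).
Y = [78.5, 74.3, 104.3, 87.6, 95.9, 109.2, 102.7, 72.5, 93.1, 115.9, 83.8, 113.3, 109.4]
X = {
    "x1": [7, 1, 11, 11, 7, 11, 3, 1, 2, 21, 1, 11, 10],
    "x2": [26, 29, 56, 31, 52, 55, 71, 31, 54, 47, 40, 66, 68],
    "x3": [6, 15, 8, 8, 6, 9, 17, 22, 18, 4, 23, 9, 8],
    "x4": [60, 52, 20, 47, 33, 22, 6, 44, 22, 26, 34, 12, 12],
}


def simple_f(x, y):
    """F = MS_Reg / MS_Res for y = b0 + b1*x + e (1 and n - 2 degrees of freedom)."""
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    ss_t = sum((b - ybar) ** 2 for b in y)
    ss_reg = sxy ** 2 / sxx  # regression SS, 1 df
    ss_res = ss_t - ss_reg   # residual SS, n - 2 df
    return ss_reg / (ss_res / (n - 2))


f_values = {name: simple_f(x, Y) for name, x in X.items()}
for name, f in f_values.items():
    print(f"F({name} | -) = {f:.2f}")
```

The printed values should agree with the lecture's 12.6, 21.96, 4.40 and 22.8, with x4 the largest.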
The highest F value is F(x4 | -) = 22.8, so x4 is the most significant regressor for explaining the variability in y. This value must also be greater than the tabulated value F(0.05; 1, residual df). Here the model has only 2 unknowns (β0 and β4), so the residual degrees of freedom are 13 - 2 = 11, and F(0.05; 1, 11) = 4.84. The observed value is greater than the tabulated value, so x4 is added to the model. Next, we compute the partial F statistic for each of the remaining regressors in the presence of x4, since x4 is already in the model: F(x1 | x4), F(x2 | x4) and F(x3 | x4). As an example of how to compute these, take F(x1 | x4). You fit the model y = β0 + β1 x1 + β4 x4 + ε, and then F(x1 | x4) = [SS_Reg(x1, x4) - SS_Reg(x4)] / MS_Res(x1, x4), where the full model for SS_Reg and MS_Res is the one with x1 and x4. You can check that this is (2641.0 - 1831.9)/7.48 = 108.22. Similarly, F(x2 | x4) = 0.17 and F(x3 | x4) = 40.29. So the highest partial F value is F(x1 | x4) = 108.22, and now we check it against the tabulated value F(0.05; 1, residual df).
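The partial F formula for x1 in the presence of x4 can be reproduced numerically. This is an illustration under assumptions: the Hald data values are taken from Draper and Smith, the residual SS comes from a small Gram-Schmidt least-squares fit, and `rss` and `anova` are hypothetical helpers (the latter returns SS_Reg and MS_Res for a given set of regressors).

```python
# Hald cement data (n = 13), as tabulated in Draper and Smith (assumed; not listed in the lecture).
Y = [78.5, 74.3, 104.3, 87.6, 95.9, 109.2, 102.7, 72.5, 93.1, 115.9, 83.8, 113.3, 109.4]
X = {
    "x1": [7, 1, 11, 11, 7, 11, 3, 1, 2, 21, 1, 11, 10],
    "x2": [26, 29, 56, 31, 52, 55, 71, 31, 54, 47, 40, 66, 68],
    "x3": [6, 15, 8, 8, 6, 9, 17, 22, 18, 4, 23, 9, 8],
    "x4": [60, 52, 20, 47, 33, 22, 6, 44, 22, 26, 34, 12, 12],
}


def rss(names):
    """Residual SS of Y on an intercept plus the regressors in `names` (modified Gram-Schmidt)."""
    basis = []
    for col in [[1.0] * len(Y)] + [X[name] for name in names]:
        u = [float(a) for a in col]
        for q in basis:  # remove the components along earlier orthonormal columns
            d = sum(a * b for a, b in zip(q, u))
            u = [a - d * b for a, b in zip(u, q)]
        norm = sum(a * a for a in u) ** 0.5
        basis.append([a / norm for a in u])
    r = [float(a) for a in Y]
    for q in basis:  # project Y off the column space; what remains are the residuals
        d = sum(a * b for a, b in zip(q, r))
        r = [a - d * b for a, b in zip(r, q)]
    return sum(a * a for a in r)


def anova(names):
    """Return (SS_Reg, MS_Res) for the model with an intercept plus `names`."""
    ss_t = rss([])  # intercept-only model: its residual SS is SS_T
    ss_res = rss(names)
    df_res = len(Y) - len(names) - 1
    return ss_t - ss_res, ss_res / df_res


ss_reg_4, _ = anova(["x4"])
ss_reg_14, ms_res_14 = anova(["x1", "x4"])
f_1_given_4 = (ss_reg_14 - ss_reg_4) / ms_res_14
print(f"F(x1 | x4) = ({ss_reg_14:.1f} - {ss_reg_4:.1f}) / {ms_res_14:.2f} = {f_1_given_4:.2f}")

# The same formula for every remaining regressor, each in the presence of x4.
f_given_4 = {}
for var in ("x1", "x2", "x3"):
    ss_reg_full, ms_res_full = anova([var, "x4"])
    f_given_4[var] = (ss_reg_full - ss_reg_4) / ms_res_full
print({v: round(f, 2) for v, f in f_given_4.items()})
```

The results should match the lecture's 108.22, 0.17 and 40.29, so x1 is the candidate to enter next.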
Now there are 3 unknowns in the model, so the residual degrees of freedom are 13 - 3 = 10, and F(0.05; 1, 10) = 4.96. Of course 108.22 is larger than 4.96. What this means is that we first added x4 to the model, and now, in the presence of x4, x1 is significant. So x1 is now added to the model. We have selected two regressors, and y is now a function of x1 and x4. Next, with x1 and x4 in the model, we check whether x2 is significant in the presence of x1 and x4, or whether x3 is significant in the presence of x1 and x4. If neither of them is significant, we stop here. So we compute the two partial F statistics F(x2 | x1, x4) and F(x3 | x1, x4). I do not want to repeat the whole calculation; you can check that F(x2 | x1, x4) = 5.03 and F(x3 | x1, x4) = 4.24. Let me write down the first one: F(x2 | x1, x4) = [SS_Reg(x1, x2, x4) - SS_Reg(x1, x4)] / MS_Res(x1, x2, x4) = (2667.79 - 2641.0)/5.33 = 5.03.
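The same formula gives the two partial F values at the third stage. This is a sketch assuming the Hald data from Draper and Smith and hypothetical `rss` and `anova` helpers; it also compares the larger value with the tabulated F(0.05; 1, 9) = 5.12 quoted in the lecture.

```python
# Hald cement data (n = 13), as tabulated in Draper and Smith (assumed; not listed in the lecture).
Y = [78.5, 74.3, 104.3, 87.6, 95.9, 109.2, 102.7, 72.5, 93.1, 115.9, 83.8, 113.3, 109.4]
X = {
    "x1": [7, 1, 11, 11, 7, 11, 3, 1, 2, 21, 1, 11, 10],
    "x2": [26, 29, 56, 31, 52, 55, 71, 31, 54, 47, 40, 66, 68],
    "x3": [6, 15, 8, 8, 6, 9, 17, 22, 18, 4, 23, 9, 8],
    "x4": [60, 52, 20, 47, 33, 22, 6, 44, 22, 26, 34, 12, 12],
}
F_CRIT_9 = 5.12  # tabulated F(0.05; 1, 9), as quoted in the lecture


def rss(names):
    """Residual SS of Y on an intercept plus the regressors in `names` (modified Gram-Schmidt)."""
    basis = []
    for col in [[1.0] * len(Y)] + [X[name] for name in names]:
        u = [float(a) for a in col]
        for q in basis:  # remove the components along earlier orthonormal columns
            d = sum(a * b for a, b in zip(q, u))
            u = [a - d * b for a, b in zip(u, q)]
        norm = sum(a * a for a in u) ** 0.5
        basis.append([a / norm for a in u])
    r = [float(a) for a in Y]
    for q in basis:  # project Y off the column space; what remains are the residuals
        d = sum(a * b for a, b in zip(q, r))
        r = [a - d * b for a, b in zip(r, q)]
    return sum(a * a for a in r)


def anova(names):
    """Return (SS_Reg, MS_Res) for the model with an intercept plus `names`."""
    ss_res = rss(names)
    return rss([]) - ss_res, ss_res / (len(Y) - len(names) - 1)


ss_reg_14, _ = anova(["x1", "x4"])
ss_reg_124, ms_res_124 = anova(["x1", "x2", "x4"])
ss_reg_134, ms_res_134 = anova(["x1", "x3", "x4"])
f_2_given_14 = (ss_reg_124 - ss_reg_14) / ms_res_124
f_3_given_14 = (ss_reg_134 - ss_reg_14) / ms_res_134
print(f"F(x2 | x1, x4) = ({ss_reg_124:.2f} - {ss_reg_14:.1f}) / {ms_res_124:.2f} = {f_2_given_14:.2f}")
print(f"F(x3 | x1, x4) = {f_3_given_14:.2f}")
print("add x2" if f_2_given_14 > F_CRIT_9 else "stop: x2 is not added")
```

Since 5.03 does not exceed 5.12, the sketch reports that forward selection stops with x1 and x4.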
So the highest partial F value is F(x2 | x1, x4) = 5.03, and whether x2 is included next depends on the tabulated (threshold) value of F. The full model used for computing MS_Res has 3 regressors, so there are 4 unknowns and the residual degrees of freedom are 13 - 4 = 9, giving F(0.05; 1, 9) = 5.12. Now 5.03 is not greater than the tabulated value. At this moment we have x1 and x4 in the model, and in their presence x2 has the highest partial F value, but that value is less than the tabulated value. So x2 is not significant in the presence of x1 and x4, and we cannot add it to the model. This is the stopping criterion, so the forward selection algorithm terminates here. Its output is the fitted model ŷ = 103.10 + 1.44 x1 - 0.614 x4; what I want to emphasize is that the final model involves x1 and x4. Next we move to stepwise selection. Stepwise selection is basically a combination of forward selection and backward elimination.
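The coefficients of the final forward-selection model can be obtained from the centred normal equations for two regressors. A minimal sketch, assuming the Hald data from Draper and Smith; `fit_two` is a hypothetical helper name.

```python
# Hald cement data from Draper and Smith (assumed): only y, x1 and x4 are needed here.
Y = [78.5, 74.3, 104.3, 87.6, 95.9, 109.2, 102.7, 72.5, 93.1, 115.9, 83.8, 113.3, 109.4]
X1 = [7, 1, 11, 11, 7, 11, 3, 1, 2, 21, 1, 11, 10]
X4 = [60, 52, 20, 47, 33, 22, 6, 44, 22, 26, 34, 12, 12]


def fit_two(x, z, y):
    """Least-squares fit y = b0 + b1*x + b2*z via the centred normal equations."""
    n = len(y)
    xb, zb, yb = sum(x) / n, sum(z) / n, sum(y) / n
    sxx = sum((a - xb) ** 2 for a in x)
    szz = sum((c - zb) ** 2 for c in z)
    sxz = sum((a - xb) * (c - zb) for a, c in zip(x, z))
    sxy = sum((a - xb) * (b - yb) for a, b in zip(x, y))
    szy = sum((c - zb) * (b - yb) for c, b in zip(z, y))
    det = sxx * szz - sxz ** 2  # determinant of the 2x2 centred X'X
    b1 = (szz * sxy - sxz * szy) / det
    b2 = (sxx * szy - sxz * sxy) / det
    b0 = yb - b1 * xb - b2 * zb  # intercept from the means
    return b0, b1, b2


b0, b1, b4 = fit_two(X1, X4, Y)
print(f"y-hat = {b0:.2f} {b1:+.2f} x1 {b4:+.3f} x4")
```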
Here also we start with no regressor in the model. The procedure is very similar to forward selection, so let me revise the forward selection algorithm into a stepwise selection algorithm. We start with no regressor in the model; then all possible models with one regressor are considered, the F statistic is computed for each, and the regressor with the highest F statistic is added to the model. In stepwise selection, as I present it here, there is no condition for the first regressor: you do not need to compare its F with the tabulated value, and the regressor with the highest F statistic simply enters. If you recall the last example, x4 had the highest F statistic, so it was included in the model. In the next step, the partial F statistics are computed for all of the remaining regressors in the presence of the previously selected regressors. In the previous example x4 was included first, and then we computed the partial F statistics for x1, x2 and x3 in the presence of x4; the one yielding the highest partial F is added to the model if that F is greater than the tabulated value. Sometimes, instead of the tabulated value, we use a specified threshold value, for example 5. So instead of looking up the tabulated value every time, we fix a threshold, and if the observed partial F is greater than it, we add the corresponding regressor to the model. Up to this point the steps are the same as in forward selection. In step 4 there is a change: we check for a possible exit.
All variables in the model are evaluated with a partial F test to see whether each one is still significant, and at this step any regressor that is no longer significant is dropped from the model. Let me recall the previous example to make this clear. In the first step we included the regressor x4, and in the second step we included x1, so the model is y = β0 + β4 x4 + β1 x1 + ε. Now, x4 was significant on its own, when we considered the one-regressor models, and x1 is significant in the presence of x4, which is why x1 was included in the second step. Since x1 is the last regressor added, it is certainly significant in the presence of x4. What we need to check is whether x4 is still significant in the presence of x1. That is the difference: x4 was significant alone at the first step, but in the presence of x1 it might no longer be significant. That is why all variables in the model are re-evaluated with partial F tests, and any regressor that is no longer significant is dropped; if x4 is not significant in the presence of x1, we drop x4 from the model. This is the check for a possible exit in step 4. Step 5 is the stopping criterion: stepwise selection terminates when no other regressor yields a partial F greater than the threshold value and all regressors in the model remain significant.
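The stepwise procedure can be sketched in Python. Assumptions, labelled in the code: the Hald data from Draper and Smith, a fixed entry threshold of 5 as in the lecture, an exit threshold equal to it, and a rule that drops the least significant regressor first, one at a time, when several are no longer significant (the lecture does not say how multiple exits are handled). Following the lecture, the first regressor enters without a threshold check. The function names are illustrative, not from any library.

```python
# Hald cement data (n = 13), as tabulated in Draper and Smith (assumed; not listed in the lecture).
Y = [78.5, 74.3, 104.3, 87.6, 95.9, 109.2, 102.7, 72.5, 93.1, 115.9, 83.8, 113.3, 109.4]
X = {
    "x1": [7, 1, 11, 11, 7, 11, 3, 1, 2, 21, 1, 11, 10],
    "x2": [26, 29, 56, 31, 52, 55, 71, 31, 54, 47, 40, 66, 68],
    "x3": [6, 15, 8, 8, 6, 9, 17, 22, 18, 4, 23, 9, 8],
    "x4": [60, 52, 20, 47, 33, 22, 6, 44, 22, 26, 34, 12, 12],
}
F_IN = 5.0   # entry threshold (the lecture's fixed value)
F_OUT = 5.0  # exit threshold (assumed equal to F_IN)


def rss(names):
    """Residual SS of Y on an intercept plus the regressors in `names` (modified Gram-Schmidt)."""
    basis = []
    for col in [[1.0] * len(Y)] + [X[name] for name in names]:
        u = [float(a) for a in col]
        for q in basis:  # remove the components along earlier orthonormal columns
            d = sum(a * b for a, b in zip(q, u))
            u = [a - d * b for a, b in zip(u, q)]
        norm = sum(a * a for a in u) ** 0.5
        basis.append([a / norm for a in u])
    r = [float(a) for a in Y]
    for q in basis:  # project Y off the column space; what remains are the residuals
        d = sum(a * b for a, b in zip(q, r))
        r = [a - d * b for a, b in zip(r, q)]
    return sum(a * a for a in r)


def partial_f(var, given):
    """Partial F for `var` in the presence of the regressors in `given`."""
    full = list(given) + [var]
    rss_full = rss(full)
    return (rss(given) - rss_full) / (rss_full / (len(Y) - len(full) - 1))


def stepwise(max_steps=20):
    model = []
    for _ in range(max_steps):  # guard against cycling between entry and exit
        changed = False
        # Entry: the remaining regressor with the highest partial F given the model.
        candidates = [v for v in X if v not in model]
        if candidates:
            f = {v: partial_f(v, model) for v in candidates}
            best = max(f, key=f.get)
            if not model or f[best] > F_IN:  # the first regressor enters unconditionally
                model.append(best)
                changed = True
        # Possible exit: drop the weakest regressor while it is no longer significant.
        while len(model) >= 2:
            f = {v: partial_f(v, [u for u in model if u != v]) for v in model}
            weakest = min(f, key=f.get)
            if f[weakest] >= F_OUT:
                break
            model.remove(weakest)
            changed = True
        if not changed:  # step 5: nothing enters and nothing leaves
            break
    return model


print(stepwise())  # ['x1', 'x2']
```

On this data the sketch adds x4, then x1, then x2, drops x4 at the exit check, and ends with x1 and x2.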
So the stopping rule has two conditions. If no other regressor has a significant partial F value in the presence of the regressors already in the model, we cannot include any more regressors; and if the regressors present in the model all remain significant, we do not need to remove any of them. When both hold, the stepwise selection algorithm terminates. Now I want to illustrate the stepwise selection algorithm using the Hald cement data again. Let me introduce a table from the Draper and Smith book, page 337, which gives the partial F values for this data. Let me explain what these values mean. The value 12.6 for variable x1 is the F value for the model y = β0 + β1 x1 + ε with no other regressor already present in the model, so x1 is the only regressor. In our notation this is F(x1 | -), and similarly the table gives F(x2 | -), F(x3 | -) and F(x4 | -); you may recall the value 22.8 for x4. Another entry is F(x2 | x1), the partial F statistic associated with x2 in the presence of x1 in the model. Another is the partial F statistic associated with x4 in the presence of x3 in the model, that is, F(x4 | x3). In this way the table lists all the partial F statistics. For instance, you will see the familiar figure 5.03: this is the partial F statistic for the regressor x2 in the presence of x1 and x4 in the model.
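A table of this kind can be regenerated by computing the partial F of every regressor given every subset of the other regressors. This is a sketch assuming the standard Hald data from Draper and Smith; the printed layout is not the book's layout, and the helper names are hypothetical.

```python
from itertools import combinations

# Hald cement data (n = 13), as tabulated in Draper and Smith (assumed; not listed in the lecture).
Y = [78.5, 74.3, 104.3, 87.6, 95.9, 109.2, 102.7, 72.5, 93.1, 115.9, 83.8, 113.3, 109.4]
X = {
    "x1": [7, 1, 11, 11, 7, 11, 3, 1, 2, 21, 1, 11, 10],
    "x2": [26, 29, 56, 31, 52, 55, 71, 31, 54, 47, 40, 66, 68],
    "x3": [6, 15, 8, 8, 6, 9, 17, 22, 18, 4, 23, 9, 8],
    "x4": [60, 52, 20, 47, 33, 22, 6, 44, 22, 26, 34, 12, 12],
}


def rss(names):
    """Residual SS of Y on an intercept plus the regressors in `names` (modified Gram-Schmidt)."""
    basis = []
    for col in [[1.0] * len(Y)] + [X[name] for name in names]:
        u = [float(a) for a in col]
        for q in basis:  # remove the components along earlier orthonormal columns
            d = sum(a * b for a, b in zip(q, u))
            u = [a - d * b for a, b in zip(u, q)]
        norm = sum(a * a for a in u) ** 0.5
        basis.append([a / norm for a in u])
    r = [float(a) for a in Y]
    for q in basis:  # project Y off the column space; what remains are the residuals
        d = sum(a * b for a, b in zip(q, r))
        r = [a - d * b for a, b in zip(r, q)]
    return sum(a * a for a in r)


def partial_f(var, given):
    """Partial F for `var` in the presence of the regressors in `given`."""
    full = list(given) + [var]
    rss_full = rss(full)
    return (rss(given) - rss_full) / (rss_full / (len(Y) - len(full) - 1))


# Key (var, given) -> partial F, for every subset `given` of the other regressors.
table = {}
for var in X:
    others = [v for v in X if v != var]
    for k in range(len(others) + 1):
        for given in combinations(others, k):
            table[(var, given)] = partial_f(var, given)

for (var, given), f in table.items():
    print(f"F({var} | {', '.join(given) or '-'}) = {f:.2f}")
```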
That is, this entry is F(x2 | x1, x4). So the table gives us all the partial F values. Now I illustrate stepwise selection. Initially there is no regressor in the model. Just as in forward selection, we have F(x1 | -) = 12.6, F(x2 | -) = 21.96, F(x3 | -) = 4.40 and F(x4 | -) = 22.8. The last one is the highest, and here we do not compare it with any threshold or tabulated value. Since the F statistic associated with x4 is the highest, x4 is added to the model. Next we seek the next best regressor to add, given that x4 is already in the model. For that we check the partial F value for x1 in the presence of x4, for x2 in the presence of x4, and for x3 in the presence of x4. From the table, F(x1 | x4) = 108.22, F(x2 | x4) = 0.17 and F(x3 | x4) = 40.29. The highest partial F value is F(x1 | x4). Here we fix the threshold value at 5, because we do not want to look up the tabulated value every time. Since 108.22 is greater than the threshold 5, x1 is added to the model. At this moment we have x1 and x4 in the model, so y is expressed in terms of x1 and x4. Now comes the difference from forward selection: in the next step we check for a possible exit, since we have x1 and x4 in the model.
The model is y = β0 + β1 x1 + β4 x4 + ε. We compute two partial F values: F(x1 | x4) = 108.22 and F(x4 | x1) = 159.3. Since x1 is the last regressor added, it is of course significant in the presence of x4; that is why it was included in the previous step. The question is whether x4 is significant in the presence of x1, and the partial F value associated with x4 in the presence of x1 is greater than the threshold 5, so it is significant. Hence we do not drop any regressor from this model. Next we seek the next best regressor. We have x1 and x4 in the model, x1 is significant in the presence of x4, and x4 is significant in the presence of x1. Now we check whether x2 or x3 can be included: from the table, F(x2 | x1, x4) = 5.03 and F(x3 | x1, x4) = 4.24. The first one is the highest partial F value, and it is greater than the threshold 5, so x2 is added to the model. Now the model has three regressors: y is expressed in terms of x1, x2 and x4. We check for a possible exit again by computing F(x2 | x1, x4) = 5.03, F(x1 | x2, x4) = 154.01 and F(x4 | x1, x2) = 1.86, all of which you can check from the table.
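The possible-exit check of step 4 can be written as a small function. This is a sketch assuming the Hald data from Draper and Smith and the lecture's threshold of 5; the helper names are hypothetical. For each regressor in the model it computes the partial F given all the others and reports the weakest one if it falls below the threshold.

```python
# Hald cement data (n = 13), as tabulated in Draper and Smith (assumed; not listed in the lecture).
Y = [78.5, 74.3, 104.3, 87.6, 95.9, 109.2, 102.7, 72.5, 93.1, 115.9, 83.8, 113.3, 109.4]
X = {
    "x1": [7, 1, 11, 11, 7, 11, 3, 1, 2, 21, 1, 11, 10],
    "x2": [26, 29, 56, 31, 52, 55, 71, 31, 54, 47, 40, 66, 68],
    "x3": [6, 15, 8, 8, 6, 9, 17, 22, 18, 4, 23, 9, 8],
    "x4": [60, 52, 20, 47, 33, 22, 6, 44, 22, 26, 34, 12, 12],
}
F_OUT = 5.0  # the lecture's fixed threshold


def rss(names):
    """Residual SS of Y on an intercept plus the regressors in `names` (modified Gram-Schmidt)."""
    basis = []
    for col in [[1.0] * len(Y)] + [X[name] for name in names]:
        u = [float(a) for a in col]
        for q in basis:  # remove the components along earlier orthonormal columns
            d = sum(a * b for a, b in zip(q, u))
            u = [a - d * b for a, b in zip(u, q)]
        norm = sum(a * a for a in u) ** 0.5
        basis.append([a / norm for a in u])
    r = [float(a) for a in Y]
    for q in basis:  # project Y off the column space; what remains are the residuals
        d = sum(a * b for a, b in zip(q, r))
        r = [a - d * b for a, b in zip(r, q)]
    return sum(a * a for a in r)


def partial_f(var, given):
    """Partial F for `var` in the presence of the regressors in `given`."""
    full = list(given) + [var]
    rss_full = rss(full)
    return (rss(given) - rss_full) / (rss_full / (len(Y) - len(full) - 1))


def exit_check(model):
    """Partial F of each regressor given the others, and the one to drop (or None)."""
    f = {v: partial_f(v, [u for u in model if u != v]) for v in model}
    weakest = min(f, key=f.get)
    return f, (weakest if f[weakest] < F_OUT else None)


for model in (["x1", "x4"], ["x1", "x2", "x4"]):
    f, drop = exit_check(model)
    print(model, {v: round(val, 2) for v, val in f.items()}, "drop:", drop)
```

With x1 and x4 nothing is dropped; once x2 has entered, x4 falls below the threshold and is the one to drop.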
Earlier, x4 was significant even in the presence of x1; that is what we found in the previous step. But now, in the presence of x1 and x2, x4 is not significant, since 1.86 is less than 5. On the other hand, x1 is significant in the presence of x2 and x4, and x2 is significant in the presence of x1 and x4. So these two stay in the model, but we must remove x4. The model becomes y in terms of x1 and x2. Again we look for a new candidate: we compute F(x3 | x1, x2) = 1.83 and F(x4 | x1, x2) = 1.86, which you can check from the table, and both are less than 5. So we cannot include any more regressors in the existing model with x1 and x2. We also check for a possible exit, that is, whether x1 is significant in the presence of x2 and x2 in the presence of x1: F(x1 | x2) = 146.52 and F(x2 | x1) = 208.58, and both are greater than 5. So we cannot remove any regressor from the model, and we cannot include any more regressors either. The stepwise selection therefore terminates and yields the model y in terms of x1 and x2. Finally, I want to mention that the results of the different selection algorithms can be different. If you recall, the final model from backward elimination involved x1 and x2; forward selection selected x1 and x4; and for stepwise selection the final output is x1 and x2.
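The final entry and exit checks for the model with x1 and x2 can be verified numerically. This is a sketch assuming the Hald data from Draper and Smith and the lecture's threshold of 5; the helper names are hypothetical.

```python
# Hald cement data (n = 13), as tabulated in Draper and Smith (assumed; not listed in the lecture).
Y = [78.5, 74.3, 104.3, 87.6, 95.9, 109.2, 102.7, 72.5, 93.1, 115.9, 83.8, 113.3, 109.4]
X = {
    "x1": [7, 1, 11, 11, 7, 11, 3, 1, 2, 21, 1, 11, 10],
    "x2": [26, 29, 56, 31, 52, 55, 71, 31, 54, 47, 40, 66, 68],
    "x3": [6, 15, 8, 8, 6, 9, 17, 22, 18, 4, 23, 9, 8],
    "x4": [60, 52, 20, 47, 33, 22, 6, 44, 22, 26, 34, 12, 12],
}


def rss(names):
    """Residual SS of Y on an intercept plus the regressors in `names` (modified Gram-Schmidt)."""
    basis = []
    for col in [[1.0] * len(Y)] + [X[name] for name in names]:
        u = [float(a) for a in col]
        for q in basis:  # remove the components along earlier orthonormal columns
            d = sum(a * b for a, b in zip(q, u))
            u = [a - d * b for a, b in zip(u, q)]
        norm = sum(a * a for a in u) ** 0.5
        basis.append([a / norm for a in u])
    r = [float(a) for a in Y]
    for q in basis:  # project Y off the column space; what remains are the residuals
        d = sum(a * b for a, b in zip(q, r))
        r = [a - d * b for a, b in zip(r, q)]
    return sum(a * a for a in r)


def partial_f(var, given):
    """Partial F for `var` in the presence of the regressors in `given`."""
    full = list(given) + [var]
    rss_full = rss(full)
    return (rss(given) - rss_full) / (rss_full / (len(Y) - len(full) - 1))


model = ["x1", "x2"]
# Entry check: each outside regressor given x1 and x2.
entry = {v: partial_f(v, model) for v in X if v not in model}
# Exit check: each regressor in the model given the other one.
stay = {v: partial_f(v, [u for u in model if u != v]) for v in model}
print("entry:", {v: round(f, 2) for v, f in entry.items()})
print("stay: ", {v: round(f, 2) for v, f in stay.items()})
```

Every entry value is below 5 and every stay value is above 5, so neither an addition nor a removal is possible and the procedure stops.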
So the results are not unique; that is what I want to emphasize. That is all about model selection. Thank you very much.