We are working under the multiple linear regression setup, where the number of regressor variables is more than one. In most practical problems the number of regressors is very large, and we may wonder whether some of them are irrelevant and can be removed from the regression equation. The basic idea behind finding the best regression model is that we need to find an appropriate subset of regressors that explains the variability in the response variable well. Finding this subset of regressor variables is called the variable selection problem.

There are several algorithms to solve this problem, and they can be classified into two classes: one approach is called all possible regressions, and the other is called sequential selection.

First I will talk about all possible regressions. Here we need to consider all regression equations involving zero regressors, one regressor, two regressors, and so on. If k - 1 is the total number of regressors in the multiple linear regression model, then the number of models having zero regressors is C(k-1, 0) = 1, and that model is y = β0 + ε. We also consider the models involving one regressor, and the number of such models is C(k-1, 1); there are C(k-1, 2) regression models involving two regressor variables; and similarly we go up to k - 1 regressors, where the number of models is C(k-1, k-1) = 1. In total we have

C(k-1, 0) + C(k-1, 1) + ... + C(k-1, k-1) = 2^(k-1)

regression models, and these equations are evaluated according to some suitable criteria. The first is R², the coefficient of multiple determination (or simply coefficient of determination); then the adjusted R²; then MS_Res, the residual mean square; and finally the Mallows statistic, denoted by C_p.

The other approach is sequential selection, which I will talk about later; there are three algorithms of this type, called forward selection, backward elimination, and stepwise selection. Today we will talk about all possible regressions and how to evaluate the 2^(k-1) regression equations based on these criteria.

Now suppose the number of regressors is four. We usually denote the number of regressors by k - 1, so here k - 1 = 4. Recall that k denotes the number of unknown parameters in the model: with k - 1 regressors there are k - 1 regression coefficients, plus one more unknown parameter, the intercept, for a total of k unknown parameters. So if there are four regressors in the problem, there are 2^4 = 16 possible regression equations. Let me list those 16 regression equations.
Here I am considering a problem with four regressor variables. First there is the model with no regressor variable; the number of such models is C(4, 0) = 1. Then there are the models involving one regressor variable: one involving x1, a second involving x2, and likewise x3 and x4, so there are four regression models involving one regressor variable. Next we have C(4, 2) = 6 regression models involving two regressor variables: (x1, x2), (x1, x3), (x1, x4), (x2, x3), (x2, x4), and (x3, x4). Then there are C(4, 3) = 4 regression models involving three regressor variables, one of them involving (x1, x2, x3), and so on. Finally there is the full model, which involves all four regressor variables; there is C(4, 4) = 1 such model. So when the number of regressor variables is four, we have 16 possible regression models, and we need to evaluate them with respect to some criteria.

To see the complexity of this approach, suppose you have a problem with k - 1 = 10, that is, ten regressors. Then there are 2^10 = 1024 possible regression equations. Clearly the number of regression models that need to be fitted increases rapidly with the number of regressor variables. In many practical problems the number of regressor variables could be twenty or thirty, but of course you can use a computer to fit all 2^20 models; that is not a problem.
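To make the bookkeeping concrete, here is a minimal Python sketch, not part of the lecture, that enumerates all candidate models for k - 1 = 4 with itertools; the names x1 to x4 are just illustrative labels for the four regressors.

```python
from itertools import combinations

# Enumerate every subset of a pool of k - 1 = 4 candidate regressors,
# grouped by subset size; each subset defines one candidate model.
regressors = ["x1", "x2", "x3", "x4"]

models = []
for size in range(len(regressors) + 1):      # 0, 1, 2, 3, 4 regressors
    for subset in combinations(regressors, size):
        models.append(subset)
        rhs = " + ".join(("b0",) + subset)   # e.g. "b0 + x1 + x2"
        print("y =", rhs, "+ error")

# C(4,0) + C(4,1) + C(4,2) + C(4,3) + C(4,4) = 1 + 4 + 6 + 4 + 1 = 16
print(len(models))                           # 16 = 2**4
```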
Next I will talk about the criteria for evaluating subset regression models. The first criterion is the coefficient of multiple determination. Earlier I mentioned this R² and we used to call it the coefficient of determination; since we are talking about the multiple linear regression model here, we call it the coefficient of multiple determination and denote it by R_p². Let R_p² denote the coefficient of multiple determination for a subset regression model with p - 1 regressors and an intercept β0. The subscript p stands for the number of unknown parameters in the model: with p - 1 regressors there are p - 1 coefficients plus the intercept β0, so the total number of unknown parameters is p. Then

R_p² = SS_Reg(p) / SS_T = 1 - SS_Res(p) / SS_T,

where SS_Reg(p) and SS_Res(p) denote the regression and residual sums of squares for the subset model with p - 1 regressors. So R_p² is associated with the model that has p - 1 regressors, and it is a statistic that measures the proportion of variability in the response variable explained by the regression model involving those p - 1 regressors.

One observation you can make is that R_p² increases as p increases. Look at the definition: R_p² = 1 - SS_Res(p)/SS_T, and we know that SS_Res(p) decreases as p increases. From this you can easily see that R_p² increases with p, and it is maximum when p = k, because p = k means p - 1 = k - 1, that is, the full model. Since the problem has at most k - 1 regressor variables and SS_Res decreases as the number of regressor variables increases, SS_Res attains its minimum at p = k, and hence R_p² attains its maximum there.

So what we do is compute the value of R_p² for each model. First we compute R_1². This is the case p = 1, that is, p - 1 = 0 regressors in the model: y = β0 + ε. It is not difficult to prove that for this model with no regressor variable the coefficient of multiple determination equals zero: the least squares estimate of β0 is ȳ, so SS_Res = Σ(y_i - ȳ)² = SS_T and R_1² = 1 - SS_T/SS_T = 0. Next, given a set of data, we compute R_2²; here p = 2, so p - 1 = 1, and R_2² is associated with a model with one regressor, for example y = β0 + β1 x1 + ε.

To illustrate all of this, let me first consider one example using a quite famous data set called the Hald cement data. Here we have one response variable y and four regressor variables x1, x2, x3, and x4, with thirteen observations on the response and the regressors. With four regressor variables, you may suspect that not all four are significant for explaining the variability in y; some of them might be irrelevant, and the question is whether some variables can be removed from the model without affecting its predictive power. For that we need to select the regressor variables that best explain the variability in the response variable y; that is the whole purpose of this lecture.

Let me explain the all possible regressions approach using this example. There are four regressor variables, so we have the sixteen possible models: the models with one regressor, the models with two regressors, the models with three regressor variables, and the model with four regressor variables. What we need to do is fit each of them; once you have the fitted equation, say for the model involving x1, you can compute SS_Res and SS_T, and from there the coefficient of multiple determination.
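As an aside before fitting the Hald models, here is a quick numerical check of the monotonicity claim above. It is a sketch on synthetic data (deliberately not the Hald data) fitting a nested sequence of models and printing R_p², which never decreases as p grows.

```python
import numpy as np

# Synthetic data: only the first regressor actually matters, yet R_p^2
# still never decreases as further (irrelevant) regressors are added.
rng = np.random.default_rng(0)
n = 30
X = rng.normal(size=(n, 3))
y = 2.0 + X[:, 0] + rng.normal(size=n)

ss_t = np.sum((y - y.mean())**2)
for p_minus_1 in range(4):                      # 0, 1, 2, 3 regressors
    Xp = np.column_stack([np.ones(n), X[:, :p_minus_1]])
    beta, *_ = np.linalg.lstsq(Xp, y, rcond=None)
    ss_res = np.sum((y - Xp @ beta)**2)
    print(p_minus_1, round(1 - ss_res / ss_t, 4))   # non-decreasing in p
```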
Let me fit at least one equation. For example, I will fit the model y = β0 + β1 x1 + ε to the Hald cement data. This looks like a simple linear regression model, so you know how to fit it: you take only the data corresponding to the response variable and to the first regressor variable x1, and fitting the model means finding the least squares estimates of β0 and β1. You can check that the fitted equation is ŷ = 81.5 + 1.87 x1.

Once you have the fitted equation, you can compute the residuals e_i = y_i - ŷ_i. With e_1, e_2, ..., e_13 in hand, SS_Res = Σ e_i², summed from 1 to 13, and you can check that this equals about 1265. For this data SS_T = 2715.8, and hence SS_Reg = SS_T - SS_Res = 1450.1. I am just trying to give you some idea of how to apply this all possible regressions approach in a problem with four or five regressors.

Now we can write the ANOVA table for this model, with columns for the source of variation, degrees of freedom, SS, MS, and the F statistic. The total degrees of freedom is 12, because there are 13 observations. For the residual: there are two unknown parameters in the model, so you get two normal equations, which place two constraints on the residuals; the residual degrees of freedom is therefore 13 - 2 = 11. The regression degrees of freedom is 1.

Source       df    SS        MS        F
Regression    1    1450.1    1450.1    12.6
Residual     11    1265.7     115.1
Total        12    2715.8

This ANOVA table is associated with the model y = β0 + β1 x1 + ε. Similarly you have to fit the other three models involving one regressor variable, that is, y = β0 + β2 x2 + ε, y = β0 + β3 x3 + ε, and y = β0 + β4 x4 + ε; for each of them you will get another ANOVA table. Altogether there are 16 possible regression models, and for each of them you have to fit the model and find the associated ANOVA table. For your convenience you can of course use a computer or a software package like SAS or S-PLUS to do this job.
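For readers who want to reproduce these numbers, here is a minimal numpy sketch. It hard-codes the Hald cement data values as they are commonly reproduced in the literature; treat those arrays as an assumption to check against your own copy of the data.

```python
import numpy as np

# Hald cement data (13 observations): first regressor x1 and response y.
x1 = np.array([7, 1, 11, 11, 7, 11, 3, 1, 2, 21, 1, 11, 10], dtype=float)
y  = np.array([78.5, 74.3, 104.3, 87.6, 95.9, 109.2, 102.7,
               72.5, 93.1, 115.9, 83.8, 113.3, 109.4])

# Least squares fit of y = b0 + b1*x1 + error.
X = np.column_stack([np.ones_like(x1), x1])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # beta ~ [81.5, 1.87]

e = y - X @ beta                               # residuals e_1 .. e_13
ss_res = np.sum(e**2)                          # ~ 1265.7
ss_t   = np.sum((y - y.mean())**2)             # ~ 2715.8
ss_reg = ss_t - ss_res                         # ~ 1450.1

n, p = X.shape                                 # n = 13, p = 2
ms_reg = ss_reg / (p - 1)                      # regression df = 1
ms_res = ss_res / (n - p)                      # residual df = 11, ~ 115.1
print(beta, ss_res, ss_t, ms_reg / ms_res)     # F ~ 12.6
```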
Once you have all these ANOVA tables, that is, the SS_Res and SS_T values for every model, you can compute the coefficients of multiple determination. For the x1 model p = 2, because there are two unknown parameters, and R_2² = SS_Reg/SS_T = 1450/2715.8 ≈ 53.4 percent. So this model, involving the regressor variable x1 only, is not that good, because it explains only about 53 percent of the total variability in the response variable.

Now look at this table. We have computed the coefficient of multiple determination for the x1 model: 53.4 percent. Similarly you fit the model involving only x2, find the corresponding ANOVA table, and compute its R² value; that is the coefficient of determination associated with that model, and you do the same for all the models. Among the models involving two regressors you can see that one in particular is good: the one involving x1 and x2, with a coefficient of multiple determination of 97.9 percent, the maximum in this class. So among the regression equations involving two variables, the best is y = β0 + β1 x1 + β2 x2 + ε, because almost 98 percent of the total variability in the response variable is explained by this model. Similarly, though it is a rather hectic job, you have to estimate all the models involving three regressors and compute their R² values; and for the full model, which involves all four regressors, the coefficient of determination is 98.2 percent.

Now what we want to do is draw a graph with p, the number of unknown parameters in the model, along the x-axis and the maximum R_p² along the y-axis. I hope you have observed that the higher the value of R_p², the better the model is; a higher value of R_p² indicates a better fit. What I mean is that out of the six models involving two regressor variables, the (x1, x2) model is the best, and out of the four models involving one regressor variable, the one with the maximum coefficient of determination is the best. So in this graph, all possible models with p - 1 regressors are evaluated using the coefficient of multiple determination, and the one giving the greatest R_p² is tabulated. Let me take p = 1, 2, 3, 4, 5 along the x-axis. When p = 1 there is only one unknown parameter in the model, that is, p - 1 = 0 regressors; this is the model y = β0 + ε, and the maximum R_p² is 0. For p = 2 the number of regressors in the model is 1, and out of the four such models the maximum R² is 67.5 percent, so we tabulate 67.5.
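Before reading off the rest of the plot, here is a sketch of how all 16 values of R_p² might be computed and the best model for each p tabulated. Again the hard-coded Hald data values are assumed from standard references, and the expected outputs in the comments are approximate.

```python
import numpy as np
from itertools import combinations

# Hald data: columns x1..x4 and the response y (same 13 observations).
X_all = np.array([
    [ 7, 26,  6, 60], [ 1, 29, 15, 52], [11, 56,  8, 20], [11, 31,  8, 47],
    [ 7, 52,  6, 33], [11, 55,  9, 22], [ 3, 71, 17,  6], [ 1, 31, 22, 44],
    [ 2, 54, 18, 22], [21, 47,  4, 26], [ 1, 40, 23, 34], [11, 66,  9, 12],
    [10, 68,  8, 12]], dtype=float)
y = np.array([78.5, 74.3, 104.3, 87.6, 95.9, 109.2, 102.7,
              72.5, 93.1, 115.9, 83.8, 113.3, 109.4])

n = len(y)
ss_t = np.sum((y - y.mean())**2)

best = {}                                    # p -> (max R_p^2, subset)
for size in range(5):                        # 0..4 regressors, p = size + 1
    for cols in combinations(range(4), size):
        X = np.column_stack([np.ones(n)] + [X_all[:, j] for j in cols])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        r2 = 1.0 - np.sum((y - X @ beta)**2) / ss_t
        p = size + 1
        if p not in best or r2 > best[p][0]:
            best[p] = (r2, tuple(j + 1 for j in cols))

for p, (r2, subset) in sorted(best.items()):
    print(p, f"{100*r2:.1f}%", subset)
# Expected: p=1 -> 0.0%, p=2 -> 67.5% (x4 alone), p=3 -> 97.9% (x1, x2),
#           p=4 -> 98.2%, p=5 -> 98.2%
```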
On the y-axis, suppose the scale runs 20, 40, 60, 80, 100. The 67.5 goes at p = 2. For p = 3, the number of regressors in the model is 2, and the maximum is 97.9, so we plot 97.9, which sits almost at the top. For p = 4, that is, three regressors in the model, the maximum is 98.2. And for p = 5, meaning four regressors in the model, the coefficient of determination is again 98.2, so we plot 98.2 there as well.

What this suggests is an algorithm: start with one regressor and add regressors to the model up to the point where an additional variable provides only a small increase in R_p². There is no specific value defining what counts as a small increase, so this stopping criterion involves some judgment. Here the model with two variables has a coefficient of determination of 97.9 percent, so close to 98 percent of the variability is explained by the two regressor variables. If you go to the three-variable models, the best one reaches 98.2 percent, and the four-variable model gives the same 98.2 percent. Clearly you do not need to go for the four-variable model: either you choose a three-variable model such as y = β0 + β1 x1 + β2 x2 + β3 x3 + ε, or you go for the two-variable model, which according to the coefficient of multiple determination criterion is also not bad. So this is how we evaluate all possible models using a criterion; we have talked about one criterion, the coefficient of multiple determination.

Next we move to the residual mean square criterion, MS_Res. We know that SS_Res(p), the residual sum of squares for the model with p - 1 regressors (where k - 1 is the total number of regressor variables), decreases as the number of regressor variables increases. Here we consider the residual mean square

MS_Res(p) = SS_Res(p) / (n - p),

where n - p is the residual (error) degrees of freedom for the associated model. One thing I want to mention: while SS_Res decreases as p increases, the same is not true for MS_Res; MS_Res may increase with p. To see the reason, write

MS_Res(p) = SS_Res(p) / (n - p)   and   MS_Res(p+1) = SS_Res(p+1) / (n - p - 1).

We know that SS_Res(p) ≥ SS_Res(p+1), because SS_Res decreases as p increases, but MS_Res(p+1) can still be larger than MS_Res(p). This increase occurs when the reduction in SS_Res from adding a regressor to the model is not sufficient to compensate for the loss of one degree of freedom in the denominator.
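A small numerical illustration of this effect, continuing the sketch above with the same X_all, y, and n: comparing a three-regressor Hald model with the full model, SS_Res barely drops while a residual degree of freedom is lost, so MS_Res actually rises. The numbers in the comments are approximate.

```python
def ms_res(cols):
    # MS_Res(p) = SS_Res(p) / (n - p) for the model using the given columns.
    X = np.column_stack([np.ones(n)] + [X_all[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    p = len(cols) + 1                        # p - 1 regressors plus intercept
    return np.sum((y - X @ beta)**2) / (n - p)

print(ms_res((0, 1, 3)))     # x1, x2, x4: roughly 5.3
print(ms_res((0, 1, 2, 3)))  # full model: roughly 6.0, i.e. larger
```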
To put it plainly: SS_Res(p+1) is of course no larger than SS_Res(p), but if you add an irrelevant regressor to the model, the reduction in SS_Res from adding that regressor may not be sufficient to compensate for the loss of one residual degree of freedom, and only then does MS_Res increase. So when the newly added regressor variable is not relevant for the response variable, MS_Res goes up rather than down.

We will learn how to evaluate all possible models using the MS_Res criterion in the next class; we will continue with this criterion there. Thank you for your attention.