This is my first lecture on multiple linear regression. The content of today's lecture is the estimation of model parameters in multiple linear regression and the properties of the least squares estimators; then, once the model has been fitted, we will talk about testing for significance of regression.

Let me recall the Disney toy problem. There we had only one regressor variable, the amount of money spent on advertisement, and we observed that this regressor explained 80% of the total variability in the response variable, the sales amount. The remaining 20% of the variability stayed unexplained; that unexplained part is the SS_residual. There could be one more regressor variable that explains part of that unexplained 20%, and one important candidate is the number of salespersons you employ. In most practical cases you will have more than one regressor variable, and then you need to move to multiple linear regression.

Let me now explain the multiple linear regression model. The situation here is that instead of one regressor we have more than one, say k - 1 regressor variables, and the general form of the model is

y_i = beta_0 + beta_1 x_{i1} + beta_2 x_{i2} + ... + beta_{k-1} x_{i,k-1} + epsilon_i,

written for the i-th observation, so i runs from 1 to n; here x_{i1} is the observation on the first regressor, x_{i2} on the second, and epsilon_i is the error part. Since we have more than one regressor variable we call it multiple regression, and since the model is linear it is called
multiple linear regression. One should be careful here: "linear" means the model is a linear function of the unknown parameters. The unknown parameters are beta_0, beta_1, ..., beta_{k-1}, so there are k of them, and the model is linear in these k parameters; a model is called a linear model only if it is linear in the unknown parameters. We also make the assumption that the errors epsilon_i follow the normal distribution with mean 0 and variance sigma^2, and that all the epsilon_i are independent.

Now we define some matrices. Let y = (y_1, y_2, ..., y_n)' be the vector of the n observations, beta = (beta_0, beta_1, ..., beta_{k-1})' the k x 1 vector of parameters, and epsilon = (epsilon_1, epsilon_2, ..., epsilon_n)' the vector of errors. We also define the n x k matrix

X =
[ 1  x_11  x_12  ...  x_1,k-1 ]
[ 1  x_21  x_22  ...  x_2,k-1 ]
[ ...                         ]
[ 1  x_n1  x_n2  ...  x_n,k-1 ],

where x_11 is the first observation on regressor 1, x_12 the first observation on regressor 2, and so on up to x_1,k-1, the first observation on regressor k - 1; the last row corresponds to the n-th observation. This is a matrix of known constants, because we have data of the form (y_i, x_{i1}, x_{i2}, ..., x_{i,k-1}) for i = 1, 2, ..., n, and using this set of observations we have to fit the multiple linear regression model. In matrix notation the model can be expressed as y = X beta + epsilon, where y is the vector of observations, beta the vector of parameters, and epsilon the vector of errors.
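As a small concrete sketch of the definitions above (all numbers are made up for illustration), here is how the observation vector and the n x k design matrix X can be assembled for k - 1 = 2 regressors:

```python
import numpy as np

# Made-up data with k - 1 = 2 regressors (e.g. advertising spend, salespersons).
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
y  = np.array([3.1, 4.2, 7.8, 8.1, 11.0])   # response (e.g. sales)

n = len(y)
# Design matrix X (n x k): a column of ones for beta_0, then one column per regressor.
X = np.column_stack([np.ones(n), x1, x2])
print(X.shape)   # (5, 3): n = 5 rows, k = 3 columns
```

The leading column of ones is what carries the intercept beta_0 through the matrix form y = X beta + epsilon.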
So this is the model we have to fit, and this is the model in matrix form. We are given data of this form, and using these data we have to fit the model; that means, basically, we have to estimate the parameters.

Now we talk about the estimation of the model parameters. In multiple linear regression there is almost no new concept; every concept we have already discussed in simple linear regression. As in simple linear regression, we estimate the parameters by the least squares method: the parameters are determined by minimizing SS_residual. The only difference is that instead of just beta_0 and beta_1 we now have k unknown parameters. So, what is SS_residual? It is

SS_res = sum over i of e_i^2 = sum over i of (y_i - y_i_hat)^2, i = 1, ..., n.

Suppose the fitted model is y_hat = beta_0_hat + beta_1_hat x_1 + ... + beta_{k-1}_hat x_{k-1}. Then

SS_res = sum over i of (y_i - beta_0_hat - beta_1_hat x_{i1} - beta_2_hat x_{i2} - ... - beta_{k-1}_hat x_{i,k-1})^2,

since y_i_hat is the i-th fitted value. The least squares method determines the parameters by minimizing this SS_residual. We will also represent SS_residual in matrix form. For that, define the residual vector e = (e_1, e_2, ..., e_n)', where e_i is the i-th residual, so that e = y - y_hat; here y is the vector of observations and y_hat the vector of fitted values. Now SS_residual equals the sum of
e_i^2, i = 1 to n, and we ask how to express this in matrix notation: it can be written as e'e. So

SS_res = (y - y_hat)'(y - y_hat) = (y - X beta_hat)'(y - X beta_hat)
       = y'y - y'X beta_hat - beta_hat'X'y + beta_hat'X'X beta_hat,

where I have replaced y_hat by X beta_hat, since in matrix notation y_hat = X beta_hat. You can check that y'X beta_hat is a 1 x 1 matrix, that is, a scalar, and beta_hat'X'y is also 1 x 1; being scalars that are transposes of each other, these two quantities are equal. So SS_residual can be written as

SS_res = y'y - 2 beta_hat'X'y + beta_hat'X'X beta_hat.

This is SS_residual in matrix form. If you do not follow the matrix version, the earlier expression for SS_residual is very similar to the one in simple linear regression, only with additional terms because of the additional regressor variables; the same thing is simply represented here in matrix form. So we have two different representations of SS_residual, and now we need to differentiate it with respect to the unknown parameters. There are k unknown parameters, so we differentiate SS_residual with respect to each of them, and that gives k normal equations. With k independent normal equations and k unknowns, you can solve for the estimators of the k unknown parameters. Here is the least squares procedure: what we have is
two forms of SS_residual (I am explaining both, in case you do not follow the matrix representation). In matrix form,

SS_res = y'y - 2 beta_hat'X'y + beta_hat'X'X beta_hat,

and, by the usual technique,

SS_res = sum over i of (y_i - beta_0_hat - beta_1_hat x_{i1} - ... - beta_{k-1}_hat x_{i,k-1})^2.

Now I differentiate to get the normal equations. First, differentiate SS_residual with respect to beta_0_hat and set it equal to 0. That gives

sum over i of (y_i - beta_0_hat - beta_1_hat x_{i1} - ... - beta_{k-1}_hat x_{i,k-1}) = 0,

which is the first normal equation. The term inside the sum is nothing but e_i = y_i - y_i_hat, so the first normal equation can also be written as: sum of e_i, i = 1 to n, equals 0. Similarly, differentiating SS_residual with respect to beta_1_hat gives the normal equation sum of e_i x_{i1} = 0, very much as in simple linear regression. You continue with beta_2_hat and go up to beta_{k-1}_hat; the final normal equation is sum of e_i x_{i,k-1} = 0. So you have k normal equations and k unknown parameters, all these normal equations are independent, and solving the k normal equations gives you the k unknown parameters beta_0, beta_1,
and so on up to beta_{k-1}. So this is the usual form, the one we used in the case of simple linear regression. Now I will do the matrix representation of the same thing: I differentiate the SS_residual expressed in matrix notation with respect to beta_hat and set it equal to 0. The first term, y'y, is independent of beta_hat, so differentiating it gives 0; differentiating the second term, -2 beta_hat'X'y, gives -2 X'y; and differentiating the third term, beta_hat'X'X beta_hat, gives 2 X'X beta_hat. (You can write the matrix form out in detail and differentiate to check this.) Equating to 0,

-2 X'y + 2 X'X beta_hat = 0.

This is in fact the same system as before: it consists of exactly the k normal equations. From here, finding beta_hat in the matrix representation is easy:

beta_hat = (X'X)^{-1} X'y.

Here is the least squares estimator of the unknown parameters beta_0, beta_1, ..., beta_{k-1}; if you solve the k normal equations directly, you will get the same thing. Next we turn to the statistical properties of this least squares estimator: I am going to prove that the estimator we have obtained, beta_hat = (X'X)^{-1} X'y, is an unbiased estimator of beta.
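As a minimal numerical sketch (the numbers are made up for illustration), we can compute beta_hat = (X'X)^{-1} X'y and check that the residuals satisfy all k normal equations at once, since X'e stacks them into one vector equation X'e = 0:

```python
import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
y  = np.array([3.1, 4.2, 7.8, 8.1, 11.0])
X  = np.column_stack([np.ones(5), x1, x2])

# Solve the normal equations X'X beta_hat = X'y (numerically preferable
# to forming the explicit inverse, but mathematically (X'X)^{-1} X'y).
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat

# X'e stacks the k normal equations: sum e_i = 0 and sum e_i x_ij = 0 for each j.
print(np.allclose(X.T @ e, 0.0))   # True
```

Using np.linalg.solve rather than np.linalg.inv is just the standard numerically stable choice; it computes the same least squares estimator.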
Unbiased means we have to prove that E(beta_hat) = beta. Now,

E(beta_hat) = E[(X'X)^{-1} X'y],

and in matrix notation y = X beta + epsilon, so this is

E[(X'X)^{-1} X'(X beta + epsilon)] = E[(X'X)^{-1} X'X beta] + E[(X'X)^{-1} X' epsilon].

In the first term, (X'X)^{-1} X'X is the identity, so that term is equal to beta. In the second term, epsilon is the only random quantity (X is not a random variable), and epsilon follows the normal distribution with mean 0 and variance sigma^2, so E(epsilon) = 0 and the second term is 0. So we have proved that E(beta_hat) = beta: the least squares estimator we obtained is an unbiased estimator of beta.

Next we derive the variance of this estimator:

Var(beta_hat) = Var[(X'X)^{-1} X'y].

We know that the variance-covariance matrix of the observation vector y is sigma^2 I, and (X'X)^{-1} X' is a constant matrix, involving no random variable, so

Var(beta_hat) = (X'X)^{-1} X' (sigma^2 I) X (X'X)^{-1} = sigma^2 (X'X)^{-1},

because X'X and (X'X)^{-1} cancel out. Next we will talk about a different representation of SS_residual in matrix notation. We
derived that

SS_res = y'y - 2 beta_hat'X'y + beta_hat'X'X beta_hat.

We know that beta_hat = (X'X)^{-1} X'y, so to simplify this expression, plug that into the third term:

beta_hat'X'X beta_hat = beta_hat'X'X (X'X)^{-1} X'y = beta_hat'X'y,

since X'X (X'X)^{-1} is the identity. So the third term equals beta_hat'X'y, and the simplified form is

SS_res = y'y - beta_hat'X'y.

This is the same thing as sum of e_i^2; here it is simply in matrix representation.

Now let me talk about the degrees of freedom of this SS_residual. SS_residual is the sum of the n squared residuals e_i^2, i = 1 to n, and the errors epsilon_i follow the normal distribution with mean 0 and variance sigma^2. But we have just derived that the residuals e_i satisfy k constraints: the k normal equations involve the e_i. So you do not have the freedom of choosing all n residuals independently; you can choose n - k of them freely, and the remaining k have to be chosen in such a way that they satisfy those k constraints. In the case of simple linear regression we had two constraints on the e_i, which is why you had the freedom of choosing n - 2 residuals independently, and the remaining two were chosen so that they satisfy those two
constraints. Here, instead of two constraints on the e_i, we have k constraints, namely the k normal equations written above. So you cannot choose all n residuals freely: you have the freedom of choosing n - k of the e_i independently, and the remaining k must be chosen so that they satisfy the k constraints. Basically, you are losing k degrees of freedom because of these k constraints on the residuals, and that explains why SS_residual has n - k degrees of freedom.

From the distributional assumption on the errors, each epsilon_i^2 / sigma^2 follows a chi-square distribution with 1 degree of freedom, and from the constraint argument it follows that

SS_res / sigma^2 = (sum over i of e_i^2) / sigma^2 follows a chi-square distribution with n - k degrees of freedom

(not n, because of those k constraints). With this result we can also define the mean square residual, MS_res, obtained by dividing SS_residual by its degrees of freedom:

MS_res = SS_res / (n - k).

It is not difficult to prove that MS_residual is an unbiased estimator of sigma^2, that is, E(MS_res) = sigma^2. So we have an unbiased estimator for sigma^2 as well.
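The three properties above, E(beta_hat) = beta, Var(beta_hat) = sigma^2 (X'X)^{-1}, and E(MS_res) = sigma^2, can be checked with a small Monte Carlo sketch; the design matrix and true parameter values below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
X  = np.column_stack([np.ones(5), x1, x2])
n, k = X.shape
beta_true, sigma = np.array([1.0, 2.0, 0.5]), 1.0

betas, ms_res = [], []
for _ in range(20000):
    # Simulate y = X beta + eps with eps ~ N(0, sigma^2 I), then refit.
    y = X @ beta_true + rng.normal(0.0, sigma, size=n)
    b = np.linalg.solve(X.T @ X, X.T @ y)   # beta_hat = (X'X)^{-1} X'y
    e = y - X @ b
    betas.append(b)
    ms_res.append(e @ e / (n - k))          # MS_res = SS_res / (n - k)
betas = np.array(betas)

print(betas.mean(axis=0))                       # close to beta_true: unbiasedness
print(betas[:, 0].var())                        # close to the theoretical value below
print(sigma**2 * np.linalg.inv(X.T @ X)[0, 0])  # Var(beta_0_hat) = sigma^2 (X'X)^{-1}[0,0]
print(np.mean(ms_res))                          # close to sigma^2: E(MS_res) = sigma^2
```

The averages over the 20000 simulated fits approximate the expectations in the theory; they match the formulas up to Monte Carlo noise.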
Before moving to the statistical significance of the regression model, I want to give another representation of SS_residual. SS_residual can be represented in several ways: simply as sum of e_i^2, i = 1 to n; in the matrix representation above; and now in terms of the hat matrix. Right now I have no use for this expression, but in the future we may be using it. So, here is another way to express SS_residual, using the hat matrix. We know that in matrix notation e = y - y_hat, where y is the observation vector, and y_hat is nothing but X beta_hat. Replacing beta_hat by its expression (X'X)^{-1} X'y,

e = y - X (X'X)^{-1} X'y = (I - H) y,

where the n x n matrix

H = X (X'X)^{-1} X'

is called the hat matrix. It is called the hat matrix because Hy = y_hat: this matrix transforms y into y_hat. You can also prove that H^2 = H; that is the specialty of this matrix. Now SS_residual can be written as

SS_res = e'e = y'(I - H)'(I - H) y,

and you can check that (I - H)'(I - H) = I - H, so this can be written as

SS_res = y'(I - H) y.

So this is another way to express SS_residual; as I said, I am not going to use this hat-matrix expression at the moment, I will use it later.

Next I move towards the ANOVA approach to testing the statistical significance of the regression model. To prepare for that, I will first talk about SS_total and then SS_regression. What is SS_total here? SS_total is nothing but the variation in the observations, that is, the variation in the data:

SS_T = sum over i of (y_i - y_bar)^2, i = 1 to n,

where we have n observations of the form (y_i, x_{i1}, ..., x_{i,k-1}). So SS_T is nothing but the variation in the response variable. It can be written as

SS_T = sum of y_i^2 - n y_bar^2,

which is not difficult to check. What is the degree of freedom of SS_total? It is a sum of n terms, but of course the deviations satisfy the constraint sum of (y_i - y_bar) = 0, so you do not have the freedom to choose all the terms y_1 - y_bar, y_2 - y_bar, ..., y_n - y_bar: you can choose n - 1 of them freely, and the n-th has to be chosen in such a way that it satisfies this constraint. So SS_T has n - 1 degrees of freedom. Now, what is SS_regression? SS_regression = SS_total - SS_residual.
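A quick numerical sketch of the hat-matrix facts above (H is idempotent, H transforms y into y_hat, and SS_res = y'(I - H)y), with made-up numbers:

```python
import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
y  = np.array([3.1, 4.2, 7.8, 8.1, 11.0])
X  = np.column_stack([np.ones(5), x1, x2])

H = X @ np.linalg.inv(X.T @ X) @ X.T        # hat matrix H = X (X'X)^{-1} X'
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat                        # residual vector e = (I - H) y

print(np.allclose(H @ H, H))                       # True: H is idempotent
print(np.allclose(H @ y, X @ beta_hat))            # True: H y = y_hat
print(np.isclose(e @ e, y @ (np.eye(5) - H) @ y))  # True: SS_res = y'(I - H)y
```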
Well, SS_total = sum of y_i^2, i = 1 to n, minus n y_bar^2, and SS_residual, if you recall, is y'y - beta_hat'X'y in matrix notation. So

SS_R = SS_T - SS_res = (y'y - n y_bar^2) - (y'y - beta_hat'X'y),

where y_bar is the mean of the observations. The two y'y terms cancel out, and you are left with

SS_R = beta_hat'X'y - n y_bar^2.

So we have expressions for SS_total, SS_residual, and SS_regression, and we are left only with the degrees of freedom for SS_regression. We know that SS_total = SS_regression + SS_residual. Let me say again what this means: SS_total is the total variability in the response variable, and that variability is partitioned into two parts. The part of the variability in the response variable that is explained by the model is SS_regression, and the part that is not explained by the regression model is SS_residual. We want the model to maximize SS_regression and thereby minimize SS_residual. SS_total has n - 1 degrees of freedom and SS_residual has n - k degrees of freedom, so the degrees of freedom for SS_regression is (n - 1) - (n - k) = k - 1. So SS_regression has k - 1 degrees of freedom. In the next class I will be talking about the statistical significance of the regression model in the case of multiple linear regression. Thank you very much.
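The decomposition SS_T = SS_R + SS_res and the matrix expressions derived above can be verified numerically; the data here are made up for illustration:

```python
import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
y  = np.array([3.1, 4.2, 7.8, 8.1, 11.0])
X  = np.column_stack([np.ones(5), x1, x2])
n, k = X.shape

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat

ss_total = np.sum((y - y.mean())**2)               # sum (y_i - y_bar)^2
ss_res   = y @ y - beta_hat @ (X.T @ y)            # y'y - beta_hat' X'y
ss_reg   = beta_hat @ (X.T @ y) - n * y.mean()**2  # beta_hat' X'y - n y_bar^2

print(np.isclose(ss_res, e @ e))              # True: matrix form matches sum e_i^2
print(np.isclose(ss_total, ss_reg + ss_res))  # True: SS_T = SS_R + SS_res
# The degrees of freedom partition the same way: (n - 1) = (k - 1) + (n - k).
```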