This is my first lecture on the polynomial regression model, and here is the content of this topic. We will be talking about polynomial models in one variable, orthogonal polynomials, piecewise polynomial fitting, and also polynomial models in two or more variables. Polynomial models are used in regression analysis when the response is nonlinear in the regressor. That means, given a set of data $(x_i, y_i)$ for $i = 1, \dots, n$, first you prepare the scatter plot, and when the scatter plot indicates that the relationship between the response variable and the regressor variable is nonlinear, then we need to go for a polynomial model. So $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \epsilon$ is called a second-order model in one variable. In general, the $k$th-order polynomial model in one variable is $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_k x^k + \epsilon$. Now, if you set $x_j = x^j$, then this can be rewritten as $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \epsilon$, which is nothing but a multiple linear regression model involving $k$ regressors $x_1, x_2, \dots, x_k$. So fitting a $k$th-order polynomial is the same as fitting a multiple linear regression model with $k$ regressors, but there are several important considerations while fitting such a model.

The first one is: what should be the order of the polynomial? Since we are talking about fitting a $k$th-order polynomial, we need to decide about $k$. The suggestion here is to keep the order of the polynomial as low as possible. When the scatter plot indicates a nonlinear relationship between the response and the regressor variable, first try some transformation to make the model linear; if that fails, then go for a second-order polynomial. We do not recommend fitting a polynomial of very high degree; usually the order of the polynomial is kept at $k \le 2$.

The second consideration is the model-building strategy. Regarding the order of the polynomial, the usual approach is called forward selection. What forward selection suggests is this: you start with the linear model $y = \beta_0 + \beta_1 x + \epsilon$, and then you go for the second-order polynomial $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \epsilon$. After fitting the second-order model, you test the significance of the highest-order term, that is, $\beta_2$. If $\beta_2$ is significant, you go for a third-order model $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \epsilon$; but if you see that $\beta_2$ is not significant, you stop there. This is what the algorithm says in general.
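To make the forward-selection idea concrete, here is a minimal Python sketch (not part of the lecture; it assumes numpy and scipy are available, and the level 0.05 and the test data are illustrative choices): fit polynomials of increasing order and stop as soon as the $t$-test for the highest-order coefficient is non-significant.

```python
import numpy as np
from scipy import stats

def forward_select_order(x, y, max_order=4, alpha=0.05):
    """Forward selection for the polynomial order (illustrative sketch)."""
    order = 0
    for m in range(1, max_order + 1):
        X = np.vander(x, m + 1, increasing=True)      # columns 1, x, ..., x^m
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares fit
        resid = y - X @ beta
        df = len(y) - (m + 1)
        sigma2 = resid @ resid / df                   # residual variance
        se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[-1, -1])
        p_val = 2 * stats.t.sf(abs(beta[-1] / se), df)
        if p_val > alpha:       # highest-order term non-significant: stop
            break
        order = m               # keep this order and try one more
    return order

# Hypothetical data from a quadratic; this should typically select order 2.
rng = np.random.default_rng(0)
x = np.linspace(0, 3, 30)
y = 1 + 2 * x + 1.5 * x**2 + rng.normal(0, 0.5, 30)
print(forward_select_order(x, y))
```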
So ultimately, you successively fit models of increasing order until the $t$-test for the highest-order term is non-significant. This is the model-building strategy. The next consideration is called ill-conditioning. As the order of the polynomial increases, the $X'X$ matrix becomes ill-conditioned. What is the meaning of this? It becomes nearly singular, which means the computation of $(X'X)^{-1}$ becomes inaccurate. We need to compute this inverse because the least-squares estimate of the regression coefficients is $\hat{\beta} = (X'X)^{-1} X'y$; but as the order of the polynomial increases, $X'X$ becomes nearly singular, so the computation of the inverse becomes inaccurate.

A specific case: if the values of $x$ are limited to a narrow range, there can be significant ill-conditioning in the columns of $X$. Let me give an example. You will recall we are talking about the polynomial $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_k x^k + \epsilon$. So in the coefficient matrix $X$, the first column is all ones, the second column corresponds to the $x$ values, the third column corresponds to the $x^2$ values, and so on. Now suppose the $x$ values are limited to a narrow range, say $0.11, 0.12, 0.13$ (the first column is all ones anyway). Then the $x^2$ column is $0.0121, 0.0144, 0.0169$, which is roughly $0.12$ times the $x$ column. So there is a near linear dependency between these two columns, which means the matrix becomes nearly singular. That is why, if the $x$ values come from a narrow range, there can be a significant ill-conditioning problem in the columns of $X$. One way to remove this ill-conditioning is centering the data: instead of $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \epsilon$, you fit the model $y = \beta_0 + \beta_1 (x - \bar{x}) + \beta_2 (x - \bar{x})^2 + \epsilon$ on the centred data.

Next we will talk about orthogonal polynomials. Suppose we wish to fit the model $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_k x^k + \epsilon$, and you have observed that the coefficient matrix $X$ has columns $1, x, x^2, \dots, x^k$. Now, if we wish to add another term, $\beta_{k+1} x^{k+1}$, then we must recompute $(X'X)^{-1}$, because once you add this term to the polynomial you have to add one more column, $x^{k+1}$, to $X$. And not only that: the estimates of the lower-order parameters $\hat{\beta}_0, \hat{\beta}_1, \dots, \hat{\beta}_k$ will also change once you add one higher-order term to the polynomial model. So suppose you start with the second-order model and compute $\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2$.
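Here is a quick numerical illustration of this ill-conditioning (a sketch assuming numpy; the narrow-range $x$ values from above are extended to five points): the condition number of $X'X$ collapses once the data are centred.

```python
import numpy as np

x = np.array([0.11, 0.12, 0.13, 0.14, 0.15])        # narrow-range x values
X = np.column_stack([np.ones_like(x), x, x**2])     # columns 1, x, x^2
Xc = np.column_stack([np.ones_like(x),              # centred columns
                      x - x.mean(), (x - x.mean())**2])

print(np.linalg.cond(X.T @ X))     # enormous: x^2 column ~ 0.13 * x column
print(np.linalg.cond(Xc.T @ Xc))   # orders of magnitude smaller
```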
Now, if you want to add, say, a third-order term $\beta_3 x^3$ to the model, then again you have to recompute $(X'X)^{-1}$, and the lower-order parameter estimates change as well. How do we avoid this problem? One way is to use orthogonal polynomials. Suppose we construct polynomials $p_0(x)$ of order 0, $p_1(x)$ of order 1, ..., $p_k(x)$ of order $k$, with the property that they are orthogonal, that is,

$\sum_{i=1}^{n} p_r(x_i)\, p_s(x_i) = 0$ for $r \ne s$, where $r, s = 0, 1, \dots, k$.

If you can find polynomials like this, they are called orthogonal polynomials. Then we can rewrite the model as

$y_i = \alpha_0 p_0(x_i) + \alpha_1 p_1(x_i) + \cdots + \alpha_k p_k(x_i) + \epsilon_i$, for $i = 1, \dots, n$,

where $p_r(x_i)$ is an $r$th-order orthogonal polynomial. So we are replacing $x$ by $p_1(x)$ and $x^k$ by $p_k(x)$: instead of fitting the model $\beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_k x^k + \epsilon$, we are fitting $\alpha_0 p_0(x) + \alpha_1 p_1(x) + \cdots + \alpha_k p_k(x) + \epsilon$, and these are equivalent problems.

Before we learn how to estimate these regression coefficients, let me give an example of orthogonal polynomials, to get a better idea about them. Here the condition is that the $x$ values are equally spaced. The orthogonal polynomials are:

$p_0(x_i) = 1$

$p_1(x_i) = \lambda_1 \left[ \frac{x_i - \bar{x}}{d} \right]$

$p_2(x_i) = \lambda_2 \left[ \left( \frac{x_i - \bar{x}}{d} \right)^2 - \frac{n^2 - 1}{12} \right]$

$p_3(x_i) = \lambda_3 \left[ \left( \frac{x_i - \bar{x}}{d} \right)^3 - \left( \frac{x_i - \bar{x}}{d} \right) \frac{3n^2 - 7}{20} \right]$

$p_4(x_i) = \lambda_4 \left[ \left( \frac{x_i - \bar{x}}{d} \right)^4 - \left( \frac{x_i - \bar{x}}{d} \right)^2 \frac{3n^2 - 13}{14} + \frac{3(n^2 - 1)(n^2 - 9)}{560} \right]$

You do not need to remember all of these; given a problem, you will be given the orthogonal polynomials. Let me define the terms here: $d$ is the spacing between the levels of $x$, and the $\lambda_j$ are chosen so that the polynomials take integer values.
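As a quick numerical check of these formulas (a sketch assuming numpy; the choice $n = 8$ and the values $\lambda_1 = 2$, $\lambda_2 = 1$, $\lambda_3 = 2/3$, $\lambda_4 = 7/12$ are the standard integer-valued choices tabulated for that $n$, not something derived in the lecture), we can build $p_1$ through $p_4$ on an equally spaced grid and confirm that their pairwise products sum to zero:

```python
import numpy as np

n, d = 8, 1
x = np.arange(1, n + 1)                    # equally spaced levels
z = (x - x.mean()) / d                     # z_i = (x_i - xbar)/d

p1 = 2 * z                                             # lambda_1 = 2
p2 = z**2 - (n**2 - 1) / 12                            # lambda_2 = 1
p3 = (2 / 3) * (z**3 - z * (3 * n**2 - 7) / 20)        # lambda_3 = 2/3
p4 = (7 / 12) * (z**4 - z**2 * (3 * n**2 - 13) / 14
                 + 3 * (n**2 - 1) * (n**2 - 9) / 560)  # lambda_4 = 7/12

P = np.column_stack([np.ones(n), p1, p2, p3, p4])
print(np.round(P).astype(int))     # integer-valued columns
print(np.round(P.T @ P, 8))        # diagonal matrix: columns are orthogonal
```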
Let me illustrate what I mean by $d$ and $\lambda_1$ for $p_1$. Suppose you are given data with $n = 8$ observations and you want to find the orthogonal polynomials. It does not matter what the actual values of $x$ are, because the $x$ values are equally spaced; so you can take the $x$ values to be just $1, 2, 3, 4, 5, 6, 7, 8$. Here the spacing between the levels of $x$ is $d = 1$, and of course $\bar{x} = 4.5$ for this particular case; we can check that. Then what is $p_1(x_1)$? It is $\lambda_1 (1 - 4.5)/1 = -3.5\lambda_1$, and the $\lambda_j$ are chosen so that the polynomial has integer values. So to make it an integer you take $\lambda_1 = 2$, which gives $p_1(x_1) = 2 \times (-3.5) = -7$. Similarly, if you put $x = 2$ you get $-5$; if you put $3$ you get $-3$; if you put $4$ it is $-1$; and for $5, 6, 7, 8$ you get $1, 3, 5, 7$. So this is my $p_1(x)$. For different $n$ the values will be different, although they are of course the same orthogonal polynomials. You can compute $p_2(x)$, $p_3(x)$ and so on in the same way: you know $d$, and looking at the resulting values you can decide about $\lambda_2$.

My aim is not to say more about the orthogonal polynomials themselves. What I want to do is this: I started with the model $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_k x^k + \epsilon$, and there are some problems, some considerations, with this model. Instead of fitting it, I want to fit the model $y = \alpha_0 p_0(x) + \alpha_1 p_1(x) + \alpha_2 p_2(x) + \cdots + \alpha_k p_k(x) + \epsilon$; that is, I want to find the values of $\alpha_0, \alpha_1, \alpha_2, \dots, \alpha_k$. How do I do that? This is a multiple linear regression model. Let me write down the coefficient matrix $X$: the first column is all ones, because $p_0(x) = 1$ for all $x$; the second column is $p_1(x_1), p_1(x_2), \dots, p_1(x_n)$, the value of $p_1$ at each observation; similarly the third column is $p_2(x_1), p_2(x_2), \dots, p_2(x_n)$; and the last column is $p_k(x_1), p_k(x_2), \dots, p_k(x_n)$. So this is my $X$ matrix.

Now we will realize the advantage of these orthogonal polynomials. What is $X'X$? The first diagonal element is $n$, because the first column is $p_0(x_1), p_0(x_2), \dots, p_0(x_n)$, all ones. The product of any one column with any other column is 0, since the polynomials are orthogonal; so every off-diagonal element is 0. The second diagonal element is $\sum_{i=1}^{n} p_1(x_i)^2$, and so on, the last one being $\sum_{i=1}^{n} p_k(x_i)^2$. So it becomes a diagonal matrix:

$X'X = \mathrm{diag}\left( n,\ \sum_{i=1}^{n} p_1(x_i)^2,\ \dots,\ \sum_{i=1}^{n} p_k(x_i)^2 \right)$

I can write the model in matrix form as $y = X\alpha + \epsilon$, so the least-squares estimate is $\hat{\alpha} = (X'X)^{-1} X'y$. You know $X'X$, and of course you know $y = (y_1, y_2, \dots, y_n)'$, so you can compute $\hat{\alpha}$. Let me write down $\hat{\alpha}_0$: first compute $X'y$, whose first element is $\sum y_i$; multiplying by the first diagonal element of $(X'X)^{-1}$, which is $1/n$, gives $\hat{\alpha}_0 = \bar{y}$. Similarly for the other parameters: taking the $j$th column of $X$ and the $j$th diagonal element, you can check that

$\hat{\alpha}_j = \frac{\sum_{i=1}^{n} p_j(x_i)\, y_i}{\sum_{i=1}^{n} p_j(x_i)^2}, \quad j = 1, \dots, k.$

I am sure you understand this. So this is how you can estimate the regression coefficients. Now, here is the advantage; you should observe this.
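Here is a minimal sketch of this estimation (assuming numpy; the response values are hypothetical, generated just for illustration): build the columns $p_0, p_1, p_2$ for $n = 8$, confirm that $X'X$ is diagonal, and read off each $\hat{\alpha}_j$ from the ratio formula above.

```python
import numpy as np

n = 8
x = np.arange(1.0, n + 1)                 # equally spaced, d = 1
rng = np.random.default_rng(1)
y = 1 + 0.4 * x + 0.05 * x**2 + rng.normal(0, 0.2, n)   # hypothetical y

z = x - x.mean()
p0, p1, p2 = np.ones(n), 2 * z, z**2 - (n**2 - 1) / 12
X = np.column_stack([p0, p1, p2])

print(np.round(X.T @ X))          # diag(n, sum p1^2, sum p2^2): diagonal

a0 = y.mean()                     # alpha_0_hat = ybar
a1 = (p1 @ y) / (p1 @ p1)         # alpha_j_hat = sum(p_j y) / sum(p_j^2)
a2 = (p2 @ y) / (p2 @ p2)
print(a0, a1, a2)
```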
Now, if you add one more term, say $\alpha_{k+1} p_{k+1}(x)$, these quantities do not change: you do not need to recompute $X'X$, and the values of the lower-order parameter estimates do not change either. This is the advantage of using orthogonal polynomials.

Now let me write down the ANOVA table for this fit. What is the residual sum of squares? $SS_{Res} = \sum_{i=1}^{n} e_i^2$, where $e_i$ is the $i$th residual; this is nothing but $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$, the observed value minus the fitted value, squared. We can write this in matrix form as $(y - \hat{y})'(y - \hat{y})$, which is the same as $y'y - \hat{\alpha}' X' y$; you can check why from the second topic, on multiple linear regression, where we talked about this before. Now $y'y = \sum y_i^2$, and if you compute $y'X$ first, its $j$th element is $\sum_i y_i p_j(x_i)$ (it is a row vector now); multiplying by the vector $\hat{\alpha}$ gives $\sum_{j=0}^{k} \hat{\alpha}_j \sum_{i=1}^{n} y_i p_j(x_i)$. It is not difficult to check this; just write down the matrices. So

$SS_{Res} = \sum_{i=1}^{n} y_i^2 - \sum_{j=0}^{k} \hat{\alpha}_j \sum_{i=1}^{n} y_i p_j(x_i).$

Now I will separate out the $j = 0$ term: for $j = 0$, $p_0 = 1$, so that term is $\hat{\alpha}_0 \sum y_i$, and since $\hat{\alpha}_0 = \bar{y}$ it equals $n\bar{y}^2$. Keeping the other terms, from $j = 1$ to $k$:

$SS_{Res} = \sum_{i=1}^{n} y_i^2 - n\bar{y}^2 - \sum_{j=1}^{k} \hat{\alpha}_j \sum_{i=1}^{n} y_i p_j(x_i).$

You know that $\sum y_i^2 - n\bar{y}^2$ is nothing but $SS_T$, the total sum of squares. So $SS_{Res} = SS_T$ minus something, and that something is nothing but the regression sum of squares:

$SS_{Reg} = \sum_{j=1}^{k} \hat{\alpha}_j \sum_{i=1}^{n} y_i p_j(x_i).$

Now, what is the regression sum of squares due to the $j$th term, $\alpha_j p_j(x)$? The notation for that is $SS_{Reg}(\alpha_j)$, and it is nothing but the $j$th summand here:

$SS_{Reg}(\alpha_j) = \hat{\alpha}_j \sum_{i=1}^{n} y_i p_j(x_i).$

Similarly, for $SS_{Reg}(\alpha_1)$ you just replace $j$ by 1; so you get the regression sum of squares due to every regression coefficient separately. And here is something very important, and also useful: the sums of squares for the coefficients $\alpha_1, \alpha_2, \dots, \alpha_k$ are orthogonal, and their values do not depend on the order of the polynomial. So if you have a degree-2 polynomial, you have $SS_{Reg}(\alpha_1)$ and $SS_{Reg}(\alpha_2)$; and if you now make it a degree-5 polynomial, you will again have the regression sum of squares due to each coefficient $\alpha_1, \alpha_2, \alpha_3, \alpha_4, \alpha_5$.
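Continuing in the same spirit, here is a small self-contained sketch of the sum-of-squares decomposition (numpy; the same hypothetical data as before): compute $SS_T$, the per-coefficient $SS_{Reg}(\alpha_j)$, and recover $SS_{Res}$ by subtraction.

```python
import numpy as np

n = 8
x = np.arange(1.0, n + 1)
rng = np.random.default_rng(1)
y = 1 + 0.4 * x + 0.05 * x**2 + rng.normal(0, 0.2, n)   # hypothetical data

z = x - x.mean()
p1, p2 = 2 * z, z**2 - (n**2 - 1) / 12

a1 = (p1 @ y) / (p1 @ p1)
a2 = (p2 @ y) / (p2 @ p2)

SS_T = ((y - y.mean())**2).sum()      # total sum of squares
SS_a1 = a1 * (p1 @ y)                 # SS_Reg(alpha_1)
SS_a2 = a2 * (p2 @ y)                 # SS_Reg(alpha_2)
SS_res = SS_T - SS_a1 - SS_a2         # residual sum of squares
print(SS_T, SS_a1, SS_a2, SS_res)
```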
But $SS_{Reg}(\alpha_1)$ and $SS_{Reg}(\alpha_2)$ do not change; they remain the same even if you go for the higher-order model. This is the beauty of orthogonal polynomials. Now let me write down the ANOVA table. The columns are source, degrees of freedom, sum of squares, MS, and finally F. The sources are the regression due to $\alpha_1$, the regression due to $\alpha_2$, and similarly up to the regression due to the $k$th term; then the residual, the part not explained by these regression terms; and the total variation in the response variable.

Source                 df           SS                 MS                    F
Regression (alpha_1)   1            SS_Reg(alpha_1)    SS_Reg(alpha_1)/1
Regression (alpha_2)   1            SS_Reg(alpha_2)    SS_Reg(alpha_2)/1
...                    ...          ...                ...
Regression (alpha_k)   1            SS_Reg(alpha_k)    SS_Reg(alpha_k)/1     MS_Reg(alpha_k)/MS_Res
Residual               n - k - 1    SS_Res             SS_Res/(n - k - 1)
Total                  n - 1        SS_T

We know that the total, $SS_T = \sum (y_i - \bar{y})^2$, is the variation in the response variable, and it has $n - 1$ degrees of freedom because the deviations satisfy the constraint $\sum (y_i - \bar{y}) = 0$. Now $\alpha_1$ has 1 degree of freedom, $\alpha_2$ has 1, and so on; I hope you understand this. So you have $k$ coefficients, and the residual degrees of freedom are $n - k - 1$. Another way to explain this degree of freedom: there are $n$ observations, hence $n$ residuals, but there are $k + 1$ constraints on the residuals, because there are $k + 1$ coefficients, $\alpha_0, \alpha_1, \dots, \alpha_k$; so the residual degrees of freedom are $n - (k + 1) = n - k - 1$. The sums of squares $SS_{Reg}(\alpha_1), SS_{Reg}(\alpha_2), \dots, SS_{Reg}(\alpha_k)$ you already know how to compute: just put $j = 1, \dots, k$ in the formula above. And $SS_{Res} = SS_T - SS_{Reg}$. Of course, the MS values for the coefficients are the same as the SS values, because each has 1 degree of freedom, so $MS_{Reg}(\alpha_1) = SS_{Reg}(\alpha_1)/1$, the same thing; only $MS_{Res} = SS_{Res}/(n - k - 1)$ is different.

Now, in the model-building strategy I told you that you start from a lower-order model, say the first-order model, and to decide whether you need the second-order model you test the significance of the highest-order term. Similarly here: suppose you want to test the significance of the highest-order term $\alpha_k$, that is, test the hypothesis $H_0: \alpha_k = 0$ against $H_1: \alpha_k \ne 0$. The test statistic is $F = MS_{Reg}(\alpha_k) / MS_{Res}$, and this $F$ follows $F_{1,\, n-k-1}$; hence the critical region is $F > F_{\alpha;\, 1,\, n-k-1}$. So if the observed $F$ is greater than this tabulated $F$, we reject the null hypothesis, which means the $k$th-order term is significant.
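Here is a sketch of this F-test for the highest-order term (assuming numpy and scipy; the same hypothetical data as above, with $n = 8$ and $k = 2$, and note that $SS_{Reg}(\alpha_j) = \hat{\alpha}_j \sum_i y_i p_j(x_i)$ simplifies to $(\sum_i p_j(x_i) y_i)^2 / \sum_i p_j(x_i)^2$):

```python
import numpy as np
from scipy import stats

n, k = 8, 2
x = np.arange(1.0, n + 1)
rng = np.random.default_rng(1)
y = 1 + 0.4 * x + 0.05 * x**2 + rng.normal(0, 0.2, n)   # hypothetical data

z = x - x.mean()
p1, p2 = 2 * z, z**2 - (n**2 - 1) / 12

SS_a1 = (p1 @ y)**2 / (p1 @ p1)       # SS_Reg(alpha_1), 1 df
SS_a2 = (p2 @ y)**2 / (p2 @ p2)       # SS_Reg(alpha_2), 1 df
SS_T = ((y - y.mean())**2).sum()      # n - 1 df
SS_res = SS_T - SS_a1 - SS_a2         # n - k - 1 df

MS_res = SS_res / (n - k - 1)
F = SS_a2 / MS_res                    # MS_Reg(alpha_2) / MS_Res
print(F, stats.f.sf(F, 1, n - k - 1))        # observed F and its p-value
print(F > stats.f.isf(0.05, 1, n - k - 1))   # reject H0 at level 0.05?
```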
So if the $k$th-order term is significant, you can consider the $k$th-degree polynomial, and then you check the $(k+1)$th-order term; if the $(k+1)$th-order term is not significant, you stop there, otherwise you go for the still higher-order polynomial. In the next class I will give an example to illustrate these orthogonal polynomials; we have to stop here today. Thank you.