Hi, this is my fifth tutorial class, and today we will be considering problems from non-linear estimation, the generalized linear model, dummy variables, and variance stabilizing transformations. Here is the first problem, from non-linear estimation: estimate the parameters alpha and beta in the non-linear model y equal to alpha plus (0.49 minus alpha) times e to the power of minus beta (x minus 8), plus epsilon. So this is the non-linear model we need to fit; that means we have to estimate the parameters alpha and beta from the given observations. You are given around 20 observations here, and this is a non-linear model because the function is non-linear in the parameters alpha and beta. Now we will solve this problem. The residual sum of squares, or the least square function, is S(alpha, beta), equal to the sum over u of (y u minus f(x u, alpha, beta)) squared, where y u is the u-th observation on the response variable y. Since f is the given non-linear function, I can write this as the sum over u of (y u minus alpha minus (0.49 minus alpha) e to the power of minus beta (x u minus 8)) squared. We have to estimate alpha and beta in such a way that this least square function is minimum; that is what the least square method is. Now, since the given function f, or the model, is non-linear, the normal equations are also going to be non-linear, and it is very difficult to solve a system of non-linear equations. So what we do, as we have learned in non-linear estimation, is approximate this non-linear function by a linear function using a Taylor series.
The Taylor series involves the derivatives of this function, and this step is called linearization. We linearize the function f(x u, alpha, beta) equal to alpha plus (0.49 minus alpha) e to the power of minus beta (x u minus 8). The partial derivative of this function with respect to alpha is 1 minus e to the power of minus beta (x u minus 8), and the partial derivative with respect to beta is minus (0.49 minus alpha) (x u minus 8) e to the power of minus beta (x u minus 8). Now, the Taylor series expansion of f(x u, alpha, beta) about the point (alpha naught, beta naught) is: f(x u, alpha naught, beta naught), plus the derivative with respect to alpha at the point (alpha naught, beta naught), which is 1 minus e to the power of minus beta naught (x u minus 8), times (alpha minus alpha naught), plus the derivative with respect to beta at the point (alpha naught, beta naught), which is minus (0.49 minus alpha naught) (x u minus 8) e to the power of minus beta naught (x u minus 8), times (beta minus beta naught). Now you can see that this is linear in alpha and beta: the first term is a constant, and the derivatives are also constants because we have plugged in the values alpha naught, beta naught. In notation, we will write this as f u naught plus z 1 u naught (alpha minus alpha naught) plus z 2 u naught (beta minus beta naught), where z 1 u naught and z 2 u naught are the two partial derivatives at the point (alpha naught, beta naught).
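Written compactly, the linearization just described is:

```latex
f(x_u,\alpha,\beta) \;\approx\; f(x_u,\alpha_0,\beta_0)
  \;+\; z_{1u}^{0}\,(\alpha-\alpha_0) \;+\; z_{2u}^{0}\,(\beta-\beta_0),
\qquad\text{where}\qquad
z_{1u}^{0} = 1 - e^{-\beta_0 (x_u-8)},
\qquad
z_{2u}^{0} = -(0.49-\alpha_0)\,(x_u-8)\,e^{-\beta_0 (x_u-8)}.
```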
So what we did is approximate this non-linear function by a function linear in alpha and beta, using the Taylor series expansion. Now we have to estimate the parameters alpha and beta for a linear function, and we know how to do that using the multiple linear regression technique; we are in a position to use the ordinary least square technique. Well, so y u can be written as f u naught plus z 1 u naught (alpha minus alpha naught) plus z 2 u naught (beta minus beta naught) plus epsilon u. The original model now becomes a linear model here, and this can be written of course as y u minus f u naught equal to z 1 u naught (alpha minus alpha naught) plus z 2 u naught (beta minus beta naught) plus epsilon u. Now you can see that this is a multiple linear regression model involving two parameters. I would like to write this in matrix form. I will use the notation y naught for the response vector, with entries y 1 minus f 1 naught down to y n minus f n naught, and my Z naught matrix has first row z 1 1 naught, z 2 1 naught for the first observation, and similarly last row z 1 n naught, z 2 n naught. This is sort of the coefficient matrix, and my parameter vector theta naught is (alpha minus alpha naught, beta minus beta naught). We want to estimate the parameters alpha and beta, and we approximated the function about (alpha naught, beta naught); we will see how to estimate alpha and beta, because that is what our aim is. And epsilon is of course (epsilon 1, epsilon 2, up to epsilon n). I am sure you understand the meaning of z 1 1 naught: it is the partial derivative of the non-linear function f with respect to alpha at the point (alpha naught, beta naught), for the first observation.
So z 1 1 naught is basically 1 minus e to the power of minus beta naught (x 1 minus 8), and z 2 1 naught is basically minus (0.49 minus alpha naught) (x 1 minus 8) e to the power of minus beta naught (x 1 minus 8); you can see that this is the derivative of the function f with respect to beta at the point (alpha naught, beta naught). Similarly, z 1 n naught is 1 minus e to the power of minus beta naught (x n minus 8), and z 2 n naught is minus (0.49 minus alpha naught) (x n minus 8) e to the power of minus beta naught (x n minus 8). So that is what the Z naught matrix is, and in matrix form the model can now be written as y naught equal to Z naught theta naught plus epsilon, and we know that then theta naught hat is equal to (Z naught prime Z naught) inverse Z naught prime y naught. Let me put some more notation also; in fact, there are too many notations in this non-linear estimation. Theta 1 naught is basically alpha minus alpha naught, and theta 2 naught is basically beta minus beta naught. Well, so we have found the least square estimate of theta naught. What we have observed is that we got theta naught hat equal to (theta 1 naught hat, theta 2 naught hat), that is, estimates of alpha minus alpha naught and beta minus beta naught. What we do is begin the iteration with initial guesses, say alpha naught equal to 0.30 and beta naught equal to 0.02, and then iteratively improve alpha and beta. So in iteration 0 we took alpha naught as 0.30 and beta naught as 0.02, and we approximated the function about this point (alpha naught, beta naught) using the Taylor expansion and made it linear.
So, once we have the linear approximation of the function using the Taylor series, we use the least square technique to estimate the parameters; that is how we got the least square estimate of theta naught. What we have at this moment is theta 1 naught hat, which estimates alpha minus alpha naught, and theta 2 naught hat, which estimates beta minus beta naught. What we do is treat the updated values as improvements: alpha 1 is alpha naught plus theta 1 naught hat, and beta 1 is beta naught plus theta 2 naught hat. So we started with alpha naught, beta naught, and the improved values alpha 1 and beta 1 in the first iteration are 0.8416 and 0.1007. Then we place alpha 1 and beta 1 in the same role as alpha naught and beta naught and go through the same process. This will lead to another revised estimate, say alpha 2, beta 2, and so on. So at the j-th step we will have alpha j plus 1 equal to alpha j plus theta 1 j hat, and beta j plus 1 equal to beta j plus theta 2 j hat. We continue this process until the absolute difference between alpha j plus 1 and alpha j is less than delta, and the absolute difference between beta j plus 1 and beta j is less than delta, where delta is a pre-specified small number.
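The whole iteration can be sketched in a few lines of code. This is a minimal sketch, not the lecture's own computation: the 20 observations from the problem are not reproduced in this transcript, so the data below are simulated from the model with assumed true values alpha = 0.39 and beta = 0.10.

```python
import numpy as np

# Hypothetical data: simulated from the model, since the problem's 20
# observations are not reproduced here (assumed alpha = 0.39, beta = 0.10).
rng = np.random.default_rng(0)
x = np.linspace(10.0, 40.0, 20)
y = 0.39 + (0.49 - 0.39) * np.exp(-0.10 * (x - 8)) + rng.normal(0, 0.002, 20)

def f(x, a, b):
    """The non-linear model f(x; alpha, beta)."""
    return a + (0.49 - a) * np.exp(-b * (x - 8))

a, b = 0.30, 0.02                        # initial guesses alpha_0, beta_0
for j in range(50):
    e = np.exp(-b * (x - 8))
    z1 = 1.0 - e                         # df/d(alpha) at the current point
    z2 = -(0.49 - a) * (x - 8) * e       # df/d(beta)  at the current point
    Z = np.column_stack([z1, z2])        # the Z_0 matrix
    r = y - f(x, a, b)                   # the response vector y_0
    theta = np.linalg.lstsq(Z, r, rcond=None)[0]  # least squares step
    a, b = a + theta[0], b + theta[1]    # improved estimates
    if np.max(np.abs(theta)) < 1e-8:     # stop when the changes are tiny
        break

print(round(a, 4), round(b, 4))
```

With the real data the lecture reports convergence to alpha = 0.3901, beta = 0.1016 in four iterations; with the simulated data the iteration settles near the assumed true values.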
So in iteration 0 we have alpha naught, beta naught, then alpha 1, beta 1, and in the next iteration you can check that the value of alpha 2 will be 0.3901 and beta 2 will be 0.1004. In the third iteration it will be 0.3901 and 0.1016, and in the fourth iteration you will see that alpha and beta are not changing: again 0.3901 and 0.1016. So at the fourth stage there is no difference between the third and fourth steps; alpha 4 minus alpha 3 and beta 4 minus beta 3 are in fact 0, certainly less than delta. So we can stop here. This was the first example, from non-linear estimation. Next we will consider a problem on dummy variables. Dummy variables are utilized to separate blocks of data. Here is an example; look at this data. We have two sets of data, set A and set B, and we do not know whether to fit one straight line to all the data together, or two straight lines, or what. The problem says: an experimenter has two sets of x, y data given below which cover the same x range. How do you resolve this dilemma? Describe and give model details and the things he needs to do. We have learned the use of dummy variables, so we will fit a general model involving a dummy variable, say Z, for this problem. If we attach a dummy variable Z to distinguish the two groups, we can look at all four possibilities: one common line, different intercepts, different slopes, or both different. The general model is y equal to beta naught plus beta 1 x plus Z (alpha naught plus alpha 1 x) plus epsilon, where Z is equal to 0 for set A and 1 for set B.
This can of course be written as y equal to beta naught plus beta 1 x plus alpha naught Z plus alpha 1 Z x plus epsilon. So this is the model we are going to fit, and you can see that it is a multiple linear regression model. What is the X matrix here? The X matrix has four columns: the first column is all 1's, then a column for x, a column for Z, and a column for Z x. The first set has four observations, and the x values are 8, 0, 12, 2; for the first set, set A, Z is equal to 0, so the Z column has 0, 0, 0, 0, and then Z x is of course all 0. For the second set, set B, you can check that the x values are 9, 7, 8, 6 and Z is equal to 1, so the Z column has 1, 1, 1, 1 and Z x is of course 9, 7, 8, 6. So that is what my X matrix is. In matrix notation the model can now be written as y equal to X beta plus epsilon, and I am sure you understand the difference between this X matrix and the regressor x. The beta vector here is (beta naught, beta 1, alpha naught, alpha 1). You know how to fit this model: beta hat is just (X prime X) inverse X prime y. Let me write down the fitted model now: y hat equal to 1.142 plus 0.506 x minus 0.048 Z minus 0.036 Z x. Now the question is whether a single straight line is sufficient. If there is not much difference in the response level, we can go for a single straight line fit, but to see that we have the general model.
Now we can test whether one single line is sufficient. For that we have to test the hypothesis H naught that alpha naught equal to alpha 1 equal to 0, because we have considered the general model. You know how to test this using the extra sum of squares technique. The F statistic is the SS regression for the full model minus the SS regression for the restricted model (the restricted model does not involve these two terms), divided by the difference in degrees of freedom, which is 2, and then divided by the MS residual of the full model. Before doing all this, I will construct the ANOVA table for the full model. There are eight observations, so the total degrees of freedom is 7; the regression has four parameters, so its degrees of freedom is 3, and the residual has degrees of freedom 4. Now, the restricted model has only two parameters, so its regression degrees of freedom is 1, and 3 minus 1 is equal to 2. You can check that F is equal to 0.1818 divided by 2, over the MS residual 0.3272 divided by 4, which is equal to 1.11. This F has degrees of freedom 2 and 4. Now check the tabulated value of F: F at level 0.05 with (2, 4) degrees of freedom is 6.94. My observed value F equal to 1.11 is smaller than this one. That means H naught is accepted, and we can go for a single straight line fit. So this is the problem I wanted to discuss from the dummy variable topic, and we have another problem involving dummy variables. It says that an experimenter has two sets of data of x, y type and wishes to fit a quadratic equation to each set. She also wishes to test whether the two quadratic fits might be identical in location and curvature, but have different intercept values. Explain how you would set this up for her.
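The calculation above can be checked numerically. The transcript gives the x values of the two sets but not the y values, so the y below are hypothetical placeholders; with the real y values this computation would reproduce F = 1.11. A sketch:

```python
import numpy as np

# x values from the problem (set A: 8, 0, 12, 2; set B: 9, 7, 8, 6);
# the y values are hypothetical placeholders, since they are not given here.
x = np.array([8.0, 0.0, 12.0, 2.0, 9.0, 7.0, 8.0, 6.0])
z = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)     # 0 = set A, 1 = set B
y = np.array([5.2, 1.1, 7.3, 2.0, 5.5, 4.6, 5.1, 4.0])  # hypothetical

def fit(X, y):
    """Least squares fit; returns SS regression (about the mean) and residuals."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    yhat = X @ beta
    return np.sum((yhat - y.mean()) ** 2), y - yhat

ones = np.ones_like(x)
X_full = np.column_stack([ones, x, z, z * x])  # y = b0 + b1 x + a0 z + a1 zx
X_red = np.column_stack([ones, x])             # single straight line

ss_full, res_full = fit(X_full, y)
ss_red, _ = fit(X_red, y)
ms_res = np.sum(res_full ** 2) / (len(y) - 4)  # residual df = 8 - 4 = 4
F = (ss_full - ss_red) / 2 / ms_res            # extra SS on 2 df
print(F)  # compare with the tabulated F_{0.05}(2, 4) = 6.94
```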
So what she has is two sets of data on x and y, and she wants to fit a quadratic equation to each. She should go for the general model y equal to beta naught plus beta 1 x plus beta 1 1 x square plus z (alpha naught plus alpha 1 x plus alpha 1 1 x square) plus epsilon, where of course z equals 0 for set A and 1 for set B. Now she wants to check whether she can go for two parallel quadratic fits. What do I mean by parallel quadratics? They have the same location and curvature; they differ only in the intercept. So what you have to test is whether alpha 1 equal to alpha 1 1 equal to 0; testing the hypothesis alpha 1 equal to alpha 1 1 equal to 0 is appropriate for testing two parallel quadratic fits. I am sure you understand how to test this hypothesis using the extra sum of squares technique. Now I will consider a problem from the topic called transformation and weighting to correct model inadequacy, where we talked about variance stabilizing transformations. Here is the problem: consider the simple linear regression model y i equal to beta naught plus beta 1 x i plus epsilon i, where the variance of epsilon i is proportional to x i square, that is, the variance of epsilon i is sigma square x i square. This means the assumption of constant variance is not satisfied. Usually the variance of epsilon i is sigma square and we go for the ordinary least square technique, but that is not true here: the variance changes with i and is proportional to x i square. Suppose we use the transformation y prime equal to y by x and x prime equal to 1 by x. Is this a variance stabilizing transformation? First solve this problem.
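To set this up concretely for her, one can build the design matrix with the dummy columns; under the null hypothesis of parallel quadratics, the last two columns are dropped. The x and z values below are hypothetical, since her data are not given.

```python
import numpy as np

# Hypothetical layout: five x values per set (her actual data are not given).
x = np.tile(np.array([1.0, 2.0, 3.0, 4.0, 5.0]), 2)
z = np.repeat([0.0, 1.0], 5)        # 0 = set A, 1 = set B

# Full model: y = b0 + b1 x + b11 x^2 + z (a0 + a1 x + a11 x^2) + eps
X_full = np.column_stack([np.ones_like(x), x, x**2, z, z * x, z * x**2])
# H0: a1 = a11 = 0 (two parallel quadratics) drops the last two columns;
# the extra-sum-of-squares F test then compares the two fits.
X_red = X_full[:, :4]

print(X_full.shape, X_red.shape)  # prints (10, 6) (10, 4)
```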
We start with the model y i equal to beta naught plus beta 1 x i plus epsilon i, and dividing through by x i, the model becomes y i by x i equal to beta naught by x i plus beta 1 plus epsilon i by x i. If I call y i by x i as y i prime, 1 by x i as x i prime, and epsilon i by x i as epsilon i prime, then y i prime is equal to beta 1 plus beta naught x i prime plus epsilon i prime. This is the transformed model, and you can check that in this transformed model the variance of epsilon i prime is equal to the variance of epsilon i divided by x i square; we know that the variance of epsilon i is sigma square x i square, so dividing by x i square gives sigma square. So the variance of the transformed error epsilon i prime is constant, and the answer to the first part is yes, it is a variance stabilizing transformation. What is the relation between the parameters in the original and the transformed model? Well, you can see from the transformed model that the slope in the original model becomes the intercept in the transformed model, and the intercept in the original model becomes the slope in the transformed model. The next part of the problem says: suppose we use the method of weighted least squares with w i equal to 1 by x i square. Is this equivalent to the transformation introduced in part 1? Please recall what weighted least squares is; let me solve this problem. Weighted least squares is about finding the least square estimates of the regression coefficients, but we minimize a weighted least square function S(beta naught, beta 1) to estimate the parameters.
In the usual case, for the original model, we minimize the residual sum of squares, that is, the sum over i from 1 to n of (y i minus beta naught minus beta 1 x i) square; we estimate beta naught and beta 1 in such a way that this is minimum, which is what the ordinary least square technique is. In weighted least squares we put a weight w i on the i-th observation, and in this part of the problem the weight is 1 by x i square. So the weighted least square function is the sum of (1 by x i square) times (y i minus beta naught minus beta 1 x i) square, and this can be written as the sum of (y i by x i minus beta naught by x i minus beta 1) whole square. Using the weighted least square technique we estimate the parameters beta naught and beta 1 such that this is minimum; that is what the weighted least square technique suggests. Now the question is whether this is equivalent to the transformation introduced in part 1. In part 1, for the transformed model, epsilon prime has constant variance, so we can go for the ordinary least square technique, and the function we minimize, call it S star (beta naught, beta 1), is the sum of (y i prime minus beta naught x i prime minus beta 1) square. We minimize this to estimate beta naught and beta 1. Now you can see that this is nothing but the sum of (y i by x i minus beta naught by x i minus beta 1) square. So here we have the least square function for ordinary least squares on the transformed model, and the weighted least square function for weighted least squares on the original model, and you can see that the two are the same.
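The equivalence can also be verified numerically. This is a sketch with simulated data, since the problem states no specific numbers: we fit the original model by weighted least squares with w i equal to 1 by x i square, and the transformed model by ordinary least squares, and the estimates agree.

```python
import numpy as np

# Simulated data with Var(eps_i) proportional to x_i^2 (assumed coefficients).
rng = np.random.default_rng(1)
x = np.linspace(1.0, 10.0, 30)
y = 2.0 + 0.5 * x + x * rng.normal(0, 0.1, x.size)

# Weighted least squares on the original model with w_i = 1 / x_i^2:
# minimize sum of w_i (y_i - b0 - b1 x_i)^2, i.e. beta = (X'WX)^{-1} X'Wy.
w = 1.0 / x**2
X = np.column_stack([np.ones_like(x), x])
XtW = X.T * w                         # rows of X' scaled by the weights
b_wls = np.linalg.solve(XtW @ X, XtW @ y)

# Ordinary least squares on the transformed model y' = beta1 + beta0 x':
yp, xp = y / x, 1.0 / x
Xp = np.column_stack([np.ones_like(xp), xp])  # intercept column is beta1 now
b1_t, b0_t = np.linalg.lstsq(Xp, yp, rcond=None)[0]

print(np.allclose(b_wls, [b0_t, b1_t]))  # True: the two estimates coincide
```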
So the function we are minimizing in the transformed model to estimate beta naught and beta 1 is the same as the function we are minimizing to estimate beta naught and beta 1 using the weighted least square technique. So the answer to part 3, whether this is equivalent, is yes, it is equivalent. This is one problem we considered from variance stabilizing transformations, and next I will consider one more problem, from the generalized linear model technique. The problem says: suppose we have n observations on the variables x 1, x 2, ..., x k, y, where the x's are regressor variables and y is the response variable. The question is: if the y i's are Poisson variables with mean mu i, what type of analysis is feasible? The objective of the generalized linear model topic was: if the response variable does not follow the normal distribution, but follows some distribution from the exponential family, then how do we fit a model? Here the response variable y follows the Poisson distribution, which is a member of the exponential family. So we will see how to solve this problem. If y follows Poisson with mean mu, then the probability mass function of y can be written as f(y; mu) equal to exponential of (y ln mu minus mu minus ln y factorial); you can check this. Here the natural parameter is ln mu. What we have learned in the topic called generalized linear model is how to fit a model when the response variable is not normal; the variation in y i is to be explained in terms of the regressor values. The model we fit is g(mu i) equal to x i prime beta.
Here x i prime beta is equal to beta 1 x i 1 plus beta 2 x i 2 plus and so on up to beta k x i k, and g(mu) is the link function, which is nothing but the natural parameter, that is, ln mu i. So the model we go for is ln mu i equal to x i prime beta, which can be written as mu i equal to e to the power of x i prime beta; and mu i is nothing but the expectation of y i, so the expectation of y i is e to the power of x i prime beta. This is the model we need to fit if the response variable follows the Poisson distribution. Usually, if y follows the normal distribution, we fit the model E(y i) equal to x i prime beta; but if it follows Poisson we fit this model, and if y follows, say, binomial, then depending on the natural parameter we get the model to be fitted. So I have tried to cover problems from the different topics I considered in this course, and that is all. We need to stop now. Thank you.
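To show the kind of analysis that is feasible, here is a sketch of fitting ln(mu i) = x i prime beta by iteratively reweighted least squares (Fisher scoring), the standard fitting method for generalized linear models. The data and coefficients are simulated assumptions, since the problem gives none.

```python
import numpy as np

# Simulate Poisson responses from an assumed log-linear model.
rng = np.random.default_rng(2)
n = 200
X = np.column_stack([np.ones(n), rng.uniform(0.0, 1.0, n)])
beta_true = np.array([0.5, 1.2])
y = rng.poisson(np.exp(X @ beta_true))

beta = np.zeros(2)
for _ in range(25):
    mu = np.exp(X @ beta)           # mu_i = exp(x_i' beta) = E(y_i)
    z = X @ beta + (y - mu) / mu    # working response for the log link
    XtW = X.T * mu                  # Poisson weights: w_i = mu_i
    beta_new = np.linalg.solve(XtW @ X, XtW @ z)
    if np.max(np.abs(beta_new - beta)) < 1e-10:
        beta = beta_new
        break
    beta = beta_new

print(beta)  # should be close to beta_true
```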