So, this is my first lecture on regression analysis. Let me introduce myself: I am Dr. Soumen Maithi. I did my B.Sc. and M.Sc. in Statistics and received my PhD degree from the Indian Statistical Institute, Kolkata. Currently, I am a faculty member at the Indian Institute of Technology, Kharagpur and the Indian Institute of Science Education and Research, Pune. I am grateful to both institutes for giving me this opportunity to work on the NPTEL project. So, here are the course prerequisites. I would expect the viewers, especially the students, to know the basics of probability, statistics and statistical inference. More precisely, I would like the viewer to know discrete and continuous probability distributions, point estimation, interval estimation and testing of hypotheses. This course is divided into several topics or modules. Here are the topics: simple linear regression, multiple linear regression, selecting the best regression model, multicollinearity, model adequacy checking, tests for influential observations, transformations and weighting to correct model inadequacies, dummy variables, polynomial regression, generalized linear models, non-linear estimation, regression models with autocorrelated errors, measurement errors and the calibration problem, and finally we will be solving some problems. So, I will have some tutorial classes, and I will basically be following these two books: the first one is Applied Regression Analysis by Draper and Smith, and the second one is Introduction to Linear Regression Analysis by Montgomery, Peck and Vining. Here is the content of today's lecture. Today I will introduce what regression analysis is, and then I will talk about simple linear regression and least squares estimation of the parameters, that is, the regression coefficients.
So, let me talk about what regression analysis is. Regression analysis is a statistical tool for investigating the relationship between a dependent variable and one or more independent variables. In a moment, I will give an example to explain what I mean by dependent and independent variables. Regression analysis is widely used for prediction and forecasting, and it has applications in different fields like economics, management, life and biological sciences, physical and chemical sciences, engineering and social sciences. So, here is the example I promised. Suppose you are a marketing analyst for Disney toys and you gather the following data, where the first column is advertising cost and the second one is sales amount. What we want to know is the relationship between the sales and the advertising cost. Here you can see that the amount of money spent on advertising is a sort of controlled variable: you can decide how much money you want to spend for advertising. But the sales amount is not a controlled variable; you cannot control the sales amount. So, the sales amount is a dependent variable. It depends on the advertising cost; advertising may be only one of the factors, but the sales amount certainly depends on the amount of money spent on advertisement. So, sales is a dependent variable, whereas the advertising cost is an independent variable, also called a controlled variable, since you can control it. So, I hope that you understood the difference between the independent and dependent variables.
So, usually the variable which is independent is denoted by X, and X is called the regressor variable or the independent variable, whereas the sales amount, on which we do not have any control, is denoted by Y, and Y is called the response variable or the dependent variable. As I told you, regression analysis is a statistical tool for investigating the relationship between one dependent variable and one or more independent variables. The whole objective of this course is to find the relationship between the variables: one response variable and several independent variables. Let me talk about the scatter plot next. Here are the same observations: I have been given a set of observations, say (X i, Y i), where X i stands for the regressor variable and Y i stands for the response variable, and I have five observations (X 1, Y 1), (X 2, Y 2), (X 3, Y 3), (X 4, Y 4), (X 5, Y 5). The scatter plot is basically obtained by plotting this data on the X-Y plane. Formally, you can say that a scatter plot is a mathematical diagram to display the values of two variables for a set of data. So, now I will explain the scatter plot for this Disney toy data. The first observation (X 1, Y 1) is plotted here, and the second observation (X 2, Y 2) is plotted here. I should say that usually in regression analysis the regressor variable is plotted along the X axis and the response variable is plotted along the Y axis. So, these two points correspond to these two data points, and then (3, 2) is here, next (4, 2) is here, and then (5, 4) is here. So, this is the scatter plot corresponding to the data for the Disney toy problem, and scatter plots are used to investigate the possible relationship between two variables. Now, suppose the scatter plot indicates some sort of linear relationship between the variables.
So, in that case we need to go for a linear model, but if the scatter plot indicates some sort of non-linear relationship between X and Y, then we need to go for, say, a quadratic fit, a cubic fit, or a higher-order polynomial fit. Looking at this scatter plot, I feel that it indicates a linear relationship between the response variable and the regressor variable. So, for this Disney toy data we would go for a linear model between X and Y, and the objective of this module is to study how to fit a linear relationship, more specifically a simple linear regression, given a response variable and one regressor variable. So, now we will talk about simple linear regression. A simple linear regression model is a model with a single regressor X that has a linear relationship with the response. The simple linear regression model is Y equal to beta naught plus beta 1 X plus epsilon; I will explain it. Here, Y is the response variable and X is the regressor variable; beta naught is called the intercept, beta 1 is called the slope, and epsilon is a random error. Before going into the details, I want to mention one more thing. Just recall the Disney toy example: there we have one variable, the advertising cost, and X stands for the advertising cost; the other one is the sales amount. I told you that X is a controlled variable: you can decide how much money you want to spend for advertising. So, X is not a random variable, whereas Y is a dependent variable; you cannot control the sales amount. Y depends on the regressor variable and it cannot be controlled. So, Y is a random variable, and X is not a random variable; it is a controlled variable, you can say a deterministic or mathematical variable. So, come back to this simple linear regression model, Y equal to beta naught plus beta 1 X plus epsilon.
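As an aside, this distinction (X fixed and deterministic, Y random through epsilon) can be made concrete with a small simulation. Here is a minimal Python sketch; the particular values of beta0, beta1 and sigma are purely hypothetical, chosen only for illustration:

```python
import numpy as np

# Hypothetical parameter values for the model y = beta0 + beta1 * x + epsilon.
rng = np.random.default_rng(0)
beta0, beta1, sigma = 2.0, 0.5, 1.0

# x is the controlled (non-random) regressor: we choose its values ourselves.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# epsilon is the random error: mean 0, standard deviation sigma.
eps = rng.normal(loc=0.0, scale=sigma, size=x.size)

# y is random only because of epsilon; rerunning with a new seed changes y
# but never x.
y = beta0 + beta1 * x + eps
print(y)
```

Rerunning the sketch with a different seed produces a different y for the same x, which is exactly the sense in which Y is a random variable and X is not.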
So, what is the meaning of this? For a given X, that means a given advertising cost, the corresponding observation Y, that means the corresponding sales amount, consists of the value beta naught plus beta 1 X plus an amount epsilon. So, it says that given the advertising cost, the corresponding sales amount consists of the value beta naught plus beta 1 X plus some error component; that is, a random component. Next, we make some basic assumptions on the simple linear model. The model is y i equal to beta naught plus beta 1 x i plus epsilon i, for i equal to 1 to n. Before, I wrote Y equal to beta naught plus beta 1 X plus epsilon; now I am writing the same model for the ith observation. And here, epsilon i is the random error component. The first assumption is that epsilon i is a random variable with 0 mean and variance sigma square, which is unknown. So, what you are given is just a set of observations (x i, y i) for i equal to 1 to n, that is all. And from the scatter plot, if you see that the relationship is linear, then you are going to fit a simple linear regression model, and you are making some assumptions on the model. So, epsilon i, the error term, is a random variable with 0 mean and variance sigma square, which is unknown. That means, expectation of epsilon i is equal to 0 and variance of epsilon i is equal to sigma square. The second assumption, and this is a very important part, is that epsilon i and epsilon j are uncorrelated for i not equal to j. That means, the covariance between epsilon i and epsilon j is equal to 0. The third one is that epsilon i is a normally distributed random variable with mean 0 and variance sigma square. That means, we assume that epsilon i follows a normal distribution with mean 0 and variance sigma square.
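Collected in symbols, the three assumptions just stated read:

```latex
\begin{aligned}
&\text{(1)}\quad E[\varepsilon_i] = 0,\qquad \operatorname{Var}(\varepsilon_i) = \sigma^2 \ \text{(unknown)},\\
&\text{(2)}\quad \operatorname{Cov}(\varepsilon_i, \varepsilon_j) = 0 \quad \text{for } i \neq j,\\
&\text{(3)}\quad \varepsilon_i \sim N(0, \sigma^2).
\end{aligned}
```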
Now, what you can see is that these epsilon i's are uncorrelated and they are normally distributed. So, under this normality assumption, the epsilon i's are not only uncorrelated, they are independent also. Now, what is the consequence of these assumptions in terms of the response variable y i? So, I said that y is a random variable and x is a controlled variable; it is a deterministic variable, not a random variable. And we made several assumptions on epsilon i. So, y i is equal to beta naught plus beta 1 x i plus epsilon i. From here, I can write expectation of y i is equal to expectation of beta naught plus beta 1 x i plus epsilon i, and this is equal to beta naught plus beta 1 x i plus expectation of epsilon i; since expectation of epsilon i is equal to 0, this is just beta naught plus beta 1 x i. And what is the variance of y i? Variance of y i is equal to variance of beta naught plus beta 1 x i plus epsilon i, which is equal to variance of epsilon i, because beta naught plus beta 1 x i is not random. So, this is equal to sigma square. Finally, we assumed that epsilon i follows a normal distribution with mean 0 and variance sigma square and that the epsilon i's are independent, and the consequence of this in terms of the response variable is: y i follows a normal distribution with mean beta naught plus beta 1 x i and variance sigma square, and the y i's are also independent.
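The derivation spoken above can be written compactly as:

```latex
\begin{aligned}
E[y_i] &= E[\beta_0 + \beta_1 x_i + \varepsilon_i]
        = \beta_0 + \beta_1 x_i + E[\varepsilon_i]
        = \beta_0 + \beta_1 x_i,\\
\operatorname{Var}(y_i) &= \operatorname{Var}(\beta_0 + \beta_1 x_i + \varepsilon_i)
        = \operatorname{Var}(\varepsilon_i)
        = \sigma^2,\\
y_i &\sim N(\beta_0 + \beta_1 x_i,\ \sigma^2),
        \quad \text{independently for } i = 1, \dots, n.
\end{aligned}
```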
So, the consequence of that in terms of the response variable is that y i follows a normal distribution with mean beta naught plus beta 1 x i and constant variance sigma square. So, we are assuming that the ith observation is from a normal distribution with mean beta naught plus beta 1 x i and constant variance sigma square. Given a set of data, you need to be very careful about whether your data set satisfies these basic assumptions or not. If the data set does not satisfy the basic assumptions, then you cannot go for the usual least squares fit and all these things. I will be talking about those things later in this course: there will be a topic called model adequacy checking, which covers how, while fitting a simple linear regression model to a given data set, to check whether the basic assumptions hold or not. So, we have to wait for that model adequacy checking topic. Let me once again say graphically how these assumptions are illustrated in this figure. We made the assumption that epsilon i follows a normal distribution with 0 mean and variance sigma square and that the epsilon i's are independent, and the consequence in terms of the response variable is that y i follows the normal distribution just described. So, in the figure, this is my (x 1, y 1) data point, this is (x 2, y 2), this is (x n, y n), and this line is y equal to beta naught plus beta 1 x. The assumption in terms of y is graphically illustrated here: it says that the ith observation, or the ith value of the response variable, y i, is coming from a normal distribution with mean beta naught plus beta 1 x i and variance sigma square. So, for the data point (x 1, y 1), y 1 is from a normal distribution with mean beta naught plus beta 1 x 1 and variance sigma square.
So, y 1 is from this normal distribution, and y 2 is again from a normal distribution with a different mean, beta naught plus beta 1 x 2, and the same constant variance sigma square. It is necessary that you understand this part, the basic assumptions we made: assuming this means you are assuming that the response variable follows a normal distribution, and the ith observation is coming from a normal distribution with mean beta naught plus beta 1 x i and constant variance sigma square. So, next we move to least squares estimation of the parameters. We know the simple linear regression model, y equal to beta naught plus beta 1 x plus epsilon. Least squares estimation of the parameters means estimating the regression coefficients beta naught and beta 1. Beta naught is called the intercept and beta 1 is called the slope, and fitting a simple linear regression model is nothing but estimating these regression coefficients. The parameters beta naught and beta 1 are unknown and must be estimated using the data. What you are given is just a set of n observations, and if the scatter plot indicates that there is a linear relationship, you can go for a simple linear regression fit; also, in regression analysis, the starting point is generally fitting a linear model. So, this is the scatter plot for the Disney toy data, and we have to estimate the regression coefficients; that means we have to fit a straight line. Suppose the fitted model is y hat equal to beta naught hat plus beta 1 hat into x. This is the fitted line, and you can see that I have drawn two lines on the same scatter plot: this is one straight line, suppose this is my fitted model for this data, and this is another fit.
Now, which one is better? I will come back to this slide again; let me first write one important thing: the line fitted by least squares is the one that makes the sum of squares of all vertical discrepancies as small as possible. This is the main idea behind the least squares fit. So, what is the meaning of that? What do I mean by a vertical discrepancy? This is the vertical discrepancy for the fourth observation. For the fourth observation, this is (x 4, y 4), which is equal to (4, 2). Suppose this is the fitted line, y hat equal to beta naught hat plus beta 1 hat x; then this point on the line is nothing but (x 4, y 4 hat). So, the vertical discrepancy, let me write it as e 4, is called the residual for the fourth observation: e 4 is equal to this distance, that is, y 4 minus y 4 hat. This is what we mean by the vertical discrepancy. What the least squares technique does is fit a line such that summation e i square, for i equal to 1 to n in general (here 1 to 5), is minimum. So, in order to say which fit is good, what you do is compute this summation e i square, which is called the residual sum of squares, SS residual. You compute SS residual for this fit, you compute SS residual for the other fit, and the one with the smaller value is the better fit. And what least squares estimation does is provide the fit which has the minimum SS residual.
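The "compare two candidate fits by their residual sum of squares" idea can be sketched in Python. Here the five (x, y) pairs are the points read off the Disney scatter plot above, and the first candidate line is an arbitrary hypothetical choice for comparison:

```python
import numpy as np

# The five data points read off the Disney toy scatter plot.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.0, 1.0, 2.0, 2.0, 4.0])

def ss_res(b0: float, b1: float) -> float:
    """Residual sum of squares, sum_i (y_i - b0 - b1 * x_i)**2, for the
    candidate line y_hat = b0 + b1 * x."""
    e = y - (b0 + b1 * x)   # vertical discrepancies (residuals)
    return float(np.sum(e ** 2))

# An arbitrary candidate line versus the least squares line for these points
# (the closed-form formulas derived later in the lecture give b0 = -0.1,
# b1 = 0.7 for this data).
print(ss_res(0.0, 0.6))     # larger residual sum of squares
print(ss_res(-0.1, 0.7))    # smallest possible residual sum of squares
```

Trying any other (b0, b1) pair in `ss_res` will give a value at least as large as the least squares line's, which is exactly the minimization property described above.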
So, I hope that you understood the basic, very natural idea behind least squares estimation. We estimate beta naught and beta 1 so that the sum of squares of all the differences between the observations y i and the fitted line is minimum. The meaning of this is that you compute all the residuals e 1, e 2, ..., e n, and then beta naught and beta 1 are estimated so that summation e i square, i equal to 1 to n, is minimum. Let me write this: you estimate beta naught and beta 1 so that S, which is nothing but the residual sum of squares, SS residual, equal to summation e i square, i equal to 1 to n, which is summation (y i minus y i hat) square, which is summation (y i minus beta naught hat minus beta 1 hat x i) square, is minimum. So, you have to find beta naught hat, the estimate of beta naught, and beta 1 hat, the estimate of beta 1, such that this is minimum. The least squares estimators of beta naught and beta 1, that is, beta naught hat and beta 1 hat, must satisfy the following two equations: you differentiate S with respect to beta naught and with respect to beta 1, and set the derivatives to 0 at the point (beta naught hat, beta 1 hat). Let me just write down what S is: S is equal to summation (y i minus beta naught minus beta 1 x i) square, and you find beta naught and beta 1 such that this is minimum. The partial derivative of S with respect to beta naught, evaluated at (beta naught hat, beta 1 hat), is minus 2 summation (y i minus beta naught hat minus beta 1 hat x i), and this is set equal to 0. So, this is one equation, and the other one comes from the partial derivative of S with respect to beta 1 at the point (beta naught hat, beta 1 hat).
So, now we are differentiating with respect to beta 1, and that gives minus 2 summation (y i minus beta naught hat minus beta 1 hat x i) into x i, set equal to 0. These two equations are called the normal equations. Since there are two unknown parameters, you get two normal equations, and you can see that these normal equations are independent, so you can uniquely solve for beta naught hat and beta 1 hat. So, the estimators beta naught hat and beta 1 hat are the solutions of the equations summation (y i minus beta naught hat minus beta 1 hat x i) equal to 0 and summation x i into (y i minus beta naught hat minus beta 1 hat x i) equal to 0. You have two independent normal equations, and from here you can obtain beta naught hat and beta 1 hat. Let me start with the first equation: summation (y i minus beta naught hat minus beta 1 hat x i) equal to 0. From here I can write summation y i minus n beta naught hat, because the sum is from 1 to n, minus beta 1 hat summation x i equal to 0. Then n beta naught hat is equal to summation y i minus beta 1 hat summation x i, and from here I can write that beta naught hat is equal to y bar minus beta 1 hat x bar, where, of course, x bar is equal to summation x i by n and y bar is equal to summation y i by n. This involves beta 1 hat, so we need to find beta 1 hat also. Let me start with the second normal equation, which was summation x i into (y i minus beta naught hat minus beta 1 hat x i) equal to 0; and just now we obtained that beta naught hat is equal to y bar minus beta 1 hat x bar, so I can plug this in here.
So, what I will get is summation x i into (y i minus y bar plus beta 1 hat x bar minus beta 1 hat x i) equal to 0. From here I can write summation (y i minus y bar) into x i equal to beta 1 hat summation (x i minus x bar) into x i. So, my beta 1 hat is equal to summation (y i minus y bar) into x i divided by summation (x i minus x bar) into x i. This can be written as summation (y i minus y bar) into (x i minus x bar) divided by summation (x i minus x bar) into (x i minus x bar), that is, summation (x i minus x bar) whole square in the denominator. What I have done is subtract a term involving x bar in both the numerator and the denominator, because I can prove that summation (y i minus y bar) into x bar is 0. Let me prove that: it equals summation y i into x bar minus summation x bar into y bar; writing y bar equal to 1 by n summation y i, we have summation y i equal to n into y bar, so the first term is n into x bar into y bar, and the second sum, being independent of i, is also n into x bar into y bar. So, the difference is 0, and similarly summation (x i minus x bar) into x bar is 0. Also, we will use the notation beta 1 hat equal to S x y by S x x. So, what we finally got is that beta naught hat is equal to y bar minus beta 1 hat x bar, and beta 1 hat is equal to summation (y i minus y bar) into (x i minus x bar) by summation (x i minus x bar) whole square.
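As a sketch, the closed-form estimators just derived can be evaluated in Python on the five scatter-plot points (assuming they are (1, 1), (2, 1), (3, 2), (4, 2), (5, 4) as read off the Disney plot), and cross-checked against NumPy's degree-1 polynomial least squares fit:

```python
import numpy as np

# The five (x, y) pairs assumed from the Disney toy scatter plot.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.0, 1.0, 2.0, 2.0, 4.0])

x_bar, y_bar = x.mean(), y.mean()
s_xy = np.sum((y - y_bar) * (x - x_bar))   # S_xy = sum (y_i - y_bar)(x_i - x_bar)
s_xx = np.sum((x - x_bar) ** 2)            # S_xx = sum (x_i - x_bar)^2

beta1_hat = s_xy / s_xx                    # slope:     S_xy / S_xx
beta0_hat = y_bar - beta1_hat * x_bar      # intercept: y_bar - beta1_hat * x_bar
print(beta0_hat, beta1_hat)                # intercept close to -0.1, slope 0.7

# Cross-check: np.polyfit solves the same least squares problem and returns
# coefficients from highest degree down, i.e. (slope, intercept) for degree 1.
slope, intercept = np.polyfit(x, y, 1)
print(intercept, slope)
```

So for this data the fitted line is y hat roughly equal to minus 0.1 plus 0.7 x, and the hand-derived formulas and the library fit agree.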
So, we have learnt how to fit a simple linear regression model: given a set of observations (x i, y i) for i equal to 1 to n, we know how to fit the model y i equal to beta naught plus beta 1 x i plus epsilon i, and here are the least squares estimators beta naught hat and beta 1 hat. In the next class we will be talking about several properties of these least squares estimators, and it can be proved, using the Gauss-Markov theorem, that these are the best linear unbiased estimators. Thank you.