We will continue our discussion of the basic tools of economic analysis, which we were also discussing in the previous class. In particular, we are looking at how regression analysis is used as a tool for economic analysis. We will start with a quick recap of what the regression technique is, and then continue with the different methods to find the value of the slope and the intercept, and with the test of goodness of fit.
So, what is the regression technique? It is a statistical technique used to quantify the relationship between interrelated economic variables. It is generally used in physical and social studies where the problem of specifying the relationship between two or more variables is involved. The first step in the technique is to estimate the coefficient of the independent variable, because regression is about the relationship between the dependent and the independent variable. The second step is to measure the reliability of this estimated coefficient.
For this we first formulate a hypothesis, and on the basis of the hypothesis we formulate a function. The hypothesis is formulated on the basis of the observed relationship between two or more facts or events of real life. After this we translate the hypothesis into a function.
So, suppose the hypothesis is that sales growth is a function of advertisement expenditure. This hypothesis can be translated into the mathematical function y = a + bx, where y is sales, x is the advertisement expenditure, and a and b are constants. Translating the hypothesis into a function means that the relationship between the dependent and the independent variable is specified and stated in the form of an equation. In this equation a is the intercept: it gives the quantity of sales without advertisement, that is, when x is equal to 0. And b is the coefficient of y in relation to x: it gives the measure of the increase in sales due to a certain increase in the advertisement expenditure.
Now the task of analysis comes in. What is the task of analysis? It is to find the values of the constants a and b, where a is the intercept and b is the slope, because on the basis of the values of a and b we come to know the relationship between the dependent variable and the independent variable. Generally two methods are followed to find the values of a and b: one is the rudimentary method, and the second is the mathematical method, typically known as the regression technique.
So we will start with the rudimentary method and see how the values of a and b are decided, and then we will talk about the regression technique. The functional form is y = a + bx. Suppose we take y = 40,000 + 1,800x. Here a is 40,000 and b is 1,800; a is the intercept and b is the slope. So, if y is sales and x is the advertisement expenditure, then when there is no advertisement expenditure the value of y will be equal to 40,000.
So total sales without advertisement expenditure will be 40,000. And since the value of b is 1,800, we can say that, if the measurement unit is in millions, an advertisement expenditure of 1 million will bring an increase of 1,800 million in sales, because the value of the slope is 1,800.
Now the question is: if these are the values of a and b, how reliable are they? How accurate are these values, or, we can say, how close is the line given by these values of a and b to the regression line? This cannot be answered through the rudimentary method, because it is a crude method and very elementary in nature. It indicates the visualization of the function, not the formulation of the actual function. It does not formulate the actual function; rather, it visualizes what the values should be. So this is an approximation, not the actual relationship between the two variables, and that is why the rudimentary method is not followed much to find the values of a and b.
Now the second method comes in to find the values of a and b, that is, the slope and the intercept, in order to understand the relationship between two variables. The two variables are economic variables, and we generally use the second method, the regression technique, to understand the relationship between them.
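As a quick check on the interpretation of the intercept and the slope, the rudimentary line y = 40,000 + 1,800x from the example can be evaluated directly. This is only a minimal sketch; the function name `predicted_sales` is a hypothetical label chosen for illustration, not something from the lecture.

```python
# Illustrative sketch of the rudimentary line y = a + b*x from the example.
# The function name is a hypothetical label chosen for this sketch.
def predicted_sales(ad_spend, a=40_000, b=1_800):
    """Return the sales implied by y = a + b*x (same units as the example)."""
    return a + b * ad_spend

# Intercept: sales when advertisement expenditure is zero.
sales_without_ads = predicted_sales(0)

# Slope: extra sales from one more unit of advertisement expenditure.
gain_per_unit = predicted_sales(1) - predicted_sales(0)

print(sales_without_ads, gain_per_unit)
```

The sketch only reads values off the assumed line; it says nothing about how reliable a = 40,000 and b = 1,800 are, which is exactly the limitation of the rudimentary method.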
So the second method is the regression method. There are generally four components of this method that we are going to discuss. First we estimate the error term. Then we estimate the parameters through the ordinary least squares method. Then, after getting the values of the estimated parameters, we test their significance, that is, we ask what the level of significance of these estimated parameters is. And then we do the test of goodness of fit, that is, we find the coefficient of determination, because that tells us the overall explanatory power of the model: how well the relationship between the dependent variable and the independent variable is captured by the regression technique. So typically these four stages are followed when we estimate a and b by the regression method.
So first we will see how the regression line is drawn in simple regression analysis, then how the error term arises, then how the significance of the parameters estimated by the ordinary least squares method can be tested, and finally we will do the test of goodness of fit. The first step in the regression technique is always to estimate the error term. Let us take an example. Since this is a relationship between advertisement expenditure and sales, we will take different values of advertisement expenditure and see how they affect sales.
So suppose the values are 40, 50, 60, 70, 80, 90, or we can call them 2, 4, 6, 8, 10, 12, 14. We can plot a regression line, and we will find some points above the line and some points below the line. So not all the actual data points are on the regression line; there is some deviation from the regression line. Since not all the actual data points fall on the regression line, there is an error term in the model. We call it the error term because there is a deviation between the regression line and the actual data points.
And since there is an error term, the slope b does not explain the total change in y with respect to the change in x. What is the slope b? It is basically $\Delta y / \Delta x$. So the difference between the regression line and the actual data points leads to the error term, and the presence of the error term implies that the slope b is not explaining the total change in y in response to the change in x. There is some unexplained part of y, and this unexplained part of y is the error term.
Now, what is the error term? The error term refers to the deviation of the plotted points from the straight line drawn through the center of the plotted points. This line through the center of the plotted points is what we call the regression line. So this is an actual data point, this is the line, and the deviation between the two is the error term.
And similarly, this is also an error term: whenever there is a deviation between an actual data point and the regression line, that is, the line through the center of the plotted points, that deviation gives us the error term.
Now, why does this error arise? It arises because of inaccurate recording or inaccurate specification of the data. Sometimes the advertisement expenditure and the sales data are recorded inaccurately, or the sales function is specified inaccurately, and that leads to the error. That is why we get a deviation between the actual point and the regression line.
There are two kinds of error: one is specification error, and the second is measurement error. What is specification error? It arises because other factors influencing sales could not be specified and included in the function. Here we are specifically looking at how advertisement expenditure influences sales, but there may be many other factors influencing sales which are not considered and could not be added to the function, and that is why specification error arises. The second is measurement error. This generally arises due to computational errors in measuring, sampling, coding and writing the data. So specification error arises when we miss some variables which influence sales, and measurement error is more mechanical: it arises from technical problems in measuring, sampling, coding or writing the data.
So we will now see how to understand this error term, how it arises, and how to minimize it using the least squares method.
So, we can say that $y_t$ is the observed value and $y_c$ is the estimated value. What will the error term be? The error term will be $e_t = y_t - y_c$, where $y_c = a + bx$ is meant to be the best-fit line. We will take the graph again to understand what a best fit is. Suppose x is equal to 8 million in a specific year, and suppose this is the actual data point for that year. Let us label the points on the graph M, N and P. Then what is PM? PM is basically the error term, because this is the difference between the actual data point and the regression line.
So which one is the best fit? The best fit is the one where $y_t$ is equal to $y_c$. The error term is $y_t - y_c$, so if $y_t = y_c$ the error is equal to 0 and we get a best fit. But we always assume that there is some amount of error, because there is a deviation between the regression line and the actual data points. So we have an error term $y_t - y_c$, and since $y_c$ can be written as $a + bx$, the functional form of the error term is $e_t = y_t - a - bx_t$.
Now we need to minimize this error term, and we do that using the least squares method. What is the least squares method? Generally, the regression technique minimizes the error term with a view to finding the regression equation that best fits the observed data.
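The definition of the error term above, observed value minus estimated value, can be sketched in a few lines. The observations and the candidate line below are made up purely for illustration (they are not the lecture's data), and the function name `error_terms` is a hypothetical label.

```python
# Hypothetical observations, made up for illustration only:
# x = advertisement expenditure, y_obs = observed sales (y_t).
x = [2, 4, 6, 8, 10]
y_obs = [50, 58, 62, 71, 79]

# An assumed candidate line y_c = a + b*x, chosen by eye and not yet fitted.
a, b = 40.0, 4.0

def error_terms(x, y_obs, a, b):
    """Return e_t = y_t - y_c for each observation, where y_c = a + b*x_t."""
    return [y_t - (a + b * x_t) for x_t, y_t in zip(x, y_obs)]

errors = error_terms(x, y_obs, a, b)
for x_t, y_t, e_t in zip(x, y_obs, errors):
    # A positive error means the point lies above the line, negative below.
    position = "above" if e_t > 0 else "below" if e_t < 0 else "on"
    print(f"x={x_t}: observed={y_t}, error={e_t} ({position} the line)")
```

With these assumed numbers some points lie above the line and some below, which is the situation described on the graph; an error of zero would mean the observed point sits exactly on the line.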
This method uses the sum of the squares of the error terms, which the regression technique seeks to minimize, to find the values of a and b that produce the best-fit line. How does the OLS method minimize the error term? It takes the sum of the squares of the error terms, that is, of the deviations between the actual data points and the regression line, and finds the values of a and b that minimize this sum, so that the line best fits the observed data points.
So we will now see how the OLS method is used to minimize the sum of the squared error terms by finding the values of a and b. Suppose we have n pairs of observations on the two variables, $(x_1, y_1)$ to $(x_n, y_n)$. We want to fit the regression line given by the equation y = a + bx. What is the motivation in fitting the line? We need to find the values of a and b that minimize the sum of the squares of the error terms:
$$E = \sum_{t=1}^{n} e_t^2 .$$
So through the OLS method we are trying to find the values of a and b which minimize this error sum of squares. Now we will see how to minimize it. Writing the error term out, the error sum of squares is
$$E = \sum_{t=1}^{n} (y_t - a - bx_t)^2 ,$$
and we need to differentiate it with respect to a and with respect to b.
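Before differentiating, it may help to see that the error sum of squares E is just a number that can be computed for any candidate pair (a, b). The sketch below uses made-up observations and two hand-picked candidate lines; all of these values and the function name `sum_squared_errors` are assumptions for illustration only.

```python
# Hypothetical observations, made up for illustration only.
x = [2, 4, 6, 8, 10]          # advertisement expenditure
y_obs = [50, 58, 62, 71, 79]  # observed sales

def sum_squared_errors(x, y_obs, a, b):
    """Return E = sum over t of (y_t - a - b*x_t)^2."""
    return sum((y_t - a - b * x_t) ** 2 for x_t, y_t in zip(x, y_obs))

# Two assumed candidate lines, picked by hand rather than estimated.
candidates = {"line 1": (40.0, 4.0), "line 2": (45.0, 3.0)}
for name, (a, b) in candidates.items():
    print(name, "E =", sum_squared_errors(x, y_obs, a, b))
```

Of the two hand-picked lines, the one with the smaller E fits these points better; the least squares method looks for the pair (a, b) that makes E as small as possible over all candidate lines.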
So, if the error sum of squares E is differentiated with respect to a we get $-2\sum (y - a - bx)$, and if it is differentiated with respect to b we get $-2\sum x\,(y - a - bx)$. We need to minimize E, and for minimization we set both derivatives equal to 0:
$$\frac{\partial E}{\partial a} = -2\sum (y - a - bx) = 0, \qquad \frac{\partial E}{\partial b} = -2\sum x\,(y - a - bx) = 0 .$$
Since $-2$ is a common factor in both cases, we can take it out, so $\sum (y - a - bx) = 0$ and $\sum x\,(y - a - bx) = 0$. We can rewrite these equations as $\sum y - na - b\sum x = 0$ and $\sum xy - a\sum x - b\sum x^2 = 0$. Rearranging, we get
$$\sum y = na + b\sum x, \qquad \sum xy = a\sum x + b\sum x^2 .$$
These are the normal equations, and they can be solved to determine the values of the constants a and b:
$$a = \frac{\sum x^2 \sum y - \sum x \sum xy}{n\sum x^2 - \left(\sum x\right)^2}, \qquad b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} .$$
So, once we put the numerical values of x and y into these formulas, we get the values of a and b, and from there we get the regression equation which best fits the data.
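The closed-form solution of the normal equations can be sketched directly in code. The data below are made up for illustration (they are not the lecture's figures), and `fit_ols` is a hypothetical helper name; the sketch simply plugs the sums into the formulas for a and b and then checks that both normal equations hold.

```python
# Hypothetical observations, made up for illustration only.
x = [2, 4, 6, 8, 10]        # advertisement expenditure
y = [50, 58, 62, 71, 79]    # sales

def fit_ols(x, y):
    """Return (a, b) solving the two normal equations for y = a + b*x."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(x_t * y_t for x_t, y_t in zip(x, y))
    sum_x2 = sum(x_t * x_t for x_t in x)
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are equal, so the slope is undefined")
    a = (sum_x2 * sum_y - sum_x * sum_xy) / denom
    b = (n * sum_xy - sum_x * sum_y) / denom
    return a, b

a, b = fit_ols(x, y)
print(f"y = {a:.2f} + {b:.2f}x")

# Check: the fitted a and b satisfy both normal equations (up to rounding).
n = len(x)
gap_1 = sum(y) - (n * a + b * sum(x))
gap_2 = (sum(x_t * y_t for x_t, y_t in zip(x, y))
         - (a * sum(x) + b * sum(x_t * x_t for x_t in x)))
print(abs(gap_1) < 1e-9, abs(gap_2) < 1e-9)
```

For these assumed numbers the intercept comes out at about 42.7 and the slope at about 3.55. The guard on `denom` is a design choice of the sketch: if every x is the same, the denominator in both formulas is zero and no slope can be estimated.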
So we use the OLS method to find the values of a and b which give the best-fit regression line, that is, which minimize the error term that arises from the deviation between the regression line and the actual, observed data points.
Now, what is the problem with this regression technique? The technique shows only the probable tendency, not the exact tendency. So even after getting the values of a and b we cannot say that the result is exact, and it may not be possible to get the best-fit line. Also, the technique does not consider the effect of predictable and unpredictable events which might affect the result, that is, the values of a and b. That is the problem with this method.
How do we overcome this? We can find out how reliable the estimated value of the coefficient is, and how well the estimated regression line fits the observed data, through the test of statistical significance. How do we do this test? It is a test of the significance of the estimated parameters. We take a null hypothesis and ask what the probability of rejecting the null hypothesis is. We fix a level of significance, that is, we set a limit as a percentage, and depending on where the result falls relative to that level the null hypothesis is either accepted or rejected. So hypothesis testing in general also works with this level of significance.
This level of significance is determined through the standard error and the t-ratio. So next we will see the formulas for the standard error and the t-ratio, because through these two we can find the level of significance, that is, how reliable the estimated coefficient is.
So what is the standard error? It is nothing but the standard deviation of the estimated values from the sample values. The formula for the standard error of the coefficient b is
$$S_b = \sqrt{\frac{\sum (y_t - y^e_t)^2}{(n-k)\sum (x_t - \bar{x})^2}} ,$$
or, rewriting the numerator as the squared error term,
$$S_b = \sqrt{\frac{\sum e_t^2}{(n-k)\sum (x_t - \bar{x})^2}} .$$
Here $x_t$ and $y_t$ are the actual sample values of x and y for time period t, $y^e_t$ is the estimated value (what we called $y_c$ earlier), $\bar{x}$ is the mean value of x, n is the number of observations, and k is the number of estimated coefficients. This $n - k$ is generally known as the degrees of freedom.
Once the standard error is calculated we can find the t-ratio, which is based on the value of b and the standard error of b:
$$t = \frac{b}{S_b} .$$
With the t-ratio we can find the level of significance, and through the level of significance we can decide whether to reject or accept the null hypothesis. On that basis we know how reliable the estimated coefficient is. So this test of significance is only meant to find out how reliable the estimated coefficient is.
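The standard error of b and the t-ratio described above can be sketched as follows. The data are made up for illustration, the function name `slope_standard_error` is hypothetical, and k = 2 is assumed because two coefficients (a and b) are estimated. The slope is computed here in deviation form, which is algebraically the same as the normal-equation formula.

```python
import math

# Hypothetical observations, made up for illustration only.
x = [2, 4, 6, 8, 10]        # advertisement expenditure
y = [50, 58, 62, 71, 79]    # sales

def slope_standard_error(x, y, k=2):
    """Return (b, S_b): the OLS slope and its standard error.

    k is the number of estimated coefficients (a and b, so k = 2 here).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of squared deviations of x from its mean.
    sxx = sum((x_t - x_bar) ** 2 for x_t in x)
    # OLS slope and intercept (deviation form of the normal equations).
    b = sum((x_t - x_bar) * (y_t - y_bar) for x_t, y_t in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    # Sum of squared error terms e_t = y_t - (a + b*x_t).
    sse = sum((y_t - (a + b * x_t)) ** 2 for x_t, y_t in zip(x, y))
    s_b = math.sqrt(sse / ((n - k) * sxx))
    return b, s_b

b, s_b = slope_standard_error(x, y)
t_ratio = b / s_b
print(f"b = {b:.3f}, S_b = {s_b:.3f}, t = {t_ratio:.2f}, degrees of freedom = {len(x) - 2}")
```

The t-ratio is then compared with the tabulated critical value of the t distribution for n − k degrees of freedom at the chosen level of significance; the larger the t-ratio in absolute value, the more reliable the estimated coefficient.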
Apart from this, there is also the test of goodness of fit, which we call the coefficient of determination. It tests the overall explanatory power of the estimated regression equation, because that tells us how well the regression equation fits the observed data. So we will see the formula for it.
The coefficient of determination is otherwise known as $R^2$, and $R^2$ is found as the ratio of the explained variation in y to the total variation in y. The explained variation in y is
$$\sum_{t=1}^{n} (y^e_t - \bar{y})^2 ,$$
where $y^e_t$ is the estimated value and $\bar{y}$ is the mean value of y, and the total variation in y is
$$\sum_{t=1}^{n} (y_t - \bar{y})^2 .$$
The total variation has two parts: one is explained and the other is unexplained. The unexplained part is $\sum_{t=1}^{n} (y_t - y^e_t)^2$, so the total variation is explained plus unexplained:
$$\sum_{t=1}^{n} (y_t - \bar{y})^2 = \sum_{t=1}^{n} (y^e_t - \bar{y})^2 + \sum_{t=1}^{n} (y_t - y^e_t)^2 .$$
So that is how we get the total variation, through the explained and the unexplained parts of the variation in y, and
$$R^2 = \frac{\sum (y^e_t - \bar{y})^2}{\sum (y_t - \bar{y})^2} ,$$
where the numerator is the explained variation and the denominator is the total variation.
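The split of the total variation into explained and unexplained parts, and the resulting coefficient of determination, can be sketched as below. The data are made up for illustration and `variation_breakdown` is a hypothetical helper name; the line is fitted by ordinary least squares inside the function.

```python
# Hypothetical observations, made up for illustration only.
x = [2, 4, 6, 8, 10]        # advertisement expenditure
y = [50, 58, 62, 71, 79]    # sales

def variation_breakdown(x, y):
    """Return (explained, unexplained, total) variation in y for an OLS line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # OLS slope and intercept (deviation form of the normal equations).
    b = (sum((x_t - x_bar) * (y_t - y_bar) for x_t, y_t in zip(x, y))
         / sum((x_t - x_bar) ** 2 for x_t in x))
    a = y_bar - b * x_bar
    y_est = [a + b * x_t for x_t in x]
    explained = sum((y_e - y_bar) ** 2 for y_e in y_est)
    unexplained = sum((y_t - y_e) ** 2 for y_t, y_e in zip(y, y_est))
    total = sum((y_t - y_bar) ** 2 for y_t in y)
    return explained, unexplained, total

explained, unexplained, total = variation_breakdown(x, y)
r_squared = explained / total

print(f"explained = {explained:.1f}, unexplained = {unexplained:.1f}, total = {total:.1f}")
print(f"R^2 = {r_squared:.3f}")
```

For these assumed numbers the explained and unexplained parts add up to the total variation, as in the decomposition above, and the ratio comes out close to 0.99. The decomposition holds exactly only when the line is the least squares line with an intercept, which is why the fit is done inside the function.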
So, once we get the value of $R^2$, it tells us what share of the total variation in the dependent variable is explained by the independent variable. Suppose the value of $R^2$ comes out as 0.91. It means 91 percent of the total variation in the dependent variable is explained by the independent variable. We can say the model has high explanatory power and the regression line is a good fit, because the value of $R^2$ is 91 percent.
Next we will look at one more measure that comes from $R^2$. The square root of $R^2$ gives us R, and R is nothing but the correlation coefficient. What is the role of the correlation coefficient in the regression equation? It measures the degree of association between the dependent and the independent variable.
Suppose, in some case, we get a value of R equal to 0.88. What is the implication of this 0.88? There is a strong association between the dependent and the independent variable. Now suppose the value of R is 0.22. Out of a maximum of one, the value of association is just 0.22, so we can say that these two variables are not strongly associated: a change in one variable is not going to change the other variable much.
So, let us summarize what we discussed today about the use of the regression technique and the rudimentary method to find the relationship between economic variables.
Regression analysis is widely used in business and economic analysis. But when it comes to the contribution of the regression technique, it only provides a method of measuring the regression coefficients: how strongly the two variables are related, through the correlation coefficient, and what the magnitude of the change in the dependent variable is for a change in the independent variable, through the regression coefficient. So regression analysis provides only a method to measure the regression coefficients and to test their reliability and goodness of fit. It does not provide the theoretical basis of the relationship between the dependent variable and the independent variable.
The hypothesis is always there that the dependent variable and the independent variable are related, and how they are related. The regression technique does not contribute to how this theoretical relationship is built up or what the basis of this relationship is; it does not contribute to the theoretical structure of the relationship between the dependent variable and the independent variable. It only provides methods which describe the relationship empirically. It also tells us, when we estimate the coefficient of the independent variable, how reliable that estimate is, and whether the estimated line is a good fit or not.
So with this we conclude the discussion on regression analysis, and we will then start the next module, which is on the theory of demand.