Regression analysis basically draws a line through the data, and the line is defined by the regression coefficients, the betas in the model. Our task now is to figure out how we estimate those betas. So we give regression analysis some data on the dependent variable and one or more independent variables, and the regression analysis tells us where the best line goes, and the line is defined by the betas. So how does the analysis know which line is the best?

To answer that question, we'll be looking at some example data. This same data set is used in one of the assignments, and it comes from the census of Canada from the early 70s. The data set is called Prestige, and the observations here are occupations; there are 102 of those. We have data on education, which is the mean number of years of education that people in each occupation hold; the average income for that occupation; how many women there are, from 0 to 100%, in that occupation; and a prestige score that is defined in some way that we don't really care about. We also have a census code, which is an identifier that we don't need, and type, which is a categorical variable that can be white collar, blue collar, or professional. Then there's some information about where the data comes from, and this is a printout from the documentation of the R package car, which contains this data set.

So we will be doing a regression estimation, and our task is to explain prestige with education: how much the prestige of an occupation depends on the amount of education, in years, that is required for that occupation. Our regression model says that prestige is a weighted sum of beta 0, the intercept or the base level, the prestige for occupations requiring no education, plus beta 1, the effect of education, times education, plus some variation that the model doesn't explain, the error term u. Our task is to estimate beta 0 and beta 1, which define the regression line.
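As a tiny illustration of the model just described, the systematic part beta 0 + beta 1 * education can be written as a function. The coefficient values here are made-up placeholders, since the whole point is that we don't know them yet:

```python
def predicted_prestige(education, beta0=10.0, beta1=4.0):
    """Systematic part of the model prestige = beta0 + beta1 * education + u.

    beta0 and beta1 are hypothetical placeholder values; the error term u
    is whatever the model leaves unexplained for a given observation.
    """
    return beta0 + beta1 * education

# With these made-up betas, an occupation requiring 10 years of education
# would get a predicted prestige score of 10 + 4 * 10 = 50.
```

Estimation is then the task of replacing these placeholder betas with values calculated from the data.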
Estimates in statistical analysis are usually denoted by drawing this kind of caret or hat symbol over the beta. So this is beta hat 0 and this is beta hat 1; they are estimates of the coefficients in the population regression model. The hat denotes that we don't know the value, but we have calculated a value from a sample, and that serves as an estimate of what the relationship is in the population.

Now we need to have a rule to set the line. So we have drawn the regression line here, and the regression line should go through the middle of the observations, so that there are about the same number of observations above the line and below the line. The observations are also assumed to be normally distributed around the line, so that most observations are clustered close to the line and some are further from the line. Telling a person to draw a line through the middle is easy, and a person will probably draw a line like that, but you can't tell a computer to draw a line "in the middle", because "in the middle" is not well defined. You have to have a specific rule for how to draw the line, and that specific rule of estimation is called an estimator. So an estimator is any rule, strategy, or principle that we can apply to calculate values for the quantities of interest from sample data.

So let's take a look at some properties of good estimators. We covered this in another video before, but let's revise. First, we need to have estimates that are consistent. Consistency means that when we have the full population data, our estimates beta hat 0 and beta hat 1 are equal to beta 0 and beta 1. So these estimates will be the population values. In other words, if we have the full data, a consistent estimator gives us the correct result for that population. Then we have another important property, which is unbiasedness.
Unbiasedness means that even if we don't have the full data set or a large sample, our estimates will be correct on average if we repeat the study over and over. Estimates that are correct on average: that is unbiasedness. Then we have efficiency, which means that the estimates are more precise, or more accurate, than those of any possible alternative estimator. So efficiency is a property that we can use to compare two estimators that are both unbiased. Finally, the estimates from repeated samples should be normally distributed, or at least follow a known distribution. That is important for statistical inference, that is, for calculating the p-values.

One really good rule for estimating the regression model, actually the best rule, is to use the residuals. When we have a regression line here, we can see that the observations are not exactly on the line. Instead, they are somewhere around the line, and the line is the perfect prediction. So this is the prestige score that would be predicted based on the education level. The difference between the actual prestige and the prestige predicted by the model is called the residual. So that is the part of the dependent variable that the model doesn't explain. We can calculate this regression line by plugging in our estimate for beta 0 plus our estimate for beta 1 multiplied by education. That gives the line, and then whatever remains, the difference between the line and the observation, is the residual.

So the best rule for estimating this regression model is to set the line so that the sum of these residuals raised to the second power is as small as possible. How do we do it in practice? We set the line somewhere, we calculate residuals for each observation, we raise each residual to the second power, we take the sum, and then we try different values for the betas to make the sum of squared residuals as small as possible. This is called the ordinary least squares estimator, and it has been proven to be consistent, unbiased, and efficient.
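The procedure just described, pick betas, compute residuals, square them, sum, and minimize, can be sketched in a few lines of Python. For a single predictor the minimizing betas also have a well-known closed form, used below. The education and prestige numbers are made up for illustration, not taken from the actual Prestige data:

```python
def ols_fit(x, y):
    """Return (beta0_hat, beta1_hat) minimizing the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Closed-form least-squares solution for one predictor:
    beta1 = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
             / sum((xi - mean_x) ** 2 for xi in x))
    beta0 = mean_y - beta1 * mean_x
    return beta0, beta1

def ssr(x, y, beta0, beta1):
    """Sum of squared residuals for a candidate line."""
    return sum((yi - (beta0 + beta1 * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical data: years of education and prestige scores.
edu = [6, 8, 10, 12, 14, 16]
prestige = [20, 28, 33, 40, 49, 55]
b0, b1 = ols_fit(edu, prestige)

# Any other line, e.g. one with a slightly different slope, has a larger SSR:
assert ssr(edu, prestige, b0, b1) <= ssr(edu, prestige, b0, b1 + 0.1)
```

The final assertion is the defining property of the least-squares line: among all candidate lines, the fitted one makes the sum of squared residuals as small as possible.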
And it produces normally distributed estimates under some assumptions that we will cover later.
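The "correct on average" idea can be checked in a small simulation sketch: draw many samples from a population model with known betas, fit the slope by least squares each time, and look at the average of the estimates. All numbers here, the true betas, the noise level, and the sample size, are made up for illustration:

```python
import random

def fit_slope(x, y):
    """Least-squares slope for a single predictor."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

random.seed(1)
TRUE_BETA0, TRUE_BETA1 = 1.0, 2.0       # hypothetical population values
slopes = []
for _ in range(2000):                   # 2000 repeated "studies"
    x = [random.uniform(0, 10) for _ in range(30)]           # n = 30 each
    y = [TRUE_BETA0 + TRUE_BETA1 * xi + random.gauss(0, 3)   # error term u
         for xi in x]
    slopes.append(fit_slope(x, y))

avg_slope = sum(slopes) / len(slopes)   # lands close to TRUE_BETA1
```

Any single slope estimate can be off, but the average across repeated samples sits near the true value, which is what unbiasedness means; a histogram of `slopes` would also look roughly bell-shaped, in line with the normality property above.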