Regression analysis gives you estimates of regression coefficients and statistical tests of whether those coefficients differ from zero in the population. Sometimes, however, it is useful to be able to test other hypotheses, for example whether a coefficient differs from some value other than zero, or whether two coefficients are equal in the population. To do that, we need to understand how to test linear hypotheses after a regression analysis.

Let's take as an example a regression of prestige on education, share of women, and type of occupation, using the Prestige data that we have been using before (a sketch of the model setup follows below). We get some regression estimates, and we will be focusing on the dummy variables. The coefficients for professional and white collar tell us the expected difference between professional occupations and blue-collar occupations, and between white-collar occupations and blue-collar occupations. So the regression coefficients here are differences relative to a reference category, which is blue collar.

However, sometimes knowing the differences between each category and the reference category is not enough. What if we wanted to know the difference between professional and white collar, and whether it is statistically significant? The difference between professional and white-collar occupations is simply the difference between these two estimates, about ten. But is that difference statistically significant? We need a p-value. We can see that the p-value for the professional coefficient is about 0.08 for an estimate of about seven. Based on that, and considering that the difference between professional and white-collar occupations is about ten, we might guess that a difference of ten is significant when a difference of seven is close to significant. However, we need to do a proper test to assess whether that is actually the case.
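Here is a minimal sketch of the model setup in R, assuming the Prestige data from the car package (the variable names education, women, and type, with type levels bc, prof, and wc, follow that dataset):

```r
# Minimal sketch: the regression discussed above, using the Prestige
# data from the car package. The factor 'type' has levels bc (blue
# collar, the reference category), prof, and wc, so the dummy
# coefficients typeprof and typewc are differences from blue collar.
library(car)

m <- lm(prestige ~ education + women + type, data = Prestige)
summary(m)  # typeprof and typewc are contrasts against blue collar
```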
To do that, we use the Wald test. The null hypothesis is that the professional coefficient (typeprof) is the same as the white-collar coefficient (typewc). To calculate the Wald test statistic, we take the estimate squared and divide it by the standard error squared. So we have to define what the estimate is here and what the standard error is here.

To define the estimate, we write the null hypothesis in a slightly different way: if typeprof equals typewc, then typeprof minus typewc equals zero. So now we have something that we compare against zero in the population. That is our estimate: the estimated difference typeprof minus typewc, which we then raise to the second power. That's easy enough.

How about the standard error squared? We have to understand what the standard error quantifies: it is an estimate of the standard deviation of this estimate over repeated random samples from the same population, that is, how much the estimate varies because of sampling fluctuations. In our case, the standard error squared is the estimated standard deviation squared, and a standard deviation squared is the same as a variance. So we have the estimate squared divided by the variance of the estimate.

So how do we calculate the variance of the estimate? The estimate is typeprof minus typewc. We can plug in the numbers: we get about minus ten, and raising that to the second power gives about 100 (the sign disappears when we square). Then we divide it by the variance of that estimate.

But how do we get that variance? We need the rule for the variance of a difference. When we take the difference between two estimates that both vary, typeprof and typewc, the variance of the difference is the sum of the two variances minus two times the covariance between them:

Var(typeprof - typewc) = Var(typeprof) + Var(typewc) - 2 Cov(typeprof, typewc)

You can check this covariance calculation rule on Wikipedia, or your favorite statistics book, if it's a good book, will also explain how covariances are calculated. So we know the variances of typeprof and typewc; those are the squared standard errors. But what is the covariance between the two estimates? We can think of it this way: what would happen if, in another sample, the prestige of the blue-collar occupations that we use as the reference category were a bit lower? Because both typeprof and typewc are evaluated against blue-collar prestige, both estimates would increase a bit. So when these two estimates vary over repeated samples, they will also covary; they will be correlated across repeated samples most of the time.

The variance-covariance matrix of the estimates is something that the regression analysis will provide for you. In our example, the square root of the first diagonal element is the standard error of typeprof (you can verify that with your hand calculator), the square root of the second diagonal element is the standard error of typewc, and the off-diagonal element is the covariance between these two estimates. You don't have to understand how it is calculated; the regression software provides it.

Then we take these numbers, plug them into the equation, and get an answer of 12.325. We compare that 12.325 against the chi-square distribution with one degree of freedom, or better, against the proper F distribution, because this is a regression analysis and we know how regression estimates behave in small samples. If we didn't, we would use the chi-square distribution. So whether you compare against the F distribution or the chi-square depends on the same consideration as whether you would use a z test or a t test: if you are using statistics that have only been proven to work in large samples, you use the z test and the chi-square; if you are using statistics whose small-sample behavior is known, you use the t test and the F test. But you don't have to check that from your statistics book, because your computer software will do all of this calculation for you.

In R, we can just use linearHypothesis() from the car package and specify the hypothesis (see the sketch below). R will calculate the test statistic for you, 12.325, the same value we got manually, and it will give you the proper p-value against the proper F distribution. So this is a highly significant difference.
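Here is a sketch of both the manual Wald calculation and the linearHypothesis() call, assuming the model m fitted above:

```r
# Manual Wald test of H0: typeprof = typewc, i.e. typeprof - typewc = 0.
b <- coef(m)
V <- vcov(m)  # variance-covariance matrix of the estimates

# Var(a - b) = Var(a) + Var(b) - 2 * Cov(a, b)
est     <- b["typeprof"] - b["typewc"]
var_est <- V["typeprof", "typeprof"] + V["typewc", "typewc"] -
           2 * V["typeprof", "typewc"]

wald <- est^2 / var_est                   # about 12.325 per the numbers above
pchisq(wald, df = 1, lower.tail = FALSE)  # large-sample chi-square p-value

# The same hypothesis via car::linearHypothesis, which for lm models
# reports an F test with the same statistic value.
linearHypothesis(m, "typeprof = typewc")
```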
This kind of comparison is not restricted to comparing two categories of a categorical variable. You can also test, for example, whether the effects of women and education are the same, or whether the effect of education differs from some specific value, let's say five. But comparing two regression coefficients comes with a big caveat: it only makes sense if those two coefficients are attached to variables that are somehow comparable. You can't really compare a number of years of education to a share of women; those are incomparable, and in many such cases the comparison doesn't make much sense. Here, because we are comparing categories of the same categorical variable, the comparison makes sense. In some other scenarios it doesn't. So you really have to think about whether the comparison makes sense before you do this kind of statistical test, because your statistical software will do any test for you. It will not tell you whether the test makes sense. You have to think for yourself.
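For completeness, here is the illustrative syntax for the other hypotheses mentioned above, with the comparability caveat applying in full:

```r
# Illustrative only: testing whether a coefficient equals a fixed value.
linearHypothesis(m, "education = 5")      # H0: education coefficient is 5

# Syntactically valid, but education (years) and women (a percentage)
# are on different scales, so this comparison is not meaningful here.
linearHypothesis(m, "education = women")
```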