Hello and welcome. In this video we'll talk more about regression, which means using what is called a regression equation to make predictions based on some x-value, or explanatory variable.

First, the requirements for the method we're about to use. The sample of paired data must be a simple random sample. The scatter plot must confirm a straight-line pattern. Any outliers that are known errors must be removed, and you should consider the effects of any outliers that are not known errors.

With regard to linear regression, the line of best fit is a straight line that appears to model the trend of the data on a scatter plot. The regression equation describes that regression line, that line of best fit, and the equation is as follows. Instead of y equals, it's y hat equals, because it's used to predict y values. That's why there's a hat on it: it's a predictor. So y hat is equal to some number a plus b times some value x. a and b will be obtained from our Google Sheets document. x and y hat will stay as they are. You input an x value, and the output is a prediction of a y value.

The equation of a straight line, y equals mx plus b, if you're familiar with that, is very similar to the linear regression equation, the line of best fit's equation, y hat equals a plus bx. Be careful with the letters, though: in the regression equation a is the y-intercept and b is the slope, so b here plays the role that m plays in y equals mx plus b.

Remember, we use sample information to estimate population information. The y-intercept of the true regression equation, the one for the population, is represented by beta naught. We have to estimate it using the value of a. The slope of the regression equation for the population is beta one, and we must use the sample statistic b to estimate it. That's why the estimation equation is y hat equals a plus bx. We use a to estimate beta naught, or beta zero, and we use b to estimate beta one. So how are we going to find this regression equation?
Well, we're literally going to go to the regressions tab in Google Sheets. We'll type our data into columns A and B, and the values of a and b for our regression equation will be found in cells E7 and E8, in column E.

So, we would like to use the explanatory variable x, shoe print length, to predict the response variable y, which is height. All they want us to do here is find the regression equation. Remember, that's y hat equals a plus bx. We look at Google Sheets to find a and b, plug them in, and the job's done.

So we go to the Google Sheets document and go to the regressions tab. We clear out any data that is currently in columns A and B. Starting in cell A2, you type your x values, so we'll type all five x values in. In column B, starting in cell B2, we type our y values, the heights. Notice we have a value of a of about 125, and a value of b of 1.73. We'll round the intercept, a, to the nearest whole number, and we'll round the slope, b, to two decimal places. So our equation is y hat equals 125 plus 1.73x. Remember, eventually, if we do have linear correlation, we'll plug in a value for x and do the calculation to find the predicted value of y. Coming soon, to a video near you.

All right, Example 9.4. We would like to use the explanatory variable x, which is altitude in thousands of feet, to predict the response variable y, temperature. So, altitude to predict temperature. Let's find the regression equation. Remember, y hat equals a plus bx. Let's go to Google Sheets and type in the altitudes and the temperatures. We'll type the altitudes starting in cell A2. Remember to always press Enter after you type each value. Don't use the down arrow key, otherwise it won't register the number. And definitely make sure you type in the correct numbers. It'd be a shame to miss a question just because you mistyped one number.
You can also copy and paste data from your homework into an Excel spreadsheet and then move it over into this Google Sheets document. All right, the only thing we're worried about right now is the values of a and b. a has a value of, we'll go with 72.5, or you could just say 72 if you want, and b has a value of negative 3.68. Put those into our equation, and we have y hat equals 72.5 minus 3.68x. If linear correlation holds, we'll be able to use this equation to make predictions: we can predict temperature based on the input x, altitude.

We can use the regression equation for predictions only if two things hold: the graph of the regression line on the scatter plot confirms that the line fits the points reasonably well, and the hypothesis test concludes that there is linear correlation. So we have to look at the graph of the regression line on the scatter plot to make sure it fits the points reasonably well, and we also have to check that there is linear correlation based on the hypothesis test. That's the big thing. We have to make sure the hypothesis test shows that there is linear correlation.

If the regression equation does not appear to be useful for making predictions, meaning either of those conditions fails, the best predicted value of y for any input value x is the sample mean of the y values. So the best prediction, if you don't have linear correlation, is the average of the y values, y bar. This gets a lot of people. It may not feel intuitive, but if the regression equation does not hold, it's useless for prediction, and you have to use the average of the y values as your best prediction, because we don't have time in this course to learn about other methods, other types of correlation, or other types of regression.
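The rule just described can be written as a short Python sketch. This is only an illustration of the decision, not something from the Google Sheets document: the function name `best_predicted_y` and its parameters are made up for this example, and the flag `equation_is_useful` stands in for the two checks above (the line fits the scatter plot reasonably well and the hypothesis test shows linear correlation).

```python
from statistics import fmean


def best_predicted_y(x, a, b, y_values, equation_is_useful):
    """Return the best predicted y for an input x.

    a, b               -- intercept and slope of y hat = a + b*x
    y_values           -- the sample y values (used only for the fallback)
    equation_is_useful -- True only if the regression line fits the scatter
                          plot reasonably well AND the hypothesis test
                          concluded that there is linear correlation
    """
    if equation_is_useful:
        # Conditions hold: plug x into the regression equation.
        return a + b * x
    # Conditions fail: the best prediction is y bar, whatever x is.
    return fmean(y_values)
```

Notice that in the fallback branch the input x is ignored completely, which is exactly the point: without linear correlation, x tells you nothing useful about y.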
So let's look at the five pairs of shoe print lengths and heights to predict the height of a person with a shoe print length of 29 centimeters. So x equals 29. Now don't go into robot mode and immediately start plugging 29 in for x in the regression equation. You need to wait, like when you're at a crosswalk and the sign says wait. Is there linear correlation based on the hypothesis test? In the previous video we actually ran the hypothesis test for the shoe print lengths, and we concluded that there was no linear correlation. You could retrieve the data from that example and rerun it if you want. So the best estimate of height for a shoe print length of 29 centimeters is the average of the y values, y bar. That's the best prediction because we did not have linear correlation. That means taking your y values, the heights from the example. The data is available in Example 9.3, which is in this video. You take 175.3 plus 177.8 plus 185.4 plus 175.3 plus 172.7, and you divide by five because there are five values. That average is 177.3, so the best prediction is 177.3. You use the average of the y values because there was no linear correlation.

Now consider the data from Example 9.4. At 6,327 feet, or 6.327 thousand feet, a temperature was recorded. Find the best predicted temperature at that altitude. How does the result compare to the actual recorded value of 48 degrees Fahrenheit? Before you start plugging and chugging for whatever value of x you're looking at, which in this case is 6.327, you need to run the test for linear correlation. The null hypothesis for a linear correlation test is that there is no linear correlation.
And the alternative hypothesis is always that there is linear correlation. All right, let's find out which of these we can go with. I'll do two approaches: the p-value versus alpha comparison, and then the correlation coefficient versus critical value comparison. We need to take the data and type it into the regression tab in Google Sheets. It is actually already there for us. You have three, four, five, six, seven pairs of data here. The only things you care about are the p-value, which is basically zero, and the correlation coefficient, which is basically negative one.

Now, when you compare that p-value to the significance level, which we always take to be 0.05 for this test, it's clearly less. So that means we're under the limbo bar, and we reject the null hypothesis.

The critical value approach should give you exactly the same conclusion. We take the absolute value of the correlation coefficient. What's the absolute value of negative one? It's just positive one. And we compare it to the critical value. The critical value is found from a table; in the previous video we talked about how to find critical values. The degrees of freedom will be the number of pairs of data minus two. In this situation we have seven pairs of data, minus two, which is five. So you go to your table and look up the critical value for five degrees of freedom. That's going to be, keeping the positive version, 0.754. The critical value rule runs in the opposite direction from the p-value rule: if the absolute value of the correlation coefficient is greater than the critical value, we reject the null hypothesis. Here 1 is greater than 0.754, so we reject. So I did both approaches, the p-value versus alpha approach and the critical value approach, and they both give you the same result: we reject the null hypothesis.
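Both decision rules can be laid out side by side in a minimal Python sketch. The function names are invented for this illustration, the critical value 0.754 is simply the table value read off above rather than something the code computes, and the p-value of 0 and correlation coefficient of negative one are the rounded "basically zero" and "basically negative one" readings, not exact spreadsheet output.

```python
def degrees_of_freedom(n_pairs):
    """Degrees of freedom for the linear correlation test: pairs minus two."""
    return n_pairs - 2


def reject_by_p_value(p_value, alpha=0.05):
    """Reject the null (no linear correlation) when the p-value is at or below alpha."""
    return p_value <= alpha


def reject_by_critical_value(r, critical_value):
    """Reject the null when |r| is greater than the table's critical value."""
    return abs(r) > critical_value


# Altitude and temperature example: seven pairs of data.
df = degrees_of_freedom(7)   # 7 - 2 = 5
critical_value = 0.754       # table value for df = 5, as read in the video
p_value = 0.0                # "basically zero" (rounded reading)
r = -1.0                     # "basically negative one" (rounded reading)

print(df)                                            # 5
print(reject_by_p_value(p_value))                    # True
print(reject_by_critical_value(r, critical_value))   # True
```

The two comparisons point in opposite directions, small p-value versus large absolute r, but they agree on the decision.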
So we take that big rejected stamp and, bam, no linear correlation is off the table. All eyes are now on the alternative hypothesis, that there is indeed linear correlation, which means we can use the equation for predictions. So we can go to the equation and let x equal 6.327. Now you can go into robot mode and plug and chug. The predicted temperature, or y value, is 72.5 minus 3.68 times 6.327, which works out to 72.5 minus 23.28. Almost there. We're so close. 49.22 degrees Fahrenheit. That's the prediction. How does that compare to the actual recorded value of 48? I'd say that's very close to 48 degrees, right? It's off by a little over one degree, which is pretty good for a prediction. So I like it. Anyway, that's how you use the regression equation to predict y values based on whatever your input value x may be. Thanks for watching.
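If you want to double-check the arithmetic from the two examples, here is a short Python sketch. The variable names are just for illustration; the numbers are the ones quoted in the video (the five heights, a of 72.5, b of negative 3.68, x of 6.327, and the recorded 48 degrees).

```python
from statistics import fmean

# Shoe print example: no linear correlation, so the best prediction is y bar.
heights = [175.3, 177.8, 185.4, 175.3, 172.7]
shoe_print_prediction = fmean(heights)

# Altitude example: linear correlation holds, so use y hat = a + b*x.
a, b = 72.5, -3.68
altitude = 6.327                      # thousands of feet
predicted_temp = a + b * altitude     # degrees Fahrenheit
actual_temp = 48.0
difference = predicted_temp - actual_temp

print(f"{shoe_print_prediction:.1f}")   # 177.3
print(f"{predicted_temp:.2f}")          # 49.22
print(f"{difference:.2f}")              # 1.22
```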