The R squared value evaluates how well the line of best fit models the actual data. To get there, we first need to know how to calculate the sample mean. The sample mean is the average of the Y values, also known as Y bar, and it can be calculated using two methods.

For the first method, we use the sample average formula: 1 divided by n times the sum of Y i, where n is the number of observations in the sample, also known as the sample size. In this case, n equals 10, and we can apply this formula to the values in the table.

We can also use the regression results to find the sample average caffeine value. First, we determine the sample mean of pieces of chocolate, the independent variable. Using the same sample average formula, this gives us 5.5 pieces of chocolate. Then we take the equation beta nought equals Y bar minus beta 1 times X bar, where beta nought is the intercept term and beta 1 is the coefficient derived from our Excel regression analysis. Rearranging it to solve for the sample average of the Y values, written with a line on top of the Y and called Y bar, gives Y bar equals beta nought plus beta 1 times X bar.

Now that we have found our sample average, to understand R squared we need three values we can take from our graph. The R squared value equals the regression sum of squares divided by the total sum of squares. It also equals 1 minus the residual, or error, sum of squares divided by the total sum of squares. It is useful to know about these three components of the dependent variable's variation; you don't need to memorize them to get the R squared value, but it helps to see that there is no magic behind it.

Firstly, we have the total sum of squares. It is the total variation in the dependent variable Y, illustrated by the light blue lines in our graph. This is the sum of the squared differences between each blue dot's Y value and the average Y value, which is the average caffeine content of our sample.

Secondly, we have the regression sum of squares. It is the variation in Y that can be explained by the predicted Y values, illustrated by the yellow lines in our graph. This is the sum of the squared differences between the predicted caffeine content, the Y value of each orange dot, and the average caffeine content of our sample, the purple line.

Finally, we have the residual, or error, sum of squares. It is the variation in Y that cannot be explained by the linear regression line, the red line. The differences are illustrated by the green lines in our graph. This is the sum of the squared differences between the actual caffeine content, the Y values of the blue dots, and the predicted caffeine content, the orange dots on the red line.

Again, the R squared value equals the regression sum of squares divided by the total sum of squares, which also equals 1 minus the residual, or error, sum of squares divided by the total sum of squares. An R squared value of 1 means that our model can predict all of the variability around the sample mean. An R squared of 0 means that our model can do no better than the sample mean in predicting the actual results. The sketch below puts these pieces together.
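To make the two methods and the three sums of squares concrete, here is a minimal Python sketch. The original data table isn't reproduced in this section, so the numbers below are made up; only the sample size (n equals 10) and the chocolate average (5.5 pieces) match the example, and every variable name is illustrative rather than taken from the video.

```python
import numpy as np

# Hypothetical sample, since the original data table isn't reproduced here:
# x = pieces of chocolate eaten (independent variable, mean 5.5 as in the example),
# y = caffeine content (dependent variable) from a made-up linear relationship.
x = np.arange(1, 11, dtype=float)           # n = 10 observations, mean = 5.5
rng = np.random.default_rng(0)
y = 10.0 + 4.0 * x + rng.normal(0.0, 2.0, size=x.size)

n = x.size  # sample size

# Method 1: sample average formula, Y bar = (1/n) * sum of Y i
y_bar = y.sum() / n

# Fit the least-squares line (standing in for the Excel regression output)
beta1, beta0 = np.polyfit(x, y, 1)          # slope, then intercept

# Method 2: rearrange beta0 = Y bar - beta1 * X bar to recover Y bar
x_bar = x.sum() / n                          # 5.5 pieces of chocolate
y_bar_from_fit = beta0 + beta1 * x_bar       # matches y_bar up to rounding

# The three sums of squares from the graph
y_hat = beta0 + beta1 * x                    # predicted values (orange dots)
sst = ((y - y_bar) ** 2).sum()               # total sum of squares (light blue lines)
ssr = ((y_hat - y_bar) ** 2).sum()           # regression sum of squares (yellow lines)
sse = ((y - y_hat) ** 2).sum()               # residual/error sum of squares (green lines)

# Both formulas give the same R squared
print(ssr / sst)        # R^2 = SSR / SST
print(1 - sse / sst)    # R^2 = 1 - SSE / SST
```

Notice that the total sum of squares here equals the regression sum of squares plus the residual sum of squares, which is exactly why the two R squared formulas agree.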
Let's have another look at the calculations that go into R squared to see why. Because the total sum of squares is fixed for a given sample, when the residual, or error, sum of squares decreases, the regression sum of squares increases, and so R squared increases. This means that our regression model accounts for most of the variation in the dependent variable, which is a good sign. On the other hand, when the residual, or error, sum of squares increases, meaning we have more random errors, the regression sum of squares decreases, and so R squared decreases. This means that our regression model, the line of best fit, cannot explain most of the variation in the dependent variable, so we may not be able to get an accurate prediction of the Y values using our model. We can see from the Excel regression analysis that R squared is higher for the regression with low errors than for the one with high errors; the sketch after this section reproduces that comparison.

This kind of mathematics is called statistics, and it is used in all areas of science, and in fact in all areas of research where data is analyzed scientifically. Statistics can help us model scientific phenomena, understand the data, and predict future events. It can be very helpful to us, but it does have limitations.
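To close, here is a small sketch of the low-error versus high-error comparison mentioned above. The underlying line and the noise levels are made up; the point is only that more random error inflates the residual sum of squares and pulls R squared down.

```python
import numpy as np

def r_squared(x, y):
    """R^2 = 1 - SSE/SST for a simple least-squares line."""
    beta1, beta0 = np.polyfit(x, y, 1)
    y_hat = beta0 + beta1 * x
    sse = ((y - y_hat) ** 2).sum()      # residual/error sum of squares
    sst = ((y - y.mean()) ** 2).sum()   # total sum of squares
    return 1.0 - sse / sst

rng = np.random.default_rng(1)
x = np.arange(1, 11, dtype=float)

# The same made-up underlying line with two different amounts of random error
y_low_error = 10.0 + 4.0 * x + rng.normal(0.0, 1.0, size=x.size)
y_high_error = 10.0 + 4.0 * x + rng.normal(0.0, 10.0, size=x.size)

print(r_squared(x, y_low_error))    # close to 1: little unexplained variation
print(r_squared(x, y_high_error))   # lower: more residual variation
```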