The next step we want to consider in our statistics and data science choices has to do with measures of fit, or the correspondence between the data that you have and the model that you create. Now, it turns out there are a lot of different ways to measure this, and one big question is how close is close enough, or how can you see the difference between the model and reality? Well, there are a few really common approaches to this. There's one called R-squared; it's got a longer name, the coefficient of determination. There's a variation, adjusted R-squared, which takes into consideration the number of variables. Then there's minus 2LL, which is based on the likelihood ratio, and a couple of variations of it: the Akaike information criterion, or AIC, and the Bayesian information criterion, or BIC. And then there's also chi-squared. Now, that's actually the Greek letter chi there. It looks like an X, but it's a chi. And so let's talk about each of these in turn.

First off is R-squared. This is the squared multiple correlation, or the coefficient of determination. What it does is compare the variance of Y, so if you have an outcome variable, it looks at the total variance of that and compares it to the variance of the residuals on Y after you've made your prediction. The scores on R-squared range from 0 to 1, and higher is better.

The next is minus 2 log likelihood, also known as the likelihood ratio. What this does is compare the fit of nested models, where the predictors in one model are a subset of the predictors in a larger model. This approach is used a lot in logistic regression, when you have a binary outcome, and in general smaller values indicate better fit. Now, as I mentioned, there are some variations of this (I like to think of them like variations of chocolate), variations of the minus 2 log likelihood: the Akaike information criterion, or AIC, and the Bayesian information criterion, or BIC.
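To make these measures concrete, here's a minimal sketch in Python (NumPy only) of how R-squared, adjusted R-squared, minus 2 log likelihood, AIC, and BIC might be computed for a simple linear regression. The data are simulated and the variable names are just for illustration; the formulas shown assume an ordinary least squares model with Gaussian errors.

```python
import numpy as np

# Simulated data: one predictor, one quantitative outcome
rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(scale=0.5, size=n)

# Fit a simple linear regression by least squares
X = np.column_stack([np.ones(n), x])        # intercept + predictor
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
rss = np.sum(resid ** 2)                    # residual sum of squares
tss = np.sum((y - y.mean()) ** 2)           # total sum of squares for y

# R-squared: share of y's total variance explained by the model
# (ranges from 0 to 1; higher is better)
r2 = 1 - rss / tss

# Adjusted R-squared: penalizes for the number of predictors
# (k counts predictors, not the intercept)
k = 1
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Minus 2 log likelihood for a Gaussian model; smaller is better.
# p counts all estimated parameters: intercept, slope, and error variance.
p = 3
neg2ll = n * (np.log(2 * np.pi) + np.log(rss / n) + 1)
aic = neg2ll + 2 * p                        # AIC: adds 2 per parameter
bic = neg2ll + p * np.log(n)                # BIC: penalizes parameters harder as n grows

print(round(r2, 3), round(adj_r2, 3), round(aic, 1), round(bic, 1))
```

Note how adjusted R-squared can only be less than or equal to R-squared, and how BIC's penalty exceeds AIC's whenever the sample size is above about 8 (log n > 2), which is one reason BIC tends to favor smaller models.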
What both of these do is adjust for the number of predictors. Because obviously, if you have a huge number of predictors, you're going to get a really good fit, but you're probably going to have what's called overfitting, where your model is tailored too specifically to the data you currently have and doesn't generalize well. These both attempt to reduce the effect of overfitting.

And then there's chi-squared again, which is actually the lowercase Greek letter chi that looks like an X. Chi-squared is used for examining the deviations between two data sets, specifically between the observed data and the expected values from the model you create: we expect this many frequencies in each category.

Now, I'll just mention that, like going to the store, there are a lot of other choices, but these are some of the most common standards, particularly the R-squared. And I just want to say, in sum, there are many different ways to assess the fit, the correspondence between a model and your data, and the choices affect the model. You know, especially: are you going to penalize for throwing in too many variables relative to your number of cases? Are you dealing with a quantitative or a binary outcome? Those things all matter. And so the most important thing, as always my standing advice, is to keep your goals in mind and choose a method that seems to fit best with your analytical strategy and the insight you're trying to get from your data.
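The chi-squared idea can be sketched in a few lines: compare the observed count in each category to the count the model expects, square each deviation, and scale it by the expected count. The counts below are made up purely for illustration.

```python
import numpy as np

# Hypothetical observed counts in four categories
observed = np.array([18, 22, 29, 31])

# Expected counts under a simple model: equal frequencies in every category
expected = np.full(4, observed.sum() / 4)   # 25 per category

# Chi-squared statistic: sum of squared deviations, each scaled by its expected count
chi2 = np.sum((observed - expected) ** 2 / expected)
print(round(float(chi2), 2))                # → 4.4
```

A larger statistic means the observed frequencies deviate more from what the model expects; in practice you would compare it against a chi-squared distribution with the appropriate degrees of freedom (here, 3) to judge whether the deviation is surprising.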