Given a table of categorical data, the statistical question is: are the results random, or is there some underlying order? The chi-squared statistic describes how unusual the results are. To decide whether we just got lucky or not, we need to introduce one more factor, the degrees of freedom.

Suppose we knew the row and column totals. The values in the table can't be assigned completely at random, because if they were, our totals would no longer be correct. In a two-by-two table, a single value anywhere in the table determines all values everywhere. So there is one degree of freedom. In other words, we only need to know one value in the body of the table to be able to find all the values in the rest of the table.

The degrees of freedom, usually abbreviated df, depends on the size of our contingency table. Suppose our contingency table has n rows and m columns. For example, we might have two treatment options and two possible outcomes. Or maybe our data has n = 4 college majors and m = 3 income categories. How many cells must be specified before all the rest are known? To find the degrees of freedom, we have to invoke a complicated formula. Well, it's just columns minus 1, times rows minus 1, that is, df = (m - 1)(n - 1), where the columns and rows are counted by the number of categories. And it's important to remember, when computing the degrees of freedom, not to count the total column or the total row if they're included.

So suppose we collect data on average annual income against different college majors. How many degrees of freedom does the table have? Remember, when computing the degrees of freedom, we don't count the total column or, if there is one, the total row. So we'll ignore the total column and the total row. That means there are three columns and four rows.
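As a rough illustration of the counting argument above, here is a minimal Python sketch. The function names and the totals in the two-by-two example are made up for illustration; they are not the lecture's data.

```python
def degrees_of_freedom(n_rows, n_cols):
    """(rows - 1) * (columns - 1), counting category rows and columns only.

    Do not include a "total" row or column in n_rows or n_cols.
    """
    if n_rows < 2 or n_cols < 2:
        raise ValueError("need at least two categories in each direction")
    return (n_rows - 1) * (n_cols - 1)


def fill_two_by_two(row_totals, col_totals, top_left):
    """Recover a whole 2x2 table from its totals and a single cell.

    This is what "one degree of freedom" means: once the totals are
    fixed, choosing one cell forces the other three.
    """
    r1, r2 = row_totals
    c1, c2 = col_totals
    if r1 + r2 != c1 + c2:
        raise ValueError("row totals and column totals must sum to the same n")
    top_right = r1 - top_left        # the first row must add up to r1
    bottom_left = c1 - top_left      # the first column must add up to c1
    bottom_right = r2 - bottom_left  # the second row must add up to r2
    return [[top_left, top_right], [bottom_left, bottom_right]]


# Two treatments by two outcomes
print(degrees_of_freedom(2, 2))  # 1
# Four college majors by three income categories
print(degrees_of_freedom(4, 3))  # 6
# Hypothetical totals: rows (50, 50), columns (60, 40), one known cell
print(fill_two_by_two((50, 50), (60, 40), 35))  # [[35, 15], [25, 25]]
```

With larger tables the same idea holds, except that more than one cell has to be chosen before the totals force the rest.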
So the degrees of freedom is (3 - 1) times (4 - 1), which is 6.

So given a contingency table, we can apply the chi-squared test of independence by computing the chi-squared statistic, computing the degrees of freedom, and then finding the p-value based on the chi-squared statistic and the degrees of freedom. For the data we've been looking at on OS and PN, we computed the chi-squared statistic. There are two rows and two columns, so there's one degree of freedom, and using a convenient device we can calculate the p-value for this chi-squared statistic with one degree of freedom, which comes out to about one in eight.

Remember, the p-value is the probability of obtaining a result that is at least as extreme as the observed result, even if there is no difference between the two treatments. So even if there were no actual difference in the success rates, we'd see an apparent difference at least this large about one time in eight.

Now we still need to make a decision. Remember, the decision of whether or not to reject the null hypothesis must always take into consideration the consequences of making the wrong decision. So it would help to know what those consequences are. Some additional factors to consider: OS is slightly more expensive, and it typically requires a longer hospital stay.

So remember, we can either reject the null hypothesis or fail to reject the null hypothesis, and in reality either PN is the same as OS or it is not. If we reject the null hypothesis and PN is not the same as OS, that's the correct decision, and if we fail to reject the null hypothesis and PN is the same as OS, that's also correct.

If we incorrectly reject the null hypothesis and conclude that PN is better, when in fact there is no difference between the two, patients might be treated with PN, which has the same success rate (since the null hypothesis would in fact be true), is less expensive, and requires a shorter hospital stay.
Meanwhile, if we incorrectly fail to reject the null hypothesis and conclude there is no difference, when in fact PN has a higher success rate, patients might continue to be treated with OS, which has a lower success rate (since the null hypothesis would in fact be false), is costlier, and requires a longer hospital stay.

So in this case it appears that the consequences of incorrectly rejecting the null hypothesis are not very serious, while the consequences of incorrectly failing to reject the null hypothesis are much more serious. So, taking into account the consequences of a wrong decision, we'd probably be inclined to conclude that PN is a better treatment option, even with a higher p-value.
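The three steps of the test (chi-squared statistic, degrees of freedom, p-value) could be sketched in Python roughly as follows for a two-by-two table. This is a sketch under stated assumptions, not the "convenient device" used in the lecture: the counts are hypothetical and are not the OS and PN data, no continuity correction is applied, and the p-value shortcut via `math.erfc` is valid only for one degree of freedom.

```python
import math


def chi_squared_2x2(table):
    """Return (chi2, df, p_value) for a 2x2 table of observed counts.

    `table` holds only the body of the table (no total row or column).
    No continuity correction is applied (an assumption of this sketch).
    """
    (a, b), (c, d) = table
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    n = a + b + c + d

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    df = (2 - 1) * (2 - 1)  # always 1 for a 2x2 table

    # With one degree of freedom, chi2 is the square of a standard normal,
    # so P(X >= chi2) = P(|Z| >= sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, df, p_value


# Hypothetical counts: rows are treatments, columns are success / failure
chi2, df, p = chi_squared_2x2([[30, 20], [20, 30]])
print(chi2, df, f"{p:.4f}")  # 4.0 1 0.0455
```

For tables with more than one degree of freedom the `erfc` shortcut no longer applies, and a library routine such as SciPy's `scipy.stats.chi2_contingency` would be the more usual choice. Either way, the p-value is only one input to the decision; the consequences of each kind of wrong decision still have to be weighed separately.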