A common situation is trying to decide which of two approaches is better. For example, consider two different surgical interventions for kidney stones: OS, open surgery, and PN, percutaneous nephrolithotomy. We can collect data on the success rates of the different surgeries, and it appears that PN is better. But is it?
Note that our data is categorical: a surgery is either successful or it isn't. This is also called nominal data. We might approach it as follows. We know the total number of surgeries, the total number of successes, and the total number of failures. So if there was no difference between the treatments, we could predict how many should be in each category. So let's expand our table to include an expected number of successes and failures.
We note that there are 273 + 289 = 562 successes out of 700 surgeries, so the overall success rate is 562/700. There are also 77 + 61 = 138 failures out of 700, so the overall failure rate is 138/700. So what would happen if both surgeries had the same success rate? Since the overall success rate is 562/700, then 562/700 of the 350 OS treatments would be successful. That's 281 successes. And since the overall failure rate is 138/700, then 138/700 of the 350 OS treatments would be failures. That's 69 failures. And likewise there would be 281 successes and 69 failures among the 350 PN treatments.
And so the important question is: is the observed data just a result of chance, or is there some other cause? To answer that, we need a way to measure how unusual the observed data is. And for that we'll introduce what's known as the chi-squared statistic. The Greek letter chi looks like an x, and the chi-squared statistic measures the difference between the observed (O) and expected (E) values in each cell of our contingency table. And we find that as follows.
For each cell in our table we find O minus E, the difference between the observed and the expected numbers, square it to get (O - E) squared, divide by E, the expected number, and sum over all the cells to get chi-squared. One way you can think about this concisely is that chi-squared is the sum of the squared differences between the observed and expected numbers, each divided by the expected number.
So we can find the chi-squared statistic for the data. We have our data and the expected numbers, and for each cell we'll find the difference between the observed and the expected numbers, square it, divide by the expected number, and sum. For our first cell the observed is 273 and the expected is 281, so we'll find the difference, square it, and divide by 281. For the second cell we observed 77 and expected 69, so we'll find the difference, square it, and divide by 69. And similarly for the third cell and the fourth cell. We'll add everything up and get our chi-squared value.
We could have a larger contingency table. Let's say we collect data on a person's political preferences and their highest education level. Remember we need to have the expected numbers, and it helps if we have the row and column totals. So let's add those up, and we'll compute the expected numbers. We might begin as follows: 52 of the 136 are Republican, so of the 7 who have a middle school education or less, we'd expect 52/136 times 7, or about 2.68, to be Republican. We could also have computed it the other way: 7 of the 136 have a middle school education or less, so of the 52 Republicans, 7/136 of 52 would also be 2.68. So it doesn't matter which way we fill the table, as long as we use the right proportions.
So of the 58 with some high school, 52/136, or about 22.18, would be Republican. And 52/136 of the 71 with some college would also be Republican. Similarly, 51 of the 136 are Democrats, so we'd expect 51/136 of those who finished middle school, high school, or college to be Democrats.
33 of the 136 are Independents, so again we could compute the expected number who would have completed middle school, high school, or college.
So remember the chi-squared statistic is the sum of the squared differences between the observed and the expected numbers, each divided by the expected number. For middle school or less, we observe five Republicans and expect to see 2.68, so we'll find the squared difference divided by 2.68. We observe one Democrat and expect 2.63, so we find the squared difference divided by 2.63. We observe one Independent and expect 1.70, so we find the squared difference divided by 1.70. And we can perform similar computations for the high school and college groups. Adding everything together, we get the chi-squared statistic.
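If you'd like to check these computations yourself, here is a minimal Python sketch of the steps we just walked through. It assumes the kidney stone counts are arranged with OS and PN as rows and successes and failures as columns. The function names expected_counts and chi_squared are illustrative choices for this example, not part of any library.

```python
def expected_counts(observed):
    """Expected count per cell if rows and columns were unrelated:
    (row total) * (column total) / (grand total)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def chi_squared(observed):
    """Sum of (O - E)^2 / E over every cell of the contingency table."""
    expected = expected_counts(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Kidney stone data from the lecture.
# Assumed layout: rows are OS and PN, columns are successes and failures.
kidney = [[273, 77], [289, 61]]

kidney_expected = expected_counts(kidney)  # 281 successes, 69 failures per row
kidney_chi2 = chi_squared(kidney)          # about 2.31

print(kidney_expected)
print(round(kidney_chi2, 2))
```

The same two functions work for a larger table, such as the education and political preference table, once all of the observed counts are filled in.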