Here's a brief explanation of the chi-squared statistic for nominal data. Suppose you observe 100 people to see who deposits their garbage in the can and who litters, and you want to know whether there's a difference based on gender. This is categorical, or nominal, data, because each person falls into one of four categories: a male who deposits the garbage, a male who litters, a female who deposits the garbage, or a female who litters. After you do your observation, you find that 18 females and 42 males deposited their garbage, while 7 females and 33 males littered. Laid out as a table, females are the top row, males the bottom row, and depositing and littering are the two columns. Given this data, is there a significant difference in littering behavior between men and women?

To answer this question, you have to figure out what numbers you would expect if everything were left to chance, that is, if the null hypothesis were true and there were no difference based on gender. In that case, you would expect men and women to deposit and litter in the same proportions as the sample as a whole. Since 60 people deposited their garbage (the total for that column) and 25% of all the people observed were female, you'd expect, by pure chance, 25% of 60, or 15, in the upper left cell, the females who deposited their garbage. Again, that's if there's no effect of gender. Similarly, because 40 people littered and 75% of all the people observed were male, you'd expect 75% of 40, or 30, in the lower right cell, the males who littered. Working the same way, you can fill in all the expected values.

The further the observed values are from the expected values, the larger the chi-squared statistic, and the less likely it is that the difference is due to chance alone, that is, the more likely there really is an effect of gender. So now we need a formula for computing the chi-squared statistic. For every cell, we take the observed number, subtract the expected number, square the result, and divide by the expected number. Chi-squared is the sum of these values over all the cells.
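In symbols, write O for a cell's observed count and E for its expected count. These letters are introduced here for compactness, since the talk states the formula in words. The statistic and the two expected values worked out for the littering example are:

$$
\chi^2 = \sum_{\text{cells}} \frac{(O - E)^2}{E}
$$

$$
E_{\text{female, deposited}} = \frac{25}{100} \times 60 = 15,
\qquad
E_{\text{male, littered}} = \frac{75}{100} \times 40 = 30
$$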
In this case, it works out to 18 minus 15, or 3, squared and divided by 15; plus 7 minus 10, squared, divided by the expected value of 10; plus 42 minus 45, squared, divided by 45; plus 33 minus 30, squared, divided by 30. Those four terms are 0.6, 0.9, 0.2, and 0.3, which add up to a total of 2.0 for chi-squared. If you look up 2.0 in the chi-squared table with one degree of freedom, you'll find that the probability of getting a result at least this large by chance is about 0.16. That is well above the conventional 0.05 cutoff, so you retain the null hypothesis: there's no significant difference in littering behavior based on gender. We'll talk a little later about how the degrees of freedom were calculated.

Here's another experiment. You hand out flyers in three different colors and count how many people take them and how many don't. Is there a significant effect based on color? Here the table shows the observed values. Notice that you don't have an equal number of flyers of each color, but that doesn't matter; you can still calculate chi-squared. To find the expected value of a cell, you multiply its row total by its column total and divide by the grand total. For the upper left cell, that's the row total, 90, times the column total, 40, divided by the grand total, 150, which gives an expected value of 24. The next cell in the top row has an expected value of 90, the row total, times 60, its column total, divided by 150, the grand total, which works out to 36. Proceeding in this fashion, you can fill in all the remaining expected values.

You could also just use subtraction, since you know what the totals have to be. For example, the first row adds up to 90, and the two expected values we've already calculated, 24 and 36, add up to 60, so the last cell in that row must be 30. In fact, once those first two values are filled in, you can compute all the rest by subtraction. That means this design has two degrees of freedom. In general, the number of degrees of freedom for chi-squared is the number of rows minus one, times the number of columns minus one.
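The expected-value rule (row total times column total, divided by the grand total), the chi-squared sum, and the degrees-of-freedom formula can be sketched in a few lines of Python. This is a minimal illustration that assumes the counts are stored as a list of rows. The function names are hypothetical and not from the talk. For the littering data it reproduces the expected values 15, 10, 45, and 30 and a chi-squared of 2.0. The observed flyer counts aren't given in the text, so for that experiment the sketch only computes expected values from the totals. The second row total (60) and the third column total (50) are inferred by subtraction from the grand total of 150.

```python
def expected_counts(row_totals, col_totals):
    """Expected count for each cell if the null hypothesis is true:
    (row total * column total) / grand total."""
    grand_total = sum(row_totals)
    if grand_total != sum(col_totals):
        raise ValueError("row and column totals must add up to the same grand total")
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_squared(observed):
    """Sum over all cells of (observed - expected)**2 / expected."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    expected = expected_counts(row_totals, col_totals)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


def degrees_of_freedom(n_rows, n_cols):
    """(rows - 1) * (columns - 1)."""
    return (n_rows - 1) * (n_cols - 1)


# Littering example: rows are female, male; columns are deposited, littered.
littering = [[18, 7], [42, 33]]
print(expected_counts([25, 75], [60, 40]))  # [[15.0, 10.0], [45.0, 30.0]]
print(round(chi_squared(littering), 2))     # 2.0
print(degrees_of_freedom(2, 2))             # 1

# Flyer example (2 rows x 3 colors): only the totals are used here.
# Row totals 90 and 60 (=150-90); column totals 40, 60, and 50 (=150-40-60).
print(expected_counts([90, 60], [40, 60, 50]))  # [[24.0, 36.0, 30.0], [16.0, 24.0, 20.0]]
print(degrees_of_freedom(2, 3))                 # 2
```

Working from the totals rather than from each observed cell mirrors the point in the talk: once the margins are fixed, only (rows - 1) * (columns - 1) cells are free to vary.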
For the flyer experiment, with two rows and three columns, that's one times two, or two degrees of freedom. If we apply the chi-squared formula, again going over every cell, taking the observed minus the expected, squaring it, and dividing by the expected value, the result is 15.28. When you look that up in the table with two degrees of freedom, the probability is less than one in a thousand, so the result is significant and you reject the null hypothesis: color does make a difference in whether people take the flyers.
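Instead of looking up a printed chi-squared table, you can compute the tail probability directly. The sketch below uses only Python's standard library. `chi2_sf` is a hypothetical helper name. The method is the standard identity P(X >= x) = 1 - P(df/2, x/2), where P is the regularized lower incomplete gamma function, evaluated here by its power series. That identity is swapped in for the table lookup and isn't something the talk describes.

```python
import math


def chi2_sf(x, df, rel_tol=1e-15, max_terms=10_000):
    """Upper-tail probability P(X >= x) for a chi-squared variable X with
    `df` degrees of freedom, i.e. what a chi-squared table approximates.

    Computes 1 - P(a, z) with a = df/2 and z = x/2, using the series
        P(a, z) = z**a * exp(-z) / Gamma(a + 1)
                  * sum_{n>=0} z**n / ((a+1)(a+2)...(a+n))
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0
    for n in range(1, max_terms + 1):
        term *= z / (a + n)  # next series term from the previous one
        total += term
        if term < rel_tol * total:  # remaining terms are negligible
            break
    # Prefactor computed in log space to avoid overflow for larger inputs.
    lower = math.exp(a * math.log(z) - z - math.lgamma(a + 1)) * total
    return max(0.0, 1.0 - lower)


print(round(chi2_sf(2.0, 1), 2))  # 0.16  (littering example)
print(chi2_sf(15.28, 2) < 0.001)  # True  (flyer example)
```

For one and two degrees of freedom there are simple closed forms, erfc(sqrt(x/2)) and exp(-x/2), which make convenient checks on the series. In practice a statistics library is the usual choice, for example SciPy's `scipy.stats.chi2.sf`. Note that `scipy.stats.chi2_contingency` applies Yates' continuity correction by default when there is one degree of freedom, so it will not return 2.0 for the littering table unless called with `correction=False`.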