Hi there, we're now going to talk about contingency tables and the test for independence. If you were able to handle the goodness of fit test, the good news is it's basically the same process for this test for independence. So a contingency table displays a frequency distribution as a table with rows and columns to show how two variables may be dependent or related to each other. For instance, you could look at the number of people who own dogs and cats and break it up by males and females, and you could look at those different frequencies to determine if there's a relationship between someone's gender and whether they own a dog or cat. A test of independence tests the null hypothesis that in a contingency table, the row and column variables are independent. Remember, independent means they're not related to each other at all. The hypotheses for a test for independence are as follows. The row and column variables are independent. That would be your null hypothesis, and then the alternative hypothesis would be the row and column variables are dependent. And remember, that just means not independent. These are always the hypotheses when you run a test for independence. The requirements to use the method we're about to use are that the data must be randomly sampled, the sample data are represented as frequency counts in a two-way table or a contingency table, and for every cell in the contingency table, the expected frequency, not the observed frequency, is at least five. All right. So notation, big O represents the observed frequency in a cell of a contingency table, big E represents the expected frequency, R represents the number of row categories, and C represents the number of column categories. So to run a test for independence, we have our two hypotheses. The null is that the row and column variables are independent. The alternative is that the row and column variables are dependent.
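For reference, here is how that notation fits into the standard chi-square statistic for a contingency table. The video doesn't write these formulas out, since the spreadsheet does the work, so take this as a supplementary sketch. The subscripts i (row) and j (column) are labels added here for illustration only.

```latex
E_{ij} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{\text{grand total}},
\qquad
\chi^2 = \sum_{i=1}^{R} \sum_{j=1}^{C} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}
```

Each cell contributes its squared gap between observed and expected, scaled by the expected count, so large gaps push the statistic into the right tail.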
P-values will be provided by technology, Google Sheets in this case, and critical values can be found from the chi-square table. The degrees of freedom for a test for independence would be the number of row categories minus one, times the number of column categories minus one. Tests of independence are chi-square tests, therefore they are right-tailed. So to do a test for independence, go to the chi-square tab in Google Sheets, select independence for the type of test, and then type your contingency table into the spreadsheet. It'll give us our p-value and our test statistic, and it'll also give you the degrees of freedom. So does it appear that the choice of treatment affects the success of recovery of a broken foot? Use a .05 level of significance to test the claim that success is independent of treatment group. Here you have surgery, weight-bearing cast, non-weight-bearing cast for six weeks, and non-weight-bearing cast for less than six weeks, and for each one the number of successes, or recoveries, and the number of failures. So our hypotheses are always straightforward: the row and column variables are independent, the row and column variables are dependent. So basically the translation of the null hypothesis there is that treatment does not affect recovery, and then the alternative is that treatment does affect recovery. We basically know our claim is the null hypothesis, that's what they said, so we just need to type this information into Google Sheets, get our p-value, compare it to alpha, and be done with this. So we'll go to Google Sheets, we'll go to the chi-square tab, and you will change the type of test to independence. Please don't forget to do that, it will slightly affect your answer, because degrees of freedom vary based on the type of test. We need to have room for four row categories and two column categories, which is basically success and failure in this case.
Alright, so I'm not going to write out every single row category in full, but I'll put surgery, I'll put weight-bearing, I'll put six weeks for the non-weight-bearing cast, and then less than six weeks for the other non-weight-bearing cast. So literally type in your table. Starting in cell E2, you'll begin typing your data: 54, 41, 70, 17 for the successes, and then the failures, 12, 51, 3, and 5, and let the spreadsheet do the calculations. Please give it some time, it does take a little bit of time to do everything. At the end of the day, you have your test statistic, 58.39, if they ask for that, there it is, degrees of freedom is 3, and the p-value rounds to 0, and that's what we need. Alright, so the p-value is essentially 0. What happens when we compare it to alpha? It's definitely less than .05, so we are under the limbo bar and we reject the null hypothesis, which means we reject our claim. There is sufficient evidence to warrant rejection of the claim that success is independent of treatment. So basically what that means is that choice of treatment does appear to affect the success of recovery of a broken foot. So here you have a medical application of statistics, pretty cool, right?
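If you want to check the spreadsheet's numbers for the broken-foot example yourself, here is a minimal Python sketch using only the standard library. It is not the spreadsheet's method, just the usual chi-square arithmetic. The function names `chi_square_sf` and `independence_test` are made up for this illustration, and the tail-probability helper assumes a whole-number degrees of freedom.

```python
import math

def chi_square_sf(x, df):
    """Right-tail probability P(X >= x) for a chi-square variable (integer df)."""
    y = x / 2.0
    if df % 2 == 0:
        a = 1.0
        q = math.exp(-y)               # upper regularized gamma Q(1, y)
    else:
        a = 0.5
        q = math.erfc(math.sqrt(y))    # upper regularized gamma Q(1/2, y)
    # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q

def independence_test(observed):
    """Return (chi-square statistic, df, p-value, expected counts) for a table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # Expected count for a cell = row total * column total / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, chi_square_sf(chi2, df), expected

# Rows: surgery, weight-bearing cast, non-weight-bearing 6 weeks, < 6 weeks
# Columns: successes, failures (values as typed into the spreadsheet)
observed = [[54, 12], [41, 51], [70, 3], [17, 5]]
alpha = 0.05

chi2, df, p_value, expected = independence_test(observed)
smallest_expected = min(min(row) for row in expected)

print(f"chi-square = {chi2:.2f}, df = {df}, p-value = {p_value:.4f}")
print(f"smallest expected count = {smallest_expected:.2f}")
print("reject the null" if p_value < alpha else "fail to reject the null")
```

The statistic and degrees of freedom should agree with the spreadsheet's 58.39 and 3, and the smallest expected count comes out above five, so the requirement on expected frequencies is met for this table.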
The only other thing I want to comment on is that if you were asked to find the critical value for a test for independence, the degrees of freedom is always going to be the number of row categories minus 1, times the number of column categories minus 1. So that's 4 minus 1, times 2 minus 1, which is 3 times 1, which is 3. Your spreadsheet told you the degrees of freedom, but I just wanted to show you how to calculate it by hand, because sometimes they'll just tell you what the row and column categories are, or they'll tell you what the degrees of freedom are, and they'll tell you to look at the chi-square table that we used for the goodness of fit test. Remember, you just look at your significance level and your degrees of freedom in the chi-square table to find the critical value. Critical values for chi-square are always found from the table: use alpha, use degrees of freedom. In this last example, we want to know, does it appear that gender affects party voting preference? We'll use a .05 level of significance to test the claim, there's the hypothesis test keyword, that gender is independent of party voting preference. We have our hypotheses: the row and column variables are independent, so gender does not affect party voting preference, that's our claim, and then the row and column variables are dependent, so gender does affect party voting preference. So basically, remember, what we need to do here is just type this information into Google Sheets, get that p-value, make a comparison to alpha, and be done with it. So let's do that now. In Google Sheets, you only have two row categories, male and female, and then you have three column categories that you're working with here. So the first one's Republican, then Democrat, and then you have independent. Then you'll type in your numbers, let me clear out what's already here. So starting in cell E2 we have the males that are Republican, 210, and the females that are Republican, 240.
Make sure you push enter after you enter each data value. Then 140 and 200 for Democrat, and then 45 and 50 for independent. Please give the spreadsheet some time to calculate. Looks like the chi-square test statistic is 2.68 in this case, degrees of freedom is 2, and the p-value is 0.2612. And how does 0.2612 compare to alpha? It's definitely greater than .05, so we're not under the limbo bar, and we fail to reject the null. We fail to reject our claim. So as a result, there is not sufficient evidence to warrant rejection of the claim that gender is independent of party voting preference. It does not appear that gender affects party voting preference, that's what we can basically conclude here. And remember, if you had to find a critical value, you would use alpha equals 0.05 and degrees of freedom equal to row categories minus one, times column categories minus one: two minus one, times three minus one, which is one times two. You get two, and that matches the spreadsheet too. So with two degrees of freedom and alpha equals 0.05, use the chi-square table to find the critical value if they ever ask you to. But anyway, that's all I have for now. I appreciate your time. Thanks for watching.
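As a follow-up to the voting example, here is a short Python sketch that redoes the arithmetic and also works out the critical value without a table. It relies on a shortcut that holds only when the degrees of freedom equal 2: the right-tail probability is exp(-x/2), so the critical value is -2 ln(alpha). That shortcut isn't from the video, which uses the chi-square table, and the variable names are just for illustration.

```python
import math

# Rows: male, female. Columns: Republican, Democrat, independent.
observed = [[210, 140, 45], [240, 200, 50]]
alpha = 0.05

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for a cell = row total * column total / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)

# Shortcut valid ONLY for df == 2: P(X >= x) = exp(-x / 2)
assert df == 2, "the closed forms below apply only when df is 2"
p_value = math.exp(-chi2 / 2)
critical_value = -2 * math.log(alpha)

print(f"chi-square = {chi2:.2f}, df = {df}, p-value = {p_value:.4f}")
print(f"critical value at alpha = {alpha}: {critical_value:.3f}")
# Right-tailed test: reject only if the statistic exceeds the critical value
print("reject the null" if chi2 > critical_value else "fail to reject the null")
```

Comparing the statistic to the critical value and comparing the p-value to alpha are two views of the same right-tailed decision, so both lead to failing to reject the null here.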