Welcome back to the videos on chi-square testing in this lesson. In this video, we're going to go over another way you might encounter a chi-square test. In the previous set of videos, we had a deck of cards with equal proportions across each of the suits. Often, though, our populations won't have equal proportions across all groups, so in this video I'm going to show you how to do a chi-square test when the proportions differ across groups.
Here in this code, which I'm calling chi-square test part two, we've got data on undergraduate enrollment in the College of Earth and Mineral Sciences (EMS) in fall 2022. Our goal is to compare the enrollment in the whole College of Earth and Mineral Sciences to the enrollment in one of the courses we taught during the academic year. Our null hypothesis is that each major's proportion in the course matches its share of total EMS enrollment. We can see that each major (EBF, Energy Engineering, Environmental Systems, and so on) has a different proportion of students. Our alternative hypothesis is that at least one of these proportions is not as specified.
We can go ahead and run the code I've already set up, which reads in the first-week survey data, and we can see that we've extracted just the majors from that survey. Then we can start by figuring out the counts. With the cards, we knew the counts right off the bat, but in this case we need to extract them from the data, so we use the .value_counts() method to do that. We can see that some responses are "Other", and one student listed Energy Engineering together with "Other". Neither works with our existing setup, because we don't have a proportion specified for "Other" or for dual majors. So we need to clean these up first, starting with the one dual major.
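As a rough sketch of this counting step, the snippet below uses a handful of made-up responses (not the actual survey data) and assumes the majors live in a column named "major". It shows how .value_counts() tallies each distinct label, including the "Other" and dual-major entries that have no proportion under the null hypothesis.

```python
import pandas as pd

# Hypothetical first-week survey responses (NOT the real course data).
# The column name "major" is assumed for illustration.
survey = pd.DataFrame({
    "major": [
        "PNGE", "PNGE", "PNGE", "EBF", "Energy Engineering",
        "Energy Engineering, Other", "Environmental Systems", "Other",
    ]
})

# Keep only the majors column.
majors = survey["major"]

# Unlike the card example, the counts are not known in advance,
# so tally each distinct major with value_counts().
counts = majors.value_counts()
print(counts)
```

The printed counts include separate rows for "Other" and "Energy Engineering, Other", which is exactly what has to be cleaned up before expected counts can be computed.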
We still want that student to count toward Energy Engineering, but we don't need their other major, so we can use .replace(). Essentially, we replace ", Other" with an empty string: open and close quotation marks with nothing in between. Then we also need to remove the "Other" majors. We don't know which major those students are in, so we can't count them toward any of the existing majors; we keep only the majors that are not equal to "Other".
With that done, we can recount the data, this time storing the result in a variable called counts (note that the variable is majors, plural, before .value_counts()). Printing counts shows our observed counts now that we've removed "Other" and folded the dual major into Energy Engineering.
The next step in our chi-square procedure is to develop the expected counts. counts_exp is counts.sum(), which is the sample size of our observed data, times the array of proportions. Before, our array of proportions was just four values of 0.25, but now that the proportions differ by category, we need to enter each of them manually. A very critical point here: the order has to be the same as the order of the observed counts, because we're ultimately going to compare these two sets of values category by category. So I've listed the proportions in the same order as the observed counts: 0.493, …, PNGE's 0.172, 0.163, and 0.04. These values also match what we have in our null hypothesis. Running that gives us our expected counts, and then we can run the chi-square test. We already know how to do the randomization procedure, but for this purpose I'm just going to use the one-liner.
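A minimal sketch of the cleaning and expected-count steps follows, again on hypothetical data with made-up null proportions rather than the EMS figures. One assumption worth flagging: it uses .str.replace for the ", Other" substring, because plain Series.replace matches whole cell values unless regex=True is passed. Instead of typing the proportions in by hand, it stores them in a Series keyed by major and reindexes that Series to the order of counts, which is one way to guard against the ordering mistake described above.

```python
import pandas as pd

# Hypothetical majors (NOT the real survey): one dual major and one "Other".
majors = pd.Series([
    "PNGE", "PNGE", "PNGE", "PNGE",
    "Energy Engineering", "Energy Engineering", "Energy Engineering, Other",
    "EBF", "EBF", "Environmental Systems", "Other",
])

# Fold the dual major into Energy Engineering by deleting the ", Other" text.
majors = majors.str.replace(", Other", "", regex=False)

# Drop responses with no proportion under the null hypothesis.
majors = majors[majors != "Other"]

# Observed counts for each remaining major.
counts = majors.value_counts()
print(counts)

# Hypothetical null proportions keyed by major (made up, not the EMS shares).
null_props = pd.Series({"EBF": 0.3, "Energy Engineering": 0.2,
                        "Environmental Systems": 0.1, "PNGE": 0.4})

# Line the proportions up with the order of `counts` before multiplying.
props = null_props.reindex(counts.index)
if props.isna().any():
    raise ValueError("A major in the data has no null proportion.")

# Expected count = sample size * null proportion, in the same order as counts.
counts_exp = counts.sum() * props
print(counts_exp)
```

If a major appears in the data but not in null_props, reindex fills its proportion with NaN, which is why the sketch checks for missing values before computing expected counts.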
So we call stats.chisquare with our observations, which we called counts, and our expected values, which we called counts_exp. Then we can print the p-value, which is just results.pvalue. The p-value is 0.017, so we can write our conclusion: the p-value is less than 0.05, our significance level, so we reject the null hypothesis in favor of the alternative that at least one of the proportions is not as specified.
However, the chi-square test doesn't tell us which one is different; it only tells us that at least one is different. To get an idea of which one is different, we can subtract our observed counts from our expected counts. Here we can see that the most extreme difference is PNGE, which had 11 more students enrolled than expected; since the value is negative, the observed count is greater than the expected count. Meanwhile, the Environmental Systems group had nearly six fewer students than expected. We can then start to have a discussion about which of these majors are primarily responsible for the rejection of our null hypothesis.
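The test itself and the follow-up comparison can be sketched with scipy.stats.chisquare, which takes observed counts first and expected counts second and returns a result with statistic and pvalue attributes. The counts here are invented for illustration, and the totals are deliberately equal because recent SciPy versions check that the observed and expected sums agree. The difference is taken as expected minus observed, so, as in the video, a negative value means more students were observed than expected. The per-category (O - E)^2 / E terms are an extra step beyond the simple difference; they add up to the chi-square statistic.

```python
import pandas as pd
from scipy import stats

# Hypothetical observed and expected counts (NOT the EMS results).
# Both Series share the same index order and the same total (100).
counts = pd.Series({"PNGE": 35, "EBF": 20, "Energy Engineering": 25,
                    "Environmental Systems": 20})
counts_exp = pd.Series({"PNGE": 20.0, "EBF": 25.0, "Energy Engineering": 25.0,
                        "Environmental Systems": 30.0})

# One-liner chi-square goodness-of-fit test: observed first, expected second.
results = stats.chisquare(counts, counts_exp)
print("p-value:", results.pvalue)

alpha = 0.05  # significance level
if results.pvalue < alpha:
    print("Reject H0: at least one proportion is not as specified.")
else:
    print("Fail to reject H0.")

# Expected minus observed: negative means more students observed than expected.
diff = counts_exp - counts
print(diff)
print("Largest absolute difference:", diff.abs().idxmax())

# Each category's share of the chi-square statistic, (O - E)^2 / E.
contrib = (counts - counts_exp) ** 2 / counts_exp
print(contrib)
```

The raw difference shows direction and size in students, while the (O - E)^2 / E terms show which categories drive the statistic, since the same gap weighs more when the expected count is small.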