Welcome to another session of your basic statistics and literacies. Like I said, please make sure that you complete the register; I've just pasted the link in the chat. Today we're going to look at the chi-square test for independence, and we will concentrate only on that for this session. In the following two sessions we will cover linear regression, and then we'll do some activities on how to answer questions, with a little bit of revision to consolidate everything we have been doing from the first semester until now. After this session, we will have to schedule individual, module-specific sessions for exam preparation. Are there any questions or comments before we start with the session? Morning Cecilia, how are you? Good, Justice. For now, I don't have questions. Then we can continue if there are no questions. So we're going to look at the chi-square test. For you to be able to do this, you need to have your statistical tables, because we need to use the critical values. You also need to know your formulas. We will be using at least two formulas in this session: one to calculate the expected frequencies and one to calculate the test statistic. Okay, so let's get to it. By the end of the session, you should know how and when to use the chi-square test for a contingency table, or what we call the test for independence. As with hypothesis testing in general, there are four steps with the chi-square test that you always need to remember in order to answer questions. Step number one is stating your hypotheses.
So it means you will have to state your null hypothesis and your alternative hypothesis, because to know what you need to be doing, you need to state the hypotheses for the population. Always remember that the hypothesis is a statement the researcher wants to test, and the alternative will be the opposite of that. Normally we state hypotheses using the population parameter, but for independence there are only two cases to state: independent and dependent. Your null hypothesis should always say independent, and your alternative will say dependent, because dependence is what we are testing for. Then step number two: you need to define your decision method. Here we're talking about the critical value. You need to find your region of rejection using the critical value for chi-square. For chi-square it is always going to be a one-sided test, because the chi-square distribution is positively skewed. Then step number three: you need to calculate your test statistic. To calculate the test statistic, you must first have calculated your expected frequencies, and I will show you how to do that. The test statistic is your chi-square test statistic. Then step number four is to make a decision and conclude. For the decision, you use your critical value and the value of your test statistic. Then you conclude, basing your conclusion on your hypotheses. So how do we state the null hypothesis and the alternative? Because this is a chi-square test for independence, we use what we call a contingency table. It means you will be given two categorical variables and you need to test whether they are related. It's a test of relationship for two categorical variables.
You will see that the next time we do a test for relationship, we will be using regression, but that will be for numerical variables. The chi-square test is for two categorical variables. We represent those two categorical variables on a contingency table, which is an n by m table, a cross-tabulation, with a number of rows and a number of columns. How do we state the null hypothesis? We always use independent. We state that the two categorical variables are independent, and the alternative states that the two categorical variables are dependent. It's very, very important. And what does that mean? Independent means the two categorical variables have no relationship between them. Dependent means they have a relationship between them. But when you state your null hypothesis, you say independent. Always remember that. To calculate the test statistic we use this formula, but first I want to show you the cross-tabulation. This is a contingency table of class standing by the number of meals per week. Suppose a questionnaire was sent to all students at this university or college, asking them to choose how many meals per week they have at the canteen, so that we can decide whether to continue to offer more meals in the canteen. So we need to understand how the two categorical variables relate to one another. These are your observed values, the values inside the table. They are the values you would have observed from the questionnaire, and then you calculate the totals. Sometimes you are not given these totals. You just need to make sure that on your contingency table you calculate the row and column totals and then the grand total, which is very important. The values in the brown area, which is the inside area, we call the observed frequencies.
In order for us to calculate the test statistic, we need to calculate the expected frequencies, which means we need to use the totals of the observed frequencies. In the next slide I will show you how to calculate the expected frequency. For the expected frequency of this cell with 24, we calculate it using the total for the row, the total for the column and the grand total, and I'm going to show you how we do that. Your test statistic will be the sum of observed minus expected, squared, divided by expected. And because a contingency table is made up of rows and columns, you are able to find the degrees of freedom from those. When we find the critical value, we use the number of rows and columns. These are rows: this is a row, a row, a row. And the columns are at the top. So here we have one, two, three columns and one, two, three, four rows. You don't count the totals. To find the degrees of freedom, we use the number of rows minus one, times the number of columns minus one. How do we calculate the expected frequencies? Like I explained, we calculate each expected frequency by taking the row total times the column total, divided by the grand total, which is your n. So going back here, to calculate the expected frequency for 24, we take its row total, which is 70, multiplied by its column total, which is 70, divided by n, the grand total, which is 200. That gives us the expected value. For all of them, you do the same. Let's say we want to calculate it for 6. You take the row total, which is 30, multiplied by the column total, which is 42, divided by 200, which is your n. You will have to remember to do that for every cell. When we do the example, you will see how easy it is. To make a decision, we always use the critical value and the test statistic.
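The two formulas described above, the expected frequency of a cell and the degrees of freedom, can be sketched in a few lines of Python. This is only an illustration; the function names are made up for this sketch and the numbers are the ones read off the meal plan table.

```python
def expected_frequency(row_total, column_total, grand_total):
    """Expected count for one cell: row total x column total / grand total."""
    return row_total * column_total / grand_total


def degrees_of_freedom(n_rows, n_columns):
    """(rows - 1) x (columns - 1); the totals row and column are not counted."""
    return (n_rows - 1) * (n_columns - 1)


# Cell with observed 24: row total 70, column total 70, grand total 200.
e_24 = expected_frequency(70, 70, 200)

# Cell with observed 6: row total 30, column total 42, grand total 200.
e_6 = expected_frequency(30, 42, 200)

# Four class standings (rows) by three meal options (columns).
df = degrees_of_freedom(4, 3)

print(e_24, e_6, df)
```

The same function is applied to every cell in the table, each time with that cell's own row total and column total.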
So the decision rule compares the test statistic, which you would have calculated using this equation, with your critical value. Now your critical value, we didn't do that in detail. Your critical value for chi-square is given by your alpha and the degrees of freedom. This X squared symbol is chi-square. It's just the name, so you don't have to square anything there. If, for example, my alpha value is 0.05, I then need my degrees of freedom based on the table that we have. Look at it: how many rows and columns does it have? We said the degrees of freedom is the number of rows minus one, times the number of columns minus one. Someone's microphone is not muted. Anissa, your mic is not muted. Please make sure that you are muted. So we know there were four rows and three columns. Using the formula, we have four rows minus one, times three columns minus one, which gives us three times two, which is six. So our degrees of freedom will be six. We then go and find this on the critical values of chi-square table. I will share my entire screen just now and get to that table when we do an example. But that is how you find your critical value: by using your alpha value and the degrees of freedom. If your test statistic is greater than your critical value, you reject the null hypothesis. Otherwise, you do not reject the null hypothesis. And that is how we do it for the chi-square test for independence. I just want to stop sharing this and share my entire screen instead, so that you can see everything I do as I toggle between windows. Okay, so now let's look at an example. The meal plan selected by 200 students is shown below.
And this is almost exactly the same table that we used previously when I was doing the explanation. It shows the class standing of the students against the number of meals per week. So 24 freshmen say they prefer 20 meals per week. For sophomores, 22 of them prefer 20 meals per week, 26 of them prefer 10 meals per week, and 12 of them prefer no meals per week. The total number of sophomores who answered this question was 60. For senior students, 14 of them prefer 20 meals per week, 16 of them prefer 10 meals per week, and 10 of them prefer no meals per week, and there were 40 of them who answered this question. If we look at the number of meals, those who prefer 10 meals per week were 88: 32 of them were freshmen, 26 sophomores, 14 juniors and 16 seniors. That's how you read this table. And these are our observed values with the totals already calculated, right? So if we look at the hypothesis... yes? Can you go back to the table, please? You said 14; where do we start from? No, they will give you a table like this. This is your observed table. Let's say they ask you: what is the frequency of juniors who prefer 10 meals per week? How many of them are there? 14. There are 14 of them, yes. So you just need to know how to read the table like that, regardless of whether you start with the meals per week or with the class standing. So these are your observed frequencies. To state the null hypothesis and the alternative hypothesis, based on the two categorical variables, which are meal plan and class standing, the null hypothesis says that meal plan and class standing are independent.
The alternative will state that meal plan and class standing are dependent. Always remember: null hypothesis independent, alternative dependent. Next we calculate the expected values, because we need them for the test statistic. To calculate the expected value for 24, we say 70 multiplied by 70, as we did before, divided by 200. That gives us 24.5. You can see there that I write it on my expected table: it is 24.5. Now let's do the next one. I will do 22. For 22, it will be 60, which is my row total, times 70, which is my column total, divided by 200, which is my grand total. That gives me 21. As you can see there, it's recorded as 21. For the juniors with 10, the expected value will be 30, which is my row total, times my column total of 70, divided by 200, which is 10.5. We can do this for all of them, and you will have calculated all your expected values. If you add up your expected values along the rows and columns, the totals should be the same as the totals for your observed values. Once we have calculated our expected values, we can go and calculate our test statistic. Remember, the equation says it's the sum of observed minus expected, squared, divided by expected. So we take our observed, which is 24, minus our expected, which is 24.5. We square this and divide by the corresponding expected value, which is 24.5. Then plus, because the summation sign means adding up. For the next one we can take 32. So it will be 32 minus the expected, which is 30.8, squared, divided by 30.8. Then plus again, and you continue until you have added all of the cells on your table, up until you get to the last one, which is the 10 at the end.
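The expected table for the meal plan example can be rebuilt as a check on the hand calculations. The sketch below assumes the observed counts as read out in this session; the 14 freshmen in the "no meals" column is not read out directly and is inferred here from the freshman row total of 70. The column order (20 meals, 10 meals, no meals) is also an assumption about the slide layout.

```python
# Observed counts per class standing, in the column order
# [20 meals/week, 10 meals/week, no meals].
observed = {
    "Freshman":  [24, 32, 14],   # 14 inferred from the row total of 70
    "Sophomore": [22, 26, 12],
    "Junior":    [10, 14, 6],
    "Senior":    [14, 16, 10],
}

row_totals = {name: sum(counts) for name, counts in observed.items()}
column_totals = [sum(column) for column in zip(*observed.values())]
grand_total = sum(row_totals.values())

# Expected count for each cell: row total x column total / grand total.
expected = {
    name: [row_totals[name] * c / grand_total for c in column_totals]
    for name in observed
}

for name, row in expected.items():
    # The expected counts in a row add back up to the observed row total.
    print(name, [round(e, 1) for e in row], round(sum(row), 1))
```

Each printed row should end with the same total as the observed row, which is the check mentioned above.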
When you get there, it will be 10 minus 8.4, squared, divided by 8.4. Once you solve each of these and add them up, you will get 0.709. That is our test statistic. Now we go and find the critical value. Remember, if they told us that we must use alpha of 0.05, then our critical value will be the chi-square of alpha and the degrees of freedom. For our degrees of freedom, we said there are four rows and three columns. It will be 4 minus 1, times 3 minus 1, which is 3 times 2, which gives us 6, with alpha of 0.05. Let's go to the table to find this critical value. Let me make it bigger so that everybody can see. We use table E4, which is the critical values of chi-square. On that table you will always see that it is a right, or positively, skewed distribution, right? Your critical value will always be on your right-hand side when you make a decision. So anything that falls to the right of the critical value, in the region of rejection, will be rejected. Also, we don't use the top row of values. We only use the upper-tail areas, which are your alpha values on the table. We're looking for 0.05, and we're looking for the degrees of freedom of six, and where they both meet is our critical value. Our critical value is 12.592. So since we know that our critical value is 12.592, we take our test statistic, which is 0.709. Where does it fall? It falls in the do not reject area. The rule said that if the test statistic is bigger than the critical value, we reject the null hypothesis. Now we can conclude by saying that the test statistic of 0.709 is less than the critical value of 12.592.
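As a cross-check of the meal plan example, the whole sum can be computed in one go. This is a minimal sketch in plain Python, using the same observed counts assumed from the table in this session (with the freshman "no meals" count of 14 inferred from the row total) and the critical value of 12.592 read from the statistical table.

```python
observed = [
    [24, 32, 14],   # Freshman
    [22, 26, 12],   # Sophomore
    [10, 14, 6],    # Junior
    [14, 16, 10],   # Senior
]


def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over every cell."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * column_totals[j] / grand_total
            total += (o - e) ** 2 / e
    return total


stat = chi_square_statistic(observed)
critical_value = 12.592   # alpha = 0.05, 6 degrees of freedom, from the table

decision = "reject H0" if stat > critical_value else "do not reject H0"
print(round(stat, 3), decision)
```

The statistic comes out at roughly 0.709, far below 12.592, which matches the do not reject decision.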
So we do not reject the null hypothesis. And we can conclude by saying that there is not sufficient evidence that the meal plan and class standing are related. That is how we do the hypothesis test. Let's look at an exercise, and here I'm going to introduce a template for ease of use as well. With contingency tables, there can be many different layouts. They can look like this one or like the previous one that we used. Because the previous one has four rows and three columns, we call it a four by three contingency table. If we come to our exercise one, there are one, two, three rows and one, two columns. So we call this a three by two contingency table. We can also have a three by three contingency table, with three rows and three columns, or a two by three contingency table, with two rows and three columns. So now I want to introduce the template, based on what you just learned. I know that it might be a little bit tricky, but you will have to practice. On this template I've created a three by three contingency table, a two by three, a two by two, and a three by two. There can be even more than that, depending on what you need to calculate. I also have a two by four contingency table here at the bottom. So you can always look at this and pick the one that best suits the contingency table you are working with. So let's go back to our exercise. Ours is a three by two. So I need to go to my template and look for a three by two. And there it is. It's on S. So I can scroll, scroll, scroll up until I get to it. This is the table I want to use. Then I can enter black, white and Indian here to represent those values. Let me just minimize my Excel sheet.
It's taking me long. Just go to this side. I'm going to represent the same table here. I type the labels the way I see them: white, black, and Indian. You can overwrite the values. What I suggest you do, when you go to the notes section where I've uploaded this, is download the template and save it on your laptop or somewhere where you are able to use it. Don't use it online, because then you're going to overwrite it for everyone. Then the top is male and female. I don't need all the other values that are inside the table. I'm going to delete only the ones inside the table, in the white shaded area. The gray area you don't touch. You don't do anything to it. Only the white area. So I must put in the very same values that we have here. This is 40. This is 32. And this is 48. Now you will notice something, because we already have a total on the table that they gave us, which is 70. You need to use your calculator and your math knowledge to say: if I have a 70 there, it means 40 plus this missing number should give me 70, because that is the total of this row, right? So 70 minus 40 will give you the missing value, and that will be 30. You will see that the row total of 70 then corresponds with the one they gave us. On this one, because they have completed it, I can put in the 48. And you will see that the template already calculates the totals for all the values that you have. For this column, they gave us 120 there, right? And they gave us 250 as the grand total. If I know that this column total is 120, I can calculate what the other column total would be. It will be 250 minus 120, which gives me 130. I also have the total for that row. It was 80. So I know that this is 80, and I can find the last row total. So what is that value? That will be 250 minus 80 minus 70, which is 100. So just listen. Yes? Hey, forgive me.
I'm a little bit lost. The total, you say 120; I'm a little bit lost there. 40 plus 32 plus 48 is 120. If you add all of them, they will give you 120. Oh, thank you. So we also already calculated that this one there is 30. Since we have 30 and 48, we can calculate the missing value in that column: 130 minus 48 minus 30 will give us 52. So I just need to remove all the old values now, because I have mine populated. So that will be 80. Then 52, 130 and 100. So I have my contingency table. I can also come here and replace all the values, and that will look exactly the same. So that is my observed frequency table. Step number one in terms of chi-square: we need to state the null hypothesis and the alternative hypothesis. The null hypothesis will state that race and gender are independent. The alternative will state that race and gender are dependent. That's step number one. Eventually we need to calculate the expected values so that we can calculate the test statistic, but before we do that, let's assume that for this test we were given alpha of 0.05. So in step number two, we need to find the critical value, chi-square of alpha and the degrees of freedom, where the degrees of freedom is the number of rows minus one, times the number of columns minus one. So we have alpha of 0.05. Let's calculate rows minus one, times columns minus one. How many rows do we have? We have one, two, three. Don't count the total. So it's three minus one. How many columns do we have? One, two. So that will be two minus one. Three minus one is two, times two minus one, which is one, so that equals two. We go to the table and look for alpha of 0.05 and degrees of freedom of two, and our critical value is 5.991. That marks our region of rejection.
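The table completion done above is just subtraction from the totals. A minimal Python sketch of that reasoning, assuming the male column and the totals as they were given in the exercise (the variable names are made up for this illustration):

```python
# Values given in the exercise.
male = [40, 32, 48]            # male counts for the three rows
known_row_totals = [70, 80]    # totals given for the first two rows
grand_total = 250

# Column totals: the male total is a plain sum, the female total is what is left.
male_total = sum(male)
female_total = grand_total - male_total

# The third row total is what remains of the grand total.
row_totals = known_row_totals + [grand_total - sum(known_row_totals)]

# Each female cell is its row total minus the male cell in that row.
female = [total - m for total, m in zip(row_totals, male)]

# Degrees of freedom: (rows - 1) x (columns - 1), totals not counted.
df = (len(male) - 1) * (2 - 1)

print(male_total, female_total, row_totals, female, df)
```

The female cells should add up to the female column total, which is a quick way to confirm the completed table before moving on to the expected values.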
So step number three, we need to find the expected values. To calculate an expected value, remember we use the row total times the column total, divided by your n, which is 250. So let's calculate the expected value for 40. It will be 70 times 120, divided by 250, and that gives us 33.6. So we can write it here: 33.6. And we do the same for 32. For 32, it will be 80 times 120, divided by 250, and that is 38.4. Doing it by hand will take you long. So if we go to our template here, what I've done is calculate those expected frequencies automatically. You can do the same: copy the labels from this table and paste them there. And here is my expected value. If I click on it, you will see there my row total of 70, times my column total, T8, which is 120, divided by the grand total, which is 250. If you go to the next one and double-click on it, it shows you how it calculated that one. So those are your expected values, right? Once you have confirmed all your expected values, just press escape, and this is the table with your expected values. Now, with your expected values, the template goes and calculates your test statistic using this calculation here. So what does this calculation do? It is calculating your test statistic, which is the sum of your observed value minus your expected value, squared, divided by your expected value. So it takes 40 minus 33.6, squared, divided by 33.6, and it gets this answer that we have here. It takes 40 minus 33.6 and multiplies it by itself, because multiplying by itself is the same as squaring. So instead of that, I could also use the power function. Where is power now? To the power of two, it will still give me the same answer. So this is 112.
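What the spreadsheet template is described as doing can be mimicked in plain Python. This is not the template itself, only a sketch of the same arithmetic, using the completed race by gender table (the female counts 30, 48 and 52 are the ones worked out from the totals) and the critical value of 5.991 found earlier.

```python
# Rows in the order entered on the template; columns are [male, female].
observed = [
    [40, 30],
    [32, 48],
    [48, 52],
]

row_totals = [sum(row) for row in observed]
column_totals = [sum(column) for column in zip(*observed)]
grand_total = sum(row_totals)

contributions = []
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * column_totals[j] / grand_total
        diff = o - e
        # diff * diff and diff ** 2 are the same squaring step.
        contributions.append(diff ** 2 / e)
        print(o, round(e, 1), round(diff ** 2 / e, 4))

stat = sum(contributions)
critical_value = 5.991   # alpha = 0.05, 2 degrees of freedom

print(round(stat, 4), "reject H0" if stat > critical_value else "do not reject H0")
```

The last two cells contribute nothing because their observed and expected counts are equal, so the statistic is the sum of the first four portions only.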
I can also do the same here: instead of multiplying it by itself, I can just say to the power of two and it will still give me the same answer. So I do it for 40, for 32, for 48. You can see there that for 48 it's 48 minus 48, because the expected value is 48. For 52 the expected value is also the same as the observed. Hence they are 0 there. And then it adds all of them. If you look at this, it adds all the portions of the summation, and we get our chi-square. I just want to remove all of this because I want to use this space. So once we have calculated our test statistic, we find that it is 4.395. I could round it to 4.4, or I can leave all the decimals that we have: 4.3956. So our test statistic is 4.3956. Now we can go to step number four and make a conclusion. I can draw my picture and mark my critical value, or my region of rejection, which we found was 5.991. Our test statistic falls in the do not reject area. Therefore, since our chi-square stat of 4.3956 is less than our critical value, chi-square alpha, of 5.991, we do not reject the null hypothesis, and we can conclude as in the previous example: there is not sufficient evidence that race and gender are dependent on one another. So that is how you will do your test and conclude. Let's look at the next exercise. Use the contingency table to test for independence of the two variables given in this table at the 5% level of significance. Therefore, our alpha is 0.05. And they have given you the contingency table. How many rows? We have one, two. How many columns? One, two. So this is a two by two contingency table. You need to know how to state your null hypothesis and alternative hypothesis. And the question here is: which one of these statements is incorrect?
So tell me. The first one says the null hypothesis is that the variables are independent. Is that correct or incorrect? That is correct. For the alternative, if the first one was correct, it means the second one will also be correct. Number three says you need to find the critical value. Since we're doing a 2 by 2, we can find our critical value by using alpha and the degrees of freedom. Our degrees of freedom is calculated by using R minus 1, times C minus 1. So how many rows? There are 2, so 2 minus 1. How many columns? There are 2, so 2 minus 1. 2 minus 1 is 1, and 2 minus 1 is 1. So it's 1 multiplied by 1. So our critical value will be for alpha of 0.05, since that's what they gave us, and degrees of freedom of 1. You go to the table, look up 0.05 and 1, and our critical value is 3.841. That means that statement is correct. Number four is about the expected frequency for Yes and B. Here is our Yes, and here is our B. The statement says the expected frequency for that cell is 25. Is that correct? Let's double check. How do we find the expected frequency? We use the row total, multiplied by the column total, divided by n. So our row total is 65. Multiply that by 70 and divide by 145. That does not give 25, so that statement is incorrect. In the last one they ask you to calculate the test statistic. Remember, the test statistic is a long calculation. We can do it using our template, so we need to find the correct one. This is our correct template: it's a 2 by 2. We just change this label to Yes, and that one to No, and this to A, and this to B. And there we have the same table, but not yet with the right values. We still need to populate it. Just clear the white area, and you will see that everything relating to that calculation disappears as well. Then populate it. So we need 40, and 25, and 35, and 45. And that gives us a grand total of 145.
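The 2 by 2 exercise can also be checked without the template. The sketch below assumes the layout described above (rows Yes and No, columns A and B, with counts 40, 25, 35 and 45) and uses the critical value of 3.841 from the table.

```python
# Rows: Yes, No. Columns: A, B.
observed = [
    [40, 25],
    [35, 45],
]

row_totals = [sum(row) for row in observed]
column_totals = [sum(column) for column in zip(*observed)]
grand_total = sum(row_totals)

expected = [
    [r * c / grand_total for c in column_totals]
    for r in row_totals
]

# Expected frequency for the Yes / B cell: 65 x 70 / 145.
expected_yes_b = expected[0][1]

stat = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

critical_value = 3.841   # alpha = 0.05, 1 degree of freedom
print(round(expected_yes_b, 2), round(stat, 2), stat > critical_value)
```

The expected frequency for Yes and B is about 31.38 rather than 25, and the statistic of about 4.54 lies above 3.841, so the null hypothesis is rejected in this exercise.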
That's our grand total. And there are our expected values. Depending on how you calculated it, you should have got 31.38 for the previous one, the expected frequency for Yes and B, which the statement gave as 25. So we know that statement was incorrect. Now we are calculating the test statistic, and the test statistic is 4.54, which is the same as what we have here. The statement supposes that the calculated test statistic is 4.54, so we know that is correct. The null hypothesis is rejected. It will be rejected because, if your critical value is 3.841, your test statistic will fall over here, in the reject the null hypothesis area. Therefore that statement will be correct. So that is one way of answering the question by using the template. The templates are just there to help you with the long calculations that you have. Especially now, since you're writing online exams, there is nothing wrong with using this to help you. Also, when you are doing your assignment, see if you are able to use some of these templates. So let's look at another example. This is a study on the mode of transport used and whether it is associated with the distance covered by each mode of transport. So you have the distance and the mode of transport. Let's look at this. How many rows? We have one, two, three. Don't count the total. Three rows. How many columns? One, two, three. Three columns. So it is a three by three, and we can go to the three by three contingency table. And there it is. It is our first one. I'm just going to make it smaller and minimize it, and we will use this. 10 to 50. I don't have to write the labels exactly the way I see them there, as long as I have the information. And then 50. Then we have the labels for the modes of transport. And I can remove all these other values and type in 15, 21, 22, 13, 27, 17, 23, 19, 26. So I am able to answer all the questions that they will be asking, except the theory ones.
Here you have your test statistic. You have your expected values. We can also change the template so that it automatically changes the titles. So there are your values, and my expected values are in this column. So I should be able to answer the questions. Let's read: which one of these statements is incorrect? Number one, the null hypothesis is that the variables are independent. Is that correct or incorrect? It's correct. Then if this is correct, it means number two is also correct. Number three, the region of rejection. Here they're asking you to find the critical value. It says the region of rejection to reject the null hypothesis is given by the test statistic being less than the critical value of 9.488. That one is very tricky and very confusing. Let's go to number four before we answer number three. Number four says the test statistic is 6.29. That's what we got, right? That is correct. So let's go back to the critical value. We find it by using R minus one times C minus one, your rows and your columns. We said it's a three by three, so this will be three minus one, times three minus one. The degrees of freedom here will be two times two, which equals four. And we are told that our alpha is 0.05. So we need the critical value for alpha of 0.05 and degrees of freedom of four. We go to the table: 0.05 and four gives 9.488. Now, instead of jumping to a conclusion there, we need to place our critical value and our region of rejection. Our critical value is 9.488. And we know the rule says that if your chi-square stat is greater than your chi-square critical value, we reject the null hypothesis. That is the rule.
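The rule just stated can be written down as a small helper. This is a sketch only: the lookup holds just the four upper-tail critical values at alpha = 0.05 that are quoted in this session (a full statistical table has many more), and the function names are made up for this illustration.

```python
# Upper-tail chi-square critical values at alpha = 0.05, keyed by
# degrees of freedom. Only the values quoted in this session are listed.
CRITICAL_VALUES_ALPHA_005 = {1: 3.841, 2: 5.991, 4: 9.488, 6: 12.592}


def degrees_of_freedom(n_rows, n_columns):
    """(rows - 1) x (columns - 1); totals are not counted."""
    return (n_rows - 1) * (n_columns - 1)


def decide(test_statistic, critical_value):
    """Reject H0 only when the statistic is greater than the critical value."""
    if test_statistic > critical_value:
        return "reject H0"
    return "do not reject H0"


# Three by three transport table with the test statistic of 6.29.
df = degrees_of_freedom(3, 3)
critical_value = CRITICAL_VALUES_ALPHA_005[df]
decision = decide(6.29, critical_value)

print(df, critical_value, decision)
```

Writing the comparison as "greater than" in one place makes it harder to flip the rule by accident, which is exactly the trap in statement number three.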
Now, this statement says we reject the null hypothesis if the test statistic is less than the critical value. Will that be correct? I just gave you the rule. That is incorrect: the test statistic is supposed to be greater than the critical value for us to reject. So that is the incorrect one. Number five says we can conclude that the mode of transport is independent of the distance traveled. Wait, sorry, this is very confusing; the way they ask this question confuses me too. Looking at our test statistic, we found it to be 6.29, right? So it falls in the do not reject area. If we are not rejecting the null hypothesis, it means we are saying the variables are independent. So number three is the incorrect one, because it should say greater than, and number five is correct: because we are not rejecting the null hypothesis, we can conclude that the two variables are independent of one another. That's it. Yes, I know that we are over time. I'm just going to recap now and leave you with additional activities. So this is another question. I think I had about 10 questions, and you can go through them on your own as well. Remember that you calculate the expected frequencies by multiplying the row total by the column total and dividing by the grand total. And if you are given a table that is not complete, please make sure that you complete it first. Here, since one column total is 950, the other column total is 1000 minus 950. To get the acceptable value for John, you say 950 minus 265. For the total on this side, 1000 minus 700 gives you 300. And to get the unacceptable value for Peter, 300 minus 265 gives you the answer there.
For John and unacceptable, it will be 700 minus the value you found for John and acceptable. Or it can be the value you found from 1000 minus 950, which is 50, minus the value you found for Peter and unacceptable. That will give you unacceptable for John. And then you should be able to answer this question. Observed frequencies are all the frequencies that they gave you originally. Always know how you state your hypotheses: your null hypothesis always has independent, your alternative has dependent. How do you calculate the degrees of freedom? The number of rows minus one, times the number of columns minus one. For the test statistic, please use the template and select the correct one. This is a two by two contingency table, therefore you will use the two by two template to find it. Otherwise, you just need to use your chi-square test statistic formula, which is equal to the sum of your observed minus your expected, squared, divided by your expected. So for every observed value and its corresponding expected value, you subtract one from the other, square the top part and divide by the expected value. You will also be required to make a decision. In order not to get confused about the rule that if the test statistic is larger than the critical value you reject, you can draw yourself a normal distribution, make it skewed a little bit, and mark your chi-square critical value for alpha and the degrees of freedom, which tells you your region of rejection. Anything that falls there, to the right of the critical value, you reject the null hypothesis. Otherwise, you do not. And you can also answer some of these questions. You can see on this one that they're asking you almost similar things. Always remember, symmetrical means normal, and we know that the chi-square distribution is a skewed distribution. So you should be able to answer these questions with ease. And here you are asked to calculate the test statistic. So this is one, two, three, four, and one, two. So it's a two by four.
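For reference, the quantities recapped above can be written out in symbols. The notation is an assumption added here for readability (O for observed, E for expected, R_i and C_j for the row and column totals, n for the grand total, r and c for the number of rows and columns); the session itself states them only in words.

```latex
E_{ij} = \frac{R_i \, C_j}{n}, \qquad
\chi^2_{\text{stat}} = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad
df = (r - 1)(c - 1)

\text{Reject } H_0 \text{ if } \chi^2_{\text{stat}} > \chi^2_{\alpha,\, df}; \quad \text{otherwise do not reject } H_0.
```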
A two by four, I don't think I have a two by four. I have a three by three, a three by two, and yes, a two by four, and it is already in the template. You should be able to answer this question with ease, because if you download the template, some of the sheets are already pre-populated with some of these questions. This is a three by two. You should calculate your chi-square test statistic because they ask you to calculate it. So this is the test statistic, and this is your critical value. You just need to do that to answer this question. The last question, or not the last, the second last question: they also gave you a two by two table. Answer the question. They look almost exactly the same, so you can't go wrong with this. And this is one, two, three, four, and two, so this is a two by four contingency table. You will find the template for it there, and you just answer the question. The last question is a one, two, three. And here, notice that the totals are missing, right? So always remember to have your totals. Or you don't even have to worry about the totals, because the template itself calculates them for you. And if you have noticed, this question is already part of the template. If I go there, you will see that the three by two sheet already has these values. So you can use the template to see how you answer this. The totals are already calculated on the template, but if you are not using the template, always remember to calculate your totals before you answer the question. Right. And that is all that I can offer and share with you for now. If there are no questions, please remember to complete the register. I'll repost it on the chat. It's losing. Yes, Justice? I can't access the chat. Can you send it to WhatsApp as we have done last time? Yeah. Justice, the same link I sent on WhatsApp, you can still access it there. It will never change.
Yeah, you can just use the same one. Okay. So enjoy your... What time do we finish? I'm finishing earlier than we are supposed to finish. No, our class ends at 10:30, so we still have until 10:30. I was thinking of the weekday sessions, which are one hour 30 minutes. Oh, sorry, my bad. We still have more time, so let's go and look at the other questions. We're done with this one, right? So let's do the next one. Okay, so let's look at this question. Two employees, Peter and John, are monitored to determine whether there are any differences in the proportion of acceptable parts produced by the employees. The sample of parts produced is given below. So this is quality: is it acceptable or unacceptable? And the employees who are being monitored are Peter and John. We can complete the whole table before we use our template. Let's do it here. So 950 minus 265 is 685. And 1000 minus 950 is 50. 700 minus 685 will be 15. Yes, 700 minus 685 is 15. And this total will be 300, and 300 minus 265 is 35. So we can go to our template. We have a 2 by 2, so we go and look for the 2 by 2 template. And I'm going to type acceptable and unacceptable. This is Peter, because I'm too lazy to type the whole name, and John. Then we type in the values: 265, and 685. Justice, your mic is unmuted. And 15. Okay. So the totals are 300, 700, 950 and 50, which means we did the right calculations by hand. Let's answer the questions. So the first one: which one of the following statements is correct? Number one, is it correct or incorrect? It says the null hypothesis states dependent. Is that correct or incorrect? It is incorrect.
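Completing the Peter and John table is plain subtraction from the totals, which can be sketched as below. The variable names are made up for this illustration, and the four starting values (1000, 950, 700 and 265) are the ones that the session treats as given.

```python
# Values treated as given in the session
grand_total = 1000
acceptable_total = 950
john_total = 700
peter_acceptable = 265

# Missing totals
unacceptable_total = grand_total - acceptable_total    # 1000 - 950
peter_total = grand_total - john_total                 # 1000 - 700

# Missing cells
john_acceptable = acceptable_total - peter_acceptable  # 950 - 265
peter_unacceptable = peter_total - peter_acceptable    # 300 - 265
john_unacceptable = john_total - john_acceptable       # 700 - 685

print(peter_total, unacceptable_total)
print(john_acceptable, peter_unacceptable, john_unacceptable)

# Cross-check: the two unacceptable cells must add up to the unacceptable total
assert peter_unacceptable + john_unacceptable == unacceptable_total
```

The cross-check at the end is a cheap way to catch a subtraction slip before the values go into the template.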
We're looking for the correct statement. The next one says the observed frequency for acceptable and John is 665. Is that correct? That's also incorrect, because the observed frequency for acceptable and John is 685. Let's bring up our table with the expected frequencies. Remember, if you are not using the template, you calculate the expected frequencies using the row total times the column total divided by your sample size, your n. The next statement says the expected frequency for acceptable and Peter is 265. The expected frequency for acceptable and Peter is 285. They took the observed frequency, so that is incorrect. The degrees of freedom is one. Now you need to make sure that you know your number of rows and your number of columns, because this is a two by two contingency table. So your number of rows minus one times the number of columns minus one will be two minus one times two minus one, which is one times one, which is one. So the degrees of freedom statement is the one that is correct. Suppose the test statistic is equal to 40. Is our test statistic 40? Our test statistic is 40.1003, so that would also have been incorrect. Let's look at the next question. A certain media company publishes four magazines for the teenage market. The executive editor of the company would like to know whether the readership preference for the four magazines is independent of gender. A survey of 200 teenagers was carried out, and the following contingency table was obtained. And here as well, there are no totals. So if you calculate this manually, remember to calculate your totals, especially if the first option asks for the expected value. It means you need to be able to go and calculate your expected value. So let's look at the question. Which one of the following statements is incorrect? The first one is about the expected value of Youth and girls. In the table, girls and Youth is 12.
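Before the magazine question, the Peter and John answers above can be checked by hand with a short sketch. It is only an illustration under the assumption that the completed table is 265 and 35 for Peter and 685 and 15 for John; the dictionary layout is a choice made here, not the template's.

```python
observed = {
    ("Peter", "acceptable"): 265, ("Peter", "unacceptable"): 35,
    ("John", "acceptable"): 685, ("John", "unacceptable"): 15,
}
row_totals = {"Peter": 300, "John": 700}
col_totals = {"acceptable": 950, "unacceptable": 50}
n = 1000

# Expected frequency = row total * column total / grand total
expected = {
    (employee, quality): row_totals[employee] * col_totals[quality] / n
    for (employee, quality) in observed
}

# Chi-square statistic = sum of (observed - expected)^2 / expected
chi2_stat = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)

df = (2 - 1) * (2 - 1)

print(expected[("Peter", "acceptable")])  # 285.0, not the observed 265
print(df)
print(round(chi2_stat, 4))
```

The statistic comes out at about 40.1, in line with the value read off the template, so a statement giving exactly 40 does not match.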
So to calculate the expected value for that, we need the row total times the column total, divided by the grand total. So we can go to our contingency table template. It's a two by three... sorry, it's a two by one, two, three, four. It's a two by four, not a two by three. So you're looking for the two by four contingency table. And this is our table. If you look at it, it looks exactly the same as the table in the question. So we have girls, boys, and Beat, Youth, Growth, Life. And the numbers are exactly the same: 18, 38, 12, 26, 20, 34, 28, 24. We have included the right values. Let me minimize it a bit; we can leave it as such. Okay. The expected values are calculated at the bottom, and we also need to fix those titles to reflect the same titles as the ones at the top. So number one, calculate the expected value of Youth and girls. We know that the row total for girls is 78, and the column total for Youth is 38. So it will be 78 times 38 divided by 200, which gives us 14.82. Then that is correct. The second question... I'm just going to leave it here. I need to go through all the questions first. I'm going to give you five minutes to go and get some water. Just go through the questions. Okay. Sorry, I wanted to check if the templates that I'm referring to are uploaded so that you can use them. So if you go there, there is the template that I'm using during this session. You can download it and use it as well, so that we are all looking at the same thing, which is this template. Okay. Let's minimize that and go back to the presentation, back to our question. We're looking at a two by four contingency table. So let's answer the question. For the first one, we established that it is correct because the expected value is 14.82. It's so small on my side.
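The expected value for girls and Youth can be reproduced with a short sketch. Two things here are assumptions rather than facts from the session: the magazine names are written as they come through in the recording, and the placement of the eight numbers in the table is inferred from the totals quoted above (78 for girls, 38 for Youth, 200 overall), reading the numbers column by column.

```python
magazines = ["Beat", "Youth", "Growth", "Life"]  # names as heard in the recording

# Assumed layout: the eight numbers read column by column (girls, boys)
observed = {
    "Girls": [18, 12, 20, 28],
    "Boys":  [38, 26, 34, 24],
}

row_totals = {gender: sum(counts) for gender, counts in observed.items()}
col_totals = [sum(column) for column in zip(*observed.values())]
n = sum(row_totals.values())

# Expected frequency for every cell: row total * column total / grand total
expected = {
    gender: [row_totals[gender] * col_total / n for col_total in col_totals]
    for gender in observed
}

youth = magazines.index("Youth")
print(row_totals["Girls"], col_totals[youth], n)
print(round(expected["Girls"][youth], 2))
```

With this layout the totals agree with the ones read out in the session, and 78 times 38 divided by 200 gives the 14.82 quoted for girls and Youth.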
The expected value is 14.82. Question number two, the null hypothesis: gender and magazine preference are independent. Is that correct? That's correct. The alternative: dependent. That will be correct. The degrees of freedom: your number of rows minus one, times the number of columns minus one. How many rows? We've got two rows, minus one. How many columns? Four columns, minus one. So it will be one multiplied by three, which is three. Your degrees of freedom is correct as well. And number four, I gave you this answer a long time ago. It's number four. Is the chi-square distribution symmetrical? No, it is not symmetrical. It is a skewed distribution. If you don't know, you can always go to your table and look at the picture there. You can see that this is a skewed distribution. Let's go to the test statistic. If we use our template, we find that the test statistic is equal to 6.8916. So that will be option number one. But you need to practise using the template so that you know how it works. The only thing that you need to put into the template is the observed frequencies. The rest, the template will calculate. You also need to be aware of which template you need to use, based on the number of rows and the number of columns, that is, what type of contingency table you have. For the next couple of questions, I'm not going to guide you through the activity. You will tell me which template we need to use, and then you will answer the questions. I will do everything for you on screen, but you need to guide me. So let's go to our next question, which is exercise seven. The first thing you need to tell me is which template to use. You can see there that they don't have the totals, and they're asking you to calculate the test statistic. Then we're going to go and find the critical value. But first we need to calculate the test statistic, which is the end product of
you completing the template. Which contingency table are we using? We need a three by two. So let's go and find the three by two. This is a three by three. This is a two by three. This is a two by two. And the last one is a three by two. So we're just going to substitute the values there. I'm just going to use abbreviations for the titles. We have 90. We have 30. 45, and 15. 30, and 10. So there is our table. And this table of ours is not working... no, it is working. Sorry, my bad, it is calculating correctly. So let's answer the question. What is the value of your test statistic? You're just going to look at the last value there. It's zero. So option four and option five are the candidates, because both say zero. The others are incorrect based on the calculations that we did. And we can double check, if you're not sure that the values are correct. The total here is 165. The total here is 120. If we calculate the expected value for 90, it will be 120 multiplied by 165, divided by our grand total, which is 220. If we take our calculators: 120 times 165, divided by 220, equals 90. So it gives you the same as the observed value. I'm not going to go through all of them. Now let's find the critical value. The chi-square critical value is given by chi-square of alpha and the degrees of freedom. And the degrees of freedom is given by the number of rows minus one, times the number of columns minus one. How many rows, and how many columns? Three rows, so three minus one, and two columns, so two minus one. So the degrees of freedom will be two. Our alpha, they told us, is a five percent level of significance, so it's 0.05. So we look up 0.05 and two. The critical value is 5.991.
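A test statistic of zero simply means every observed value equals its expected value. The sketch below shows this for the three by two table, with two assumptions: the first cell is taken as 90, the value used in the check 120 × 165 / 220 = 90, and the rows and columns are left unnamed because the titles are not clear from the recording.

```python
observed = [
    [90, 30],
    [45, 15],
    [30, 10],
]

row_totals = [sum(row) for row in observed]        # [120, 60, 40]
col_totals = [sum(col) for col in zip(*observed)]  # [165, 55]
n = sum(row_totals)                                # 220

# Expected frequency = row total * column total / grand total
expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]

# Every term (observed - expected)^2 / expected is zero for this table
chi2_stat = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

df = (len(observed) - 1) * (len(observed[0]) - 1)

print(expected)
print(chi2_stat, df)
```

With two degrees of freedom at the 5 percent level, the table value is 5.991, so a statistic of zero is nowhere near the rejection region.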
5.991. So which one is the correct one? That means this one is incorrect, and only option four is the correct answer. Easy, right? So it will be easy to answer questions in the exam or in your assignments. I think using the template is much quicker. Yeah, it is. Because otherwise you would have to calculate all the expected values and say 90 minus 90, squared, divided by 90, which will be zero, and so on, and it might take you forever to finish that. Next. Which contingency table do we need to be using? Did you all download the template? It's a two by two. So just go to the two by two contingency table. Do you all have it? Because I also want to give you a chance to play around with the template and see if you are able to use it. Yes, Justice? How do I download it? Because I'm just using a cell phone. You don't have a laptop? Unless I can request someone to download it for me. You don't have a laptop? No. So it means you are left to always calculate things by hand. Okay, I didn't take that into consideration, Justice. So maybe if I can buy a laptop, then how do I download it? You will go to where you find the notes online. When you go to the schedule, you have the link for notes and recordings, because you join the session using the join session link and next to it is notes and recordings. When you click on that, it will take you to an area like this. You just scroll to the bottom where it says numeracy, look for basic statistical literacies, and click on that.
When you click on it, it will go to basic statistical literacies like this. You just open the class notes folder. When you open it, it will go to the notes. Here are all the notes, and here is the template. You just download it from here. You can download all of them or only what you are looking for. By clicking on the three dots, you will see a download option. You just click on it and download it. Do not work from here; just download it and it will be there. Let's go back to our presentation. I want those who have the template to use it. If you are able to get it, change the titles, because I've been playing with mine, so it looks different from the one that you have. You have different words there. Change them. This will be A and B, and this will be D and H, because I'm just taking the first letters of did not complete high school and high school. Then you just remove all the old values and enter 40, 20 and 70, 30, and then answer the question based on what you see there, because I'm going to go out of here and help Justice. The first thing you always need to know is how to state your null hypothesis and alternative hypothesis. Justice, I want you to write the following on your side. Calculate for me the expected value for 40. The expected value for 40 is 60 multiplied by 110 divided by 160. I'm going to set up one, and then you need to do all of them. For 20, the expected value will be 60 multiplied by 50 divided by 160. For 70, it will be 100 multiplied by 110 divided by 160. And for the last one, which is 30, it will be 100 multiplied by 50 divided by 160. I'm just going to put an F so that we know that these are frequencies. You need to calculate all of them.
Once you have calculated that, I want you to also find the critical value, chi-square of alpha and the degrees of freedom, where we calculate the degrees of freedom by using the number of rows minus one times the number of columns minus one. Once we have all these values, we would normally get to the test statistic, but there is no test statistic that you need to calculate here, and then we will answer the question. Those are the things that I want you to calculate, and then we will get back to it. I'm going to give you three minutes. Are you done, Justice? Okay, so let's calculate our expected values. Let me move my calculator to the other side. The first one is the expected value for 40, and those who are using the template can confirm. We say it's 60 multiplied by 110, divided by 160, and that gives us 41.25. If I go to the template as well, you can see that it calculated it as 41.25. The next one is 60 multiplied by 50, divided by 160, which gives you 18.75, and if you go to the template you will see that it is 18.75. Then we do the last two. 100 multiplied by 110, divided by 160, gives the expected value of 68.75, and on the template you should also get 68.75. The last one is 100 multiplied by 50, divided by 160, and that gives us 31.25. If we go to the template, you will see that it is 31.25. Those were the expected values I asked you to calculate. The next one is to find the degrees of freedom. How many rows? There are two, minus one. How many columns? There are one, two columns, minus one. So the degrees of freedom is one. Then you should be able to answer any of these questions. So let's see if we are able to answer the questions. I'm going to bring back this contingency table here because most of the answers are there.
So let's answer number one. Oh no, we didn't find the critical value, so we still need to go and find it. They gave you a level of significance of one percent. So it means we're going to find the chi-square critical value by using 0.01 and one degree of freedom. Let's go and find it: 0.01, which is the second last column, and one degree of freedom, which gives 6.635. You see that, 6.635? And then we come back here. So the critical value is 6.635. Now let's answer the question. We're looking for the correct answer. Is number one correct or incorrect? I'm just double checking if you guys are here. Maybe I'm alone. Number one is incorrect because it states that the null hypothesis is dependent. We know that the null hypothesis should always state independent. Number two, the degrees of freedom is equal to one. Did we find it? Yes, we did calculate it, and it's equal to one. Therefore number two is the correct one. Number three says the critical value is 13.277. We found the critical value to be 6.635, so that is incorrect. Number four is the observed frequency of high school and party evaluation B, which is 30. It says it's 31.5, therefore it is incorrect. Number five is about the expected value, which we calculated manually and also on the template. It says the expected value of did not complete high school and party evaluation A is 40. The expected value for did not complete high school and party evaluation A is 41.25. They took the observed value, which means this is also incorrect. And that's how you will answer the question. If they had asked you for a test statistic, Justice, you have your expected values there, which I didn't write down.
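If the test statistic had been asked for on this table, the manual sum could be sketched as follows. The row and column labels in the comments follow the wording used above; the comparison against 6.635 is just the usual decision rule applied here for illustration, since the question itself did not ask for a decision.

```python
# Rows: did not complete high school, high school
# Columns: party evaluation A, party evaluation B
observed = [[40, 20], [70, 30]]
expected = [[41.25, 18.75], [68.75, 31.25]]  # from row total * column total / 160

# One term per cell: (observed - expected)^2 / expected
terms = [
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
]
chi2_stat = sum(terms)

critical = 6.635  # table value for alpha = 0.01 and 1 degree of freedom
decision = "reject H0" if chi2_stat > critical else "do not reject H0"

print([round(t, 4) for t in terms])
print(round(chi2_stat, 4), decision)
```

Each of the four terms is small because every observed value is within 1.25 of its expected value, so the total stays far below the critical value.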
So once you have your expected values, which are 41.25, 18.75, 68.75 and 31.25, if they had asked you to calculate the chi-square test statistic, which we know is the sum of your observed minus your expected, squared, divided by your expected, you would have said 40 minus 41.25, square the answer, and divide that by 41.25. Plus the next one, which is 20 minus 18.75, squared, divided by 18.75. Plus 70 minus 68.75, squared, divided by 68.75. Plus the last one, which is 30 minus 31.25, squared, divided by 31.25. And once you have the answer, it would have given you the test statistic of 0.1939, which is the same as the value that we have here on the template. That's how you would answer the questions. Okay, so we already covered some of this. You just need to make sure, especially those who are using the template, that you choose the right template. This is a 2 by 4. You're given your level of significance, which is your alpha value, and you need to know your null hypothesis and alternative hypothesis. Now, based on the information that is here, it asks which one is incorrect. On option number one and option number two, they made an omission error. They should have put at least H0 or H1 so that you are clear which one is which, because they give you similar statements and just say gender and favorite sports are independent, gender and favorite sports are dependent.
So are they looking for a conclusion, or are they asking for a hypothesis statement? I think on this question it was just an omission of stating which hypothesis statement it is. Then you need to calculate your degrees of freedom. Remember it's r minus one times c minus one. Whether you calculate it manually or not, you need to be able to calculate that. Your critical value is chi-square of alpha and the degrees of freedom. You've calculated the degrees of freedom, and your alpha is 0.05, because you divide the percentage by a hundred. You also go and calculate the test statistic, and then you make your decision and state which one is incorrect or correct based on those statements. The last one you can also answer. In option one they are asking you to find the critical value, so you just use your chi-square critical value of alpha and the degrees of freedom. Yes? Can you go to the next slide, please? Thank you, I wanted to take a screenshot. Okay, thank you. Okay, so for the last question, you just need your degrees of freedom and your alpha to find your chi-square critical value. Number two is just stating the hypothesis, so you look at whether it is the null hypothesis and relate it to the statement that they have given you. Number three is to calculate the chi-square test statistic, which is the sum of your observed minus your expected, squared, divided by your expected. Therefore it means we need to go and calculate the expected value for each and every one of those values, by using your row total times your column total for every observed value and dividing that by n, which is your grand total. Once you have that, you can also find your degrees of freedom. Your degrees of freedom is the number of rows minus one, times the number of columns minus one, and you would have calculated your degrees
of freedom with the second one. Yes? Oh, I thought there was a question. So for number four you would have already calculated your degrees of freedom, and you are able to answer both of them at the same time. For the last one you make a decision. Your critical value is what it is, and if your test statistic falls in the rejection area, you reject the null hypothesis. Otherwise, if it falls in the do-not-reject area, you state that you are not rejecting the null hypothesis. And pay attention now, really pay attention, because here they gave you a one percent level of significance, and here they gave you a five percent level of significance. So you will have to come here and find your critical value of alpha and the degrees of freedom again for this one. Here you will use 0.01 and the degrees of freedom that you found when you calculated them there. You must pay attention to that, because it's not at five percent but at a one percent level of significance. Okay, and that concludes today's session. Are there any questions or comments, or should we go back to the template so that we can sort out any issues? Remember, on the template you just need to change the titles. Is my pen writing? Okay, the pen is writing. You just need to change the column titles and the row titles, and also replace whatever is in the white area. That is everything that you need to change. The others, including the expected values, the calculations and the test statistic, are calculated automatically. The only thing that you need to feed in is the data that is inside here. All these values, you just need to add them, including all these columns. Those are the things that you need to input, and the rest calculates automatically. However, you need to make sure that you're using the right contingency table by looking at the title, whether it is a three by three, a two by two or a
three by two, or also, at the bottom, I have a two by four. There can be more than that, but those are the most generic types of contingency table that you can get, so I just used those ones on the template. Other than that, if there are no questions or comments, please make sure that before you leave the session you complete the register. And that is it from me to you. Are there any questions or comments? No questions from my side, except just to confirm that you said it is safe to use the tables during the exam? Yeah, you can use the table, because you're writing online, right? You're not in the exam venue. Okay, thank you so much. You can use the templates, yeah. Thank you very much. Thank you, and enjoy your weekend. Bye.
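For anyone without access to the spreadsheet template, the same automatic calculations can be approximated in Python. This is a rough sketch, not the template itself: the function name and the shape of the returned dictionary are invented here, and the example table is the magazine question with the cell layout inferred from its quoted totals (girls first, boys second).

```python
def chi_square_independence(observed):
    """Return totals, expected frequencies, df and the chi-square statistic
    for a contingency table given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected frequency = row total * column total / grand total
    expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]
    # Statistic = sum over all cells of (observed - expected)^2 / expected
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    rows, cols = len(observed), len(observed[0])
    return {
        "shape": f"{rows} by {cols}",  # tells you which template sheet applies
        "row_totals": row_totals,
        "col_totals": col_totals,
        "grand_total": n,
        "expected": expected,
        "df": (rows - 1) * (cols - 1),
        "statistic": statistic,
    }

# Magazine question: only the observed frequencies are typed in
result = chi_square_independence([[18, 12, 20, 28], [38, 26, 34, 24]])
print(result["shape"], result["df"], round(result["statistic"], 4))
```

As with the template, only the observed frequencies are supplied; the totals, expected values, degrees of freedom and test statistic follow from them, and the critical value still has to be read from the statistical tables.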