This is our second-last content session; it has been a long road. We are now in study unit 10, and only study unit 11 is left, so after next week we will have concluded all the content sessions. If you are still behind, we need to work on that so that you can catch up. For the rest of the team, we will do revision using your assignment questions: we'll go through them together as part of the revision from study unit 1 up to study unit 11. After you have submitted all your assignments, we start preparing and working towards you being able to write the exam. There I expect you to do much better than in the assignments, because while doing the assignments you are still learning some of these things, so there is no harm in that. But I need to raise this: you need to make sure that you don't miss any submission from now on. I know assignment 4 has already passed, and I hope you have all submitted it. Assignment 5 you all need to submit; there is no excuse. The assignment will open on the 16th of this month, and there is no excuse for not at least trying it once, because by then we will have covered everything you need in order to submit it. By the time we get to the question-and-answer session, you should already have practised, because every week we give you activities and exercises that I expect you to go through. This is maths: you cannot just read it. You cannot wait until the day before the assignment is due, study it, and expect to get it right. And like I've been saying, if you miss a session, listening to a one-and-a-half-hour recording is not the same as listening to a 10-minute recording. One and a half hours means you need to concentrate; you need to be there, physically, with no disturbances, just like we are here now with no disturbances.
A person going through a recording cannot skip, skip, skip and rush to the end. You have to go through the same process that we went through, and it is painstaking because one and a half hours is a lot. Imagine if you miss two sessions: you then need to catch up for three hours, and so on. So try by all means to attend all the sessions from now until you write the exam. I'm also going to send a message via myUnisa to remind you, and to remind those who haven't started attending to start attending, because some discussions will stay between us. The recordings are shared publicly on YouTube so that everybody can benefit from them, but after we have done the content we will have one or two sessions, not recorded, where we talk heart to heart. Almost each and every one of you needs to engage with me at that level so that we can unpack where your challenges are and I can address them. That means you need to make sure that you don't miss our Sunday sessions. Like I said, today is our second-last content session: next week we talk about regression and correlation, and then we are done with content. After that we just do activities and question-and-answer sessions so that you can submit your assignment, and then it is about me getting you ready to write your exam. So, without wasting more time, if there are any questions you want to ask me before we start today's session, feel free to ask. The floor is open.
Yes, hello. I saw a message on the statistics group about the chi-square tables, but I've been unable to find them.
Okay, I will show you where to find them right now. I've posted them on myUnisa, under our tutorial site; you know where to find my section, right? My e-tutor site. Let me stop sharing this so that I can open the website and show you where you need to go. Otherwise, if you joined the Teams group I created for us, I also share the files there, and I will show you where to find them as well. But first, let me log in to myUnisa. Rather than downloading tables from another website, it will benefit you to use the ones I shared with you, because these are the ones your lecturer uses as well. Okay, let me share my entire screen again. When you go to myUnisa and sign in, you will see your name, and it will show you My Modules and My Admin. When you go to My Modules, you will have a list of all your modules. Mine looks like your lecturer's, STA 1610, 22Y, but yours will have a 1E if you are in my group. Let me switch to student view so that you see exactly what you will see. You will get to a page like this; when you scroll down, you will get to Additional Resources. When you click on Additional Resources, you will find the statistical tables folder. Open that folder and click on the statistical tables file to download it. Under Templates, I have shared two templates so far (I will share more if I create more), one of which is the chi-square template that we're going to use today.
You also need to download that. After this session you can find today's notes here as well. I thought I had uploaded them under the study units. Okay, I will check why they're not reflecting here, but I did upload the notes for today; maybe the upload failed, I don't know. Either way, the notes will be there. Alternatively, I also shared the notes on Teams. (This is why I don't like sharing my entire screen: when I move from one window to another, you see everything.) If you join the group on Teams, you will see it looks exactly the same as the myUnisa group, and the notes are under Files. I don't see today's notes reflecting here either, but you will find them: I will upload them there after the session, together with the template we're going to use. Okay, so without wasting any time, are there any questions?
How do you join the Teams group?
If you are in my group, you would have received an email at your myLife email address with all the details on how to join. If you don't check your myLife email, go to Announcements on myUnisa: everything I send you is copied there, because I send it through myUnisa, and there is a weekly update, so you can look at the date. For example, there is the one I sent yesterday. It tells you that if you want to join the Teams channel, this is where you click, and if you want to join the WhatsApp group, that is where you click.
In that same email I've also included the summary notes, the template we're going to use, and the link to join today's session. If you're not checking myUnisa or your myLife email, you won't see these notifications, but they are sent to your myLife email every time I post. Okay. Any other last question? Are we good? Are we happy? Yes, Lizzie?
Just for clarity, maybe I didn't get you. Regarding the last assignment, is it going to cover all the units we have done, or only study units 10 and 11?
Only study units 10 and 11. You only write on study units 10 and 11.
Okay, no, thanks.
Okay, we have two minutes before I start. Any other last question? Anything you're still unsure of? Since there are no questions, I'm assuming you are all good. So let's start with chi-square. By the end of the session, you should be able to carry out a hypothesis test using the chi-square test. You need the statistical tables because we're introducing a new table this week. Remember, the tables we used before were the Z table, which is called the cumulative standardized normal distribution table, the t table, the binomial table and the Poisson table. This week we are introducing the table of critical values of chi-square. You also need to remember the two formulas we're introducing today: how to calculate the chi-square test statistic and how to calculate the expected frequencies. You also need a calculator; you can't do statistics without a calculator, so I expect everyone to have brought their own. You also need the template I've created, but I will share it right at the end, after we have dealt with the content. By the end of the session, you should know how and when to use the chi-square test for a contingency table.
And this is the only section you need to know. You don't do the chi-square goodness-of-fit test, you don't do the nonparametric tests, and you don't do the chi-square test for proportions. You only do the chi-square test for contingency tables, also called the chi-square test of independence. Why do we do a chi-square test? It helps us test whether there is a relationship between two categorical variables. We know what categorical variables are because we dealt with them in study unit 1: they are variables whose values fall into categories that you can observe, like gender, race, satisfaction level and so forth. To test the relationship between two categorical variables, we always summarize the data in a contingency table. This is not something you have never seen before: we used contingency tables when we did probability, to calculate joint probabilities and simple probabilities. We use the same table to test the relationship. It is useful in situations involving several population proportions or categories, and we use it to classify the sample observations according to two or more characteristics. Remember that a characteristic is a variable that describes your population, and each characteristic has different levels. For example, suppose we do a questionnaire asking males and females whether they are a cat person or a dog person.
We count how many said they are a dog person and how many said they are a cat person, and create a contingency table showing how many males and how many females said they are dog people, and how many males and how many females said they are cat people. A contingency table is also called a cross-classification table, and it has a number of rows and a number of columns. So when you look at a contingency table, you need to be able to classify it: how many rows and how many columns does it have? When you count the rows and columns, do not count the totals. A total is just a summary value of whatever is inside the table. I'll do the first example, and I expect you to classify tables number two and number three. We describe a cross-classification table as the number of rows by the number of columns. In the first one, how many rows are there? Row one and row two; the rows are the categories on the left. So we have two by how many columns? Column one and column two: only two. So this is a two-by-two contingency table. You can call it a two-by-two contingency table or a two-by-two two-way table, because it has two rows and two columns. What about the second one? How many rows and how many columns do we have?
In the second one, we have four rows and three columns, so it would be a four-by-three contingency table.
Yes, it's a four-by-three table. And the last one?
The last one is a three-by-two table.
Yes, a three-by-two table. So now you understand what we mean by a contingency table with r rows and c columns. This will make it easier when we calculate the degrees of freedom, because the degrees of freedom use the number of rows and the number of columns. You should be able to identify which are the rows and which are the columns so that you substitute and calculate correctly. As I've already said, a contingency table summarizes the data in terms of the two variables. So this tells me that 42 males are dog people, and this tells me that 39 females are cat people. Your 100 is your grand total, which is your sample size, and we're going to use it. Pay attention: in the exam or in your assignment, they might give you a contingency table without the totals. When you get a contingency table, you need to calculate the totals. This is very important, especially in hypothesis testing, and you will see why we need the grand total and the row and column totals. Moving on: since we're testing a relationship between two categorical variables, you also need to know the steps of hypothesis testing. For a chi-square test, step one is to state your null hypothesis and your alternative hypothesis. Remember that a hypothesis is a claim the researcher wants to test, and until now it has always been about a population parameter. But in the chi-square test of independence, you do not have a population parameter such as a proportion or a mean.
Instead, you are given two categorical variables, so we state the hypotheses in terms of the relationship between them, and you will see how when we state them. Step two: define your region of rejection by finding your critical value. Here we find the critical value in the table called critical values of chi-square. Because the chi-square distribution is positively skewed and the test is one-sided, there is only one region of rejection, in the upper tail. So we use the level of significance alpha as it is, without splitting it, but we also need the degrees of freedom, because the table requires them. Step three: calculate your test statistic, which you use together with the critical value to make a decision. Step four: make a decision based on your test statistic and critical value, and then draw your conclusion. So how do we do that? The chi-square test can be used to test the equality of two or more proportions, and the test of independence extends that idea to tables with r rows and c columns. When we state the hypotheses for a contingency table, the null hypothesis always states that the two categorical variables are independent. The most important word is independent. So if my two variables are gender and animal type, I will state my null hypothesis as: gender and animal type are independent. That always goes in the null hypothesis.
Your alternative hypothesis states the opposite: the two categorical variables are dependent. Independent means there is no relationship, so if the null says there is no relationship, the alternative says there is a relationship. But always, all the time, we use "independent" in the null hypothesis and "dependent" in the alternative. When we do a chi-square test of independence, you do not write "there is no relationship" and "there is a relationship". Bear those two words in mind when you state your hypotheses. You are also expected to calculate the test statistic. The chi-square test statistic is the sum, over all cells, of the observed frequency minus the corresponding expected frequency, squared, divided by that expected frequency. You do this for each and every observed frequency: subtract the expected frequency, square the difference, divide by the expected frequency, and add all the results together to get the chi-square test statistic. The test statistic follows a chi-square distribution, and remember that the chi-square distribution is skewed. Its critical value, which defines the region of rejection, is determined by your alpha and your degrees of freedom.
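In symbols, the hypotheses and the test statistic look like this. The notation f_o for an observed frequency and f_e for its expected frequency is an assumption here, chosen to match common textbook usage:

```latex
\begin{aligned}
H_0 &: \text{the two categorical variables are independent} \\
H_1 &: \text{the two categorical variables are dependent} \\
\chi^2_{STAT} &= \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e}
\end{aligned}
```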
We always define the degrees of freedom as the number of rows minus one times the number of columns minus one. These are the numbers of rows and columns, not their totals. So for a two-by-two contingency table, with two rows and two columns, you say (2 - 1)(2 - 1). For a three-by-two table, you have (3 - 1)(2 - 1). That gives your degrees of freedom, and your alpha will be the level of significance they give you. Remember that the critical value is what we look up in the table. How do we calculate the expected frequency? We take the row total multiplied by the column total for that observed value, divided by the grand total, which is your overall sample size. When we get to the exercise, you will see what I mean. You need to calculate your expected frequencies in order to calculate the chi-square test statistic. Then we make a decision: we either reject or do not reject the null hypothesis. The rule says that if our test statistic is greater than the critical value, we reject. The chi-square test is a one-sided test on a positively skewed distribution, so the region of rejection lies to the right of the critical value, which we find using alpha and the degrees of freedom. If your test statistic falls in the red shaded area, we reject the null hypothesis; if it falls in the white area, we do not reject the null hypothesis. Instead of using only the statement, we can use a graph so that it is easier to visualize where things fall.
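The degrees of freedom, the expected-frequency formula and the decision rule just described can be collected in one place. Here r and c are the numbers of rows and columns, n is the grand total, and the two examples show the table shapes mentioned above:

```latex
\begin{aligned}
df &= (r - 1)(c - 1), \qquad 2 \times 2: (2-1)(2-1) = 1, \quad 3 \times 2: (3-1)(2-1) = 2 \\
f_e &= \frac{\text{row total} \times \text{column total}}{n} \\
\text{Reject } H_0 \text{ if } \chi^2_{STAT} &> \chi^2_{\alpha,\,df}\text{; otherwise do not reject } H_0
\end{aligned}
```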
Remember, the degrees of freedom are always the number of rows minus one times the number of columns minus one. That is different from how you calculate the expected value, where you use the row total times the column total divided by the grand total. Let's get to the example. Here we have a contingency table of sample results on gender and hand preference. Maybe the questionnaire asked people whether they prefer to use their left or right hand, and there were males and females in the group. The sample we selected had 300 people. Of those 300, 120 were female (the row total for females): 12 of them prefer the left hand and 108 prefer the right hand. Of the 180 males, 24 prefer the left hand and 156 prefer the right hand. Of the whole sample of 300, 36 prefer the left hand, regardless of whether they are female or male, and 264 prefer the right hand. You can read the table either way, from hand preference to gender or from gender to hand preference; it's one and the same. For example, 12 of those who prefer the left hand are female. In total, 36 prefer the left hand and 264 the right hand, and 120 are female and 180 male. That's how you read a contingency table. If we want to test the relationship between gender and hand preference, we use a hypothesis test. So first we state our hypotheses. The null hypothesis states that gender and hand preference are independent. The alternative states that gender and hand preference are dependent. That is our hypothesis statement.
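Because exam and assignment tables may come without totals, here is a minimal Python sketch that recomputes the row totals, column totals and grand total from the observed counts of the gender and hand-preference table. The helper name `margin_totals` is hypothetical and is not part of the shared template:

```python
# Observed counts only; the totals are NOT typed in, they are recomputed.
# Rows: female, male. Columns: left hand, right hand.
observed = [
    [12, 108],  # female
    [24, 156],  # male
]


def margin_totals(table):
    """Return (row_totals, column_totals, grand_total) for an r x c table."""
    row_totals = [sum(row) for row in table]
    # zip(*table) walks down the columns of the table
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)
    return row_totals, column_totals, grand_total


rows, columns, n = margin_totals(observed)
print(rows, columns, n)  # [120, 180] [36, 264] 300
```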
The second step: before we calculate the test statistic, we need to calculate the expected values. For the female observed value of 12, we take the row total times the column total divided by n. The row total for 12 is 120 and the column total is 36, so 120 × 36 / 300 gives us 14.4. For 108, it will be 120 × 264 / 300. If you take out your calculator, you will see that this gives 105.6; on my Sharp calculator it also gives 105.6. For 24, the row total is 180 and the column total is 36: 180 × 36 / 300 = 21.6. The same for 156: 180 × 264 / 300. Are there any questions? If not, we move on to calculate the test statistic. Now we have our expected values and our observed values, so our formula says it is the sum of your observed... yes?
From the previous calculation, the bottom one highlighted in orange, does it have to give you 158.4?
What do you mean, does it have to? 180 × 264 = 47,520, and dividing by 300 gives 158.4. So if you got something else, you made a mistake in your calculation; just double-check. Okay, now that we have our expected and observed values, we can substitute and calculate the test statistic. The formula says it's the sum, which means adding, of observed minus expected, squared, divided by expected. So we have (12 - 14.4)² / 14.4, because we divide by the corresponding expected value, plus (108 - 105.6)² / 105.6, plus the same for 24 and 156. When we add all of these, we get 0.7576, and that is our test statistic.
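To check the calculator arithmetic, the sketch below applies the two formulas from this session: row total × column total / n for each expected frequency, and the sum of (observed - expected)² / expected for the test statistic. The function names are hypothetical, and only standard-library Python is used:

```python
observed = [
    [12, 108],  # female: left, right
    [24, 156],  # male: left, right
]


def expected_frequencies(table):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in column_totals] for r in row_totals]


def chi_square_statistic(table):
    """Sum over all cells of (observed - expected)**2 / expected."""
    expected = expected_frequencies(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


expected = expected_frequencies(observed)
print([[round(e, 1) for e in row] for row in expected])  # [[14.4, 105.6], [21.6, 158.4]]
print(round(chi_square_statistic(observed), 4))           # 0.7576
```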
Now we need to find our critical value, so let me go to the next slide. First, the degrees of freedom: the number of rows minus one times the number of columns minus one. How many rows do we have? Two, so 2 - 1. How many columns? Left and right, two, so 2 - 1. That gives (2 - 1)(2 - 1) = 1 × 1 = 1, so our degrees of freedom is 1. Let's assume alpha is 0.05. So we need the chi-square critical value for alpha = 0.05 and 1 degree of freedom, and we go to the table. The table is called critical values of chi-square. You can also see from the picture on the table that this is a one-sided test and that the region of rejection is in the upper (right) tail. Sorry, I've selected the marker; my bad. So we're looking for alpha = 0.05 and 1 degree of freedom. The degrees of freedom are down the left-hand side, similar to the t table we used before. You don't have to worry about the cumulative probabilities; we use the upper-tail area alpha, the values across the top of the table. We look for 0.05 and 1, and where they meet we find 3.841. So our critical value is 3.841. The test statistic we found was 0.7576, and our critical value, which creates our region of rejection, is 3.841. Let's draw the graph for the decision rule: if the test statistic is greater than the chi-square critical value, we reject the null hypothesis.
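In the exam you read the critical value from the printed table, but as a cross-check, here is a standard-library sketch of the two critical values used in this session, together with the upper-tail decision rule. It relies on two facts about the chi-square distribution. With 1 degree of freedom, it is the square of a standard normal variable, so the critical value is z(α/2)². With 2 degrees of freedom, the upper-tail area beyond x is e^(-x/2), so the critical value is -2 ln α. For other degrees of freedom you would need the table, or a library routine such as SciPy's `scipy.stats.chi2.ppf(1 - alpha, df)`. The function names below are hypothetical:

```python
import math
from statistics import NormalDist


def chi2_critical_df1(alpha):
    """Upper-tail critical value for df = 1: the square of z with alpha/2 in each normal tail."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z ** 2


def chi2_critical_df2(alpha):
    """Upper-tail critical value for df = 2, where P(chi-square > x) = exp(-x / 2)."""
    return -2 * math.log(alpha)


def decide(test_statistic, critical_value):
    """Upper-tail rule: reject H0 only if the statistic exceeds the critical value."""
    if test_statistic > critical_value:
        return "Reject H0"
    return "Do not reject H0"


print(round(chi2_critical_df1(0.05), 3))  # 3.841
print(decide(0.7576, 3.841))              # Do not reject H0
print(round(chi2_critical_df2(0.05), 3))  # 5.991
```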
If the test statistic is not greater than the critical value, we do not reject. So we mark the region of rejection at 3.841, and the shaded area to its right is where we reject if our test statistic falls there. Our test statistic of 0.7576 falls in the do-not-reject area. So our decision: since the test statistic of 0.7576 is less than the critical value of 3.841, we do not reject the null hypothesis, and we conclude that there is not sufficient evidence that gender and hand preference are related at alpha = 0.05. That's how you make a decision: we have no evidence against gender and hand preference being independent of one another. Okay, I'll give you a second on this one. Are there any questions, queries or comments?
None from my side, thank you.
If there are no questions or comments, we can continue. Let's do another example using the same approach. The contingency table we are given here is not fully populated: some information is missing, and you should be able to complete a contingency table. They have given us White males, 40; Black males and Black females, 32 and 48 respectively; and Indian males, 48. They also told us how many White people took part, 70, and how many people took part in the study in total, 250. We need to test whether there is a relationship between gender and race. The first thing to look at is what type of contingency table this is. Anyone?
Three by two or something?
You always start with the rows. So yes, it's a three-by-two contingency table. How do I get the number of White females from this table?
I think you take 70 and subtract 40, and then you get 30.
Yes, it will be 30.
You say 70 minus 40 gives you 30, because 30 plus 40 gives the total of 70. How many Black people took part?
80.
Yes, 80. We can't get the number of Indian females directly, but we can calculate how many Indians took part in the study.
120?
Not quite. White and Black together are 70 + 80 = 150, so the Indians are 250 minus 150.
So that will be 100.
Yes, there were 100 Indians. And how many Indian females? 100 minus 48, which is 52. Now we can add up the females: 30 + 48 + 52 = 130. How many males? 40 + 32 + 48 = 120. You could also have said 250 minus 130, which gives the same thing. So this is our contingency table, now complete, and we want to test the relationship. The first step is to state our hypotheses. The null hypothesis states that race and gender are independent. The alternative states that race and gender are dependent. The next step is to calculate the expected values. For 40, we take its row total, 70, multiply by its column total, 120, and divide by the grand total, 250. What do you get?
33.6.
You get 33.6; now do the next one. For 30, it's 70 × 130 / 250.
36.4.
That is 36.4. For 32, it will be 80 × 120 / 250.
38.4.
That will be 38.4. For the Black females, 48, it's 80 × 130 / 250.
41.6.
For the Indian males, 48, it's 100 × 120 / 250 = 48. And for 52, it's 100 × 130 / 250 = 52. Now that we are done, the next step is to calculate the test statistic. I've already calculated it.
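The missing-cell arithmetic above follows one idea: every row and every column must add up to its total. Here is a minimal sketch that uses only the values given in the question. The variable names are illustrative:

```python
# Values given in the question
white_male, white_total = 40, 70
black_male, black_female = 32, 48
indian_male = 48
grand_total = 250

# Fill the gaps so that every row and column adds up to its total
white_female = white_total - white_male                 # 70 - 40
black_total = black_male + black_female                 # 32 + 48
indian_total = grand_total - white_total - black_total  # 250 - 70 - 80
indian_female = indian_total - indian_male              # 100 - 48

male_total = white_male + black_male + indian_male
female_total = white_female + black_female + indian_female

# Consistency check: the column totals must add up to the grand total
assert male_total + female_total == grand_total
print(white_female, black_total, indian_total, indian_female)  # 30 80 100 52
print(male_total, female_total)                                # 120 130
```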
To calculate it, you just substitute: (40 - 33.6)² / 33.6 for 40. Do the same for 30: (30 - 36.4)² / 36.4. Then move to the next: (32 - 38.4)² / 38.4, and (48 - 41.6)² / 41.6. Then (48 - 48)² / 48, which is zero, so that one is easy, and (52 - 52)² / 52, which is zero as well. When you add all of them, you get 4.3956 as your test statistic. From here, we need to find our critical value, which creates our region of rejection. Bear with me, I'm silent for a bit; I don't know where my pen tool is, which means I cannot write. We find the critical value using our alpha of 0.05. They will always give you the alpha value rather than leaving it out as I did earlier, and here they tell you that the level of significance is 0.05. So let's calculate the degrees of freedom. How many rows and how many columns do we have? The degrees of freedom are the number of rows minus one times the number of columns minus one. We said this is a three-by-two table, so three rows minus one times two columns minus one: 3 - 1 = 2 and 2 - 1 = 1, so 2 × 1 = 2. Going to the table with alpha = 0.05 and 2 degrees of freedom gives us 5.991 as our critical value.
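Written out in full, the race and gender calculation looks like this. Each term is rounded to four decimals:

```latex
\begin{aligned}
\chi^2_{STAT} &= \frac{(40-33.6)^2}{33.6} + \frac{(30-36.4)^2}{36.4} + \frac{(32-38.4)^2}{38.4}
 + \frac{(48-41.6)^2}{41.6} + \frac{(48-48)^2}{48} + \frac{(52-52)^2}{52} \\
&\approx 1.2190 + 1.1253 + 1.0667 + 0.9846 + 0 + 0 = 4.3956 \\
df &= (3-1)(2-1) = 2, \qquad \chi^2_{0.05,\,2} = 5.991
\end{aligned}
```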
So we can now make a decision based on this information. If this is our critical value of 5.991, our test statistic falls in the do-not-reject region, because anything that falls beyond 5.991 means we reject the null hypothesis. So we do not reject the null hypothesis. The rule we know is that if our test statistic is greater than 5.991, we reject; we've established that. Now, for the decision and conclusion: what happened here? The slide shows the wrong sign. That is a mistake; I copy-pasted it from the previous activity and didn't change the sign. Since our test statistic of 4.3956 is less than our critical value of 5.991, we do not reject the null hypothesis and conclude that there is not sufficient evidence that gender and race are related at the 5% level of significance. I apologize for this; I will fix it on the slides and repost them. That's how you do a hypothesis test. But now I want to show you the templates that we have. To use the template, you need your contingency table already completed, especially your observed values. You cannot have it with missing information, so make sure your observed values are all filled in. How the template works is easy. When I share it, you will see the template is broken down into multiple contingency tables with some values already pre-populated, because I used it to create these contingency tables.
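The decision rule itself is a single comparison, sketched below. One loudly stated assumption: the helper `critical_value_df2` (a hypothetical name) only works for 2 degrees of freedom, because in that case the chi-square upper tail has the closed form P(X > x) = e^(-x/2), so the critical value is -2 ln(alpha). For any other degrees of freedom you read the critical value from the chi-square table, as in the session.

```python
import math


def critical_value_df2(alpha):
    """Upper-tail chi-square critical value for exactly 2 degrees of freedom.

    Valid only for df = 2, where P(X > x) = exp(-x / 2), so x = -2 ln(alpha).
    """
    return -2 * math.log(alpha)


def decide(stat, critical):
    """Reject H0 only if the test statistic lies beyond the critical value."""
    return "reject H0" if stat > critical else "do not reject H0"


critical = critical_value_df2(0.05)
print(round(critical, 3))        # 5.991
print(decide(4.3956, critical))  # do not reject H0
```

The same formula gives 9.210 at alpha = 0.01, which matches the table entry for 2 degrees of freedom.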
So for a three by three contingency table, you will select this template. For a two by three, that is two rows and three columns, you will use this template. For a two by two, you will use this template. For a three by two, you will use that template. And I think at the bottom we have a two by four, for which you will use this template. What I don't have is a four by two, a four by three, and so on. If you get one of those, you should be able to look at the calculations we applied to the contingency table here and apply the same calculations there. But if you need help, feel free to ask. Let's look at the question we have, so that I can explain using the same information. Our contingency table here is a three by two, so I need to go to the three by two template, which is this one. I need to replace all the other labels with the actual variables that we have here, so that I work with the same information and it helps me answer any question I am asked. So I'm just going to relabel some of these: male first, then female. How you use this template: in the first table, where it says observed, all the white cells are the ones you need to change. So I can delete all these values, because those are the things I need to change. You will see that all the calculations, including the expected values and the test statistic, disappear the minute I clear the white cells. Don't remove the calculations, because these are automated; the totals, for example, are the sums of the values that you place in here. So you capture your actual observed values, and only your observed values, and as I enter them you will see the values in the tables at the bottom change as well. So that is 30; only the observed, right?
Remember, don't change or add the totals; only fill in the white cells, for 48 and 52. Now, since the expected value is the row total multiplied by the column total divided by the grand total, I need to copy the same labels to the other tables. I should have automated this so it changes automatically; I didn't, I just realized it. So make sure you copy all of them so that you understand which one is which in the second table, especially for the expected values. Now, if you double-click on an expected value, it shows you the formula. You will see that the blue reference is your row total. As in the formula, it is your row total multiplied by the column total, divided by your grand total. To get out, you just press Escape. And you can see that it's the same expected value that we calculated. For female, it's the row total multiplied by the column total divided by the grand total, exactly as we calculated it by hand. Remember, for the Indian males we said it was 100 multiplied by 120 divided by 250, and we found that it was 48. And that's what we have: 48. And 100 multiplied by 130 divided by 250 gave us 52. So this table calculates your expected values. The template also calculates your test statistic, this whole formula, cell by cell. Remember the first part, where you take your observed minus your expected, squared, divided by your expected. For that, we use this formula here. At the moment I'm multiplying the difference by itself, because squared means 40 minus 33.6 times 40 minus 33.6. That's what it says. Instead of multiplying it by itself, you can use the power of two, and it will still give you the same answer. You can change the formula that way for all of them and it will still give you the same answer.
The answer for the first cell is 1.2190, the answer for 30 minus 36.4, squared, divided by 36.4 is 1.1253, the next one is 1.0667, the next one is 0.9846, and the answers for 48 and 52 are both zero. Remember, we calculated it and found that the test statistic equals 4.3956, and you can see there is your test statistic, rounded to 4.396. So you can use the template to calculate the test statistic because it is quicker, or you can calculate it manually. Then for the rest, you just go and make your decision. That's how the template works. When we do more activities, we will look at how we can use the template to answer some of the questions. Are there any questions? I have one question. If we were given, say, a four by two table, theoretically we could use the two by four template, provided we put the values in the correct cells, and we would get the correct answer, right? Yes, you can. Let's go to the two by four: you swap them around, so your rows become your columns and your columns become your rows. You just need to make sure that, for girls and Beat, say, you put 18. Whether it's a two by four or a four by two, you just need to read your table correctly and place the values in the right cells, and it will calculate correctly. We can do that when we look at more examples, maybe next week or the following week when we do activities. Here is another example. For this one, you don't need a table; you just need to know how to do it. Some of these things are theory that you need to know how to calculate. If you are given a contingency table with nine rows and two columns, how many degrees of freedom will there be? That is what they are asking you.
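The template's cell-by-cell contributions, and the answer to the student's question about swapping a four by two into a two by four, can both be checked with a small sketch. The function names `cell_contributions` and `transpose` are hypothetical; the point shown is that transposing the table (rows become columns) leaves every contribution, and hence the test statistic, unchanged.

```python
def cell_contributions(observed):
    """(O - E)^2 / E for each cell, laid out like the observed table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [
        [(o - r * c / n) ** 2 / (r * c / n) for o, c in zip(row, col_totals)]
        for row, r in zip(observed, row_totals)
    ]


def transpose(table):
    """Swap rows and columns, e.g. turn a 3 x 2 table into a 2 x 3 table."""
    return [list(col) for col in zip(*table)]


observed = [[40, 30], [32, 48], [48, 52]]  # 3 x 2 race-by-gender table
for row in cell_contributions(observed):
    print([round(x, 4) for x in row])
# [1.219, 1.1253]
# [1.0667, 0.9846]
# [0.0, 0.0]

total = sum(map(sum, cell_contributions(observed)))
total_t = sum(map(sum, cell_contributions(transpose(observed))))  # same data as 2 x 3
print(round(total, 4), round(total_t, 4))  # 4.3956 4.3956
```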
So here they didn't give you a contingency table, but they are telling you that the table has nine rows and two columns; what will the degrees of freedom be? It's your number of rows minus one times your number of columns minus one. So that will be? Option C, with eight degrees of freedom. Yes: nine minus one times two minus one is eight times one, which equals eight, which is option C. If they give you a contingency table with nine rows and four columns, you use the same formula: nine minus one times four minus one is eight times three, which is 24. That's how you calculate your degrees of freedom. Sometimes they might give you a contingency table and a level of significance and ask you to find the boundary of the region of rejection, which we call the critical value. Now consider a table with nine rows and two columns. We know that the degrees of freedom for this is eight. They tell you in this question that the level of significance is 1%, which is 0.01. So you look up the chi-square value for alpha 0.01 and eight degrees of freedom. Let me open the table for those who don't have it in front of them; I'm going to make it bigger. The degrees of freedom is eight and alpha is 0.01. Where they meet, the value is 20.090. So it's option B. It's as easy and straightforward as that. Let's look at more activities. Exercise one: a media company publishes four magazines for the teenage market.
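The degrees-of-freedom formula and the table lookup above can be sketched as below. The dictionary `CRITICAL_VALUES` is a hypothetical stand-in for the printed chi-square table: it holds only two entries, copied from a standard table, keyed by (alpha, degrees of freedom).

```python
# Hypothetical lookup holding a few upper-tail chi-square critical values,
# copied from a standard chi-square table and keyed by (alpha, degrees of freedom).
CRITICAL_VALUES = {
    (0.05, 8): 15.507,
    (0.01, 8): 20.090,
}


def degrees_of_freedom(n_rows, n_cols):
    """(number of rows - 1) * (number of columns - 1)."""
    return (n_rows - 1) * (n_cols - 1)


df = degrees_of_freedom(9, 2)
print(df)                           # 8
print(degrees_of_freedom(9, 4))     # 24
print(CRITICAL_VALUES[(0.01, df)])  # 20.09
```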
The executive director of the company would like to know whether readership preference for the four magazines is independent of gender. A survey among 200 teenagers was carried out, and the following contingency table was obtained. So here is our contingency table. What type of table is this? A two by four contingency table. Yes, it is a two by four. What is missing from this contingency table, for those calculating manually? The totals. So if you don't use the template, you need to look at the table and say, oh, I'm not given the totals, let me first calculate them. Those who are using the template: when you download it, one of the tables is already pre-populated with this question, and it is the two by four example we just used to explain. You can see that it has the same information: girls and boys, and the magazines Beat, Youth, Growth and Life. The observed values 18, 12, 20, 28 and 38, 26, 34, 24 are already pre-populated. And I have my totals; I just wanted to bring them onto the same screen. If you add 18 plus 12 plus 20 plus 28, you get 78. The total for boys is 122. The total for Beat magazine is 56, for Youth magazine 38, for Growth magazine 54, and for Life magazine 52. You need to find which statement is incorrect. The expected values for this one are already calculated, so you should be able to just use that information. What we didn't copy are the labels, so we need to make sure the correct row titles are there. So, which one of the following statements is incorrect? First, the expected value of the cell for Youth and girls.
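Computing the missing totals is the first thing to do when working manually; here is a minimal sketch using the observed counts read out above. The variable names are illustrative only. Checking that the grand total equals the 200 teenagers surveyed catches copying errors early.

```python
magazines = ["Beat", "Youth", "Growth", "Life"]
observed = {
    "Girls": [18, 12, 20, 28],
    "Boys": [38, 26, 34, 24],
}

# Row totals per gender, column totals per magazine, and the grand total.
row_totals = {gender: sum(counts) for gender, counts in observed.items()}
col_totals = {m: sum(col) for m, col in zip(magazines, zip(*observed.values()))}
grand_total = sum(row_totals.values())

print(row_totals)   # {'Girls': 78, 'Boys': 122}
print(col_totals)   # {'Beat': 56, 'Youth': 38, 'Growth': 54, 'Life': 52}
print(grand_total)  # 200
```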
The expected value is calculated using the row total multiplied by the column total, divided by the grand total. So you find the row total for girls and the column total for Youth, multiply them together, and divide by the grand total. We have calculated them previously, so it's 78 times 38 divided by 200. Also, if your grand total doesn't come out as 200, it means you did something wrong, because they told us how many teenagers are in this table. So when you multiply 78 by 38 and divide by 200, what do we get? The answer is 14.82, so statement number one is correct. Number two states the null hypothesis as: gender and magazine are independent. Is that statement correct or incorrect? Do you still remember how we state the null hypothesis? That's correct, because we always state the null hypothesis by saying the two categorical variables are independent. And since number two is correct, the alternative hypothesis statement is also correct, because the alternative always states that they are dependent. Number four: what is the degrees of freedom? You calculate the number of rows minus one times the number of columns minus one. How many rows do we have? Two, minus one. How many columns? Four, minus one. So the degrees of freedom equals one times three, which is three. Therefore, this is correct. Number five: the chi-square is a symmetrical distribution. Does this look symmetrical? Remember, symmetrical means bell-shaped, like the normal distribution, right? And we know that the chi-square is not symmetrical; it is positively skewed, that is, skewed to the right. We spoke about this. So that is the incorrect statement: the chi-square curve is a skewed curve.
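Statements one, four and five can be checked numerically. The sketch below uses a hypothetical helper `expected_cell`; for statement five it relies on the standard result that a chi-square distribution with k degrees of freedom has skewness sqrt(8/k), which is always positive, so the distribution is never symmetrical.

```python
import math


def expected_cell(row_total, col_total, grand_total):
    """Expected frequency E = row total * column total / grand total."""
    return row_total * col_total / grand_total


e_girls_youth = expected_cell(78, 38, 200)  # girls row total 78, Youth column total 38
df = (2 - 1) * (4 - 1)                       # 2 rows, 4 columns
skewness = math.sqrt(8 / df)                 # skewness of a chi-square distribution with df degrees of freedom

print(round(e_girls_youth, 2))  # 14.82
print(df)                       # 3
print(round(skewness, 3))       # 1.633
```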
That's how you answer these questions, whether in the exam or in your assignment, if you get them. The next question is similar, but here they're asking you to calculate the chi-square test statistic. That means you need to calculate the expected value for each and every cell. We already calculated one manually, so you can calculate the others the same way, and then use the formula: the chi-square statistic equals the sum of your observed minus your expected, squared, divided by your expected. You will see that I write O for the observed frequency and E for the expected frequency; they mean exactly that, observed and expected. So you need to calculate the expected frequency for each cell and then calculate the statistic. Otherwise, using the template, your test statistic will just come out as 6.8916. So with the template you will find the answer much more quickly. You are more than welcome to use the template in the exam, except if you're going to write a venue-based exam, where you can't bring your template. So you need to know how to calculate this manually, right? Don't fall into the trap of relying only on the templates, because if things change and venue-based exams are opened again, so that people can opt to sit and write exams at a venue, you need to know how to calculate these things. You can't go to a venue-based exam and say, I'm allowed to go on my laptop and look up the template and use it. You need to know how to calculate these things manually using the formula, right? Okay, next. This one I'm going to give you as homework, because our time is almost up.
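As a cross-check on the template's answer, here is a minimal sketch of the full statistic for the magazine table; the function name is hypothetical. Each cell's expected count is computed from the totals and its (O minus E) squared over E term is added up.

```python
def chi_square_statistic(observed):
    """Sum of (O - E)^2 / E with E = row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return sum(
        (o - r * c / n) ** 2 / (r * c / n)
        for row, r in zip(observed, row_totals)
        for o, c in zip(row, col_totals)
    )


observed = [
    [18, 12, 20, 28],  # Girls: Beat, Youth, Growth, Life
    [38, 26, 34, 24],  # Boys
]
print(round(chi_square_statistic(observed), 4))  # 6.8916
```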
We have one minute left, but this is another example of chi-square hypothesis testing, because all the options here relate to how you make decisions, that is, to all four steps of hypothesis testing. You have a contingency table, and you need to be able to identify what type of contingency table it is. You are not given the totals, so you need to calculate them if you are working manually. The question asks you to find the incorrect statement. In option one, they give you an alpha of 0.05 and ask you to find the critical value. So you need to find the critical value using alpha and the degrees of freedom, which means you first need the degrees of freedom: the number of rows minus one times the number of columns minus one. You need to know how to state your null hypothesis, whether it says independent or dependent, in order to decide whether that option is the incorrect one. You need to be able to calculate your chi-square test statistic, which means you first need to calculate your expected values. To calculate an expected value, remember, it's the row total multiplied by the column total divided by your overall sample size. Then you calculate your test statistic using your observed minus your expected, squared, divided by your expected, summed over all cells. Then you need to know how to make a decision and conclude: if this is my region of rejection, defined by the critical value for alpha and the degrees of freedom, and the test statistic falls in it, you reject your null hypothesis; otherwise you do not. Also pay attention: in one of the options, the level of significance is not the one given at the beginning.
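The steps listed above can be strung together in one sketch. Because the homework table itself is not reproduced in this transcript, the example reuses the magazine table from Exercise one instead. The function `chi_square_test` and the dictionary `CRITICAL_VALUES` are hypothetical; the dictionary holds only the two table entries needed, copied from a standard chi-square table. Note how changing the level of significance changes only the critical value, not the test statistic.

```python
# Upper-tail chi-square critical values copied from a standard table,
# keyed by (alpha, degrees of freedom); only the entries used below are included.
CRITICAL_VALUES = {
    (0.05, 3): 7.815,
    (0.01, 3): 11.345,
}


def chi_square_test(observed, alpha):
    """Chi-square test for independence on an r x c table of observed counts.

    Step 1 is fixed: H0 says the two variables are independent, H1 says they
    are dependent. Returns (statistic, df, critical value, decision).
    """
    n_rows, n_cols = len(observed), len(observed[0])
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    # Steps 2 and 3: expected counts and the test statistic
    stat = 0.0
    for row, r in zip(observed, row_totals):
        for o, c in zip(row, col_totals):
            e = r * c / n
            stat += (o - e) ** 2 / e

    # Step 4: critical value and decision
    df = (n_rows - 1) * (n_cols - 1)
    critical = CRITICAL_VALUES[(alpha, df)]
    decision = "reject H0" if stat > critical else "do not reject H0"
    return round(stat, 4), df, critical, decision


magazines = [
    [18, 12, 20, 28],  # Girls: Beat, Youth, Growth, Life
    [38, 26, 34, 24],  # Boys
]
print(chi_square_test(magazines, 0.05))  # (6.8916, 3, 7.815, 'do not reject H0')
print(chi_square_test(magazines, 0.01))  # (6.8916, 3, 11.345, 'do not reject H0')
```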
So you also need to find the critical value again using the 1% level of significance, and make a decision based on the test statistic you got in option three. All the information for this question is in the contingency table you have. So you should be able to answer all these questions, except option one and option five, which need critical values; you need to go and find those values in the table to make a decision. There are more questions, so we can go through them when we do activities, but there's nothing stopping you from looking at them and discussing them amongst yourselves on WhatsApp, on myUnisa, or anywhere you want. I'm always here to help answer questions, especially if you are not sure about certain issues. That's why all these other platforms are made available for you to engage with. Here is another activity. They give you the observed and the expected frequencies, so you don't have to calculate them. They're asking you to calculate the test statistic, so you just need to know the formula: the chi-square statistic is the sum of your observed minus your expected, squared, divided by the expected. You take that value minus that value, square the answer, divide by that value again, plus 12 minus 12.18, squared, divided by 12.18, plus, and you move on like that. Or you can put the observed values into the two by two template, and you will have your test statistic there and can answer the question more quickly. The next question is a three by three. You can use your three by three template to answer it, and you will be able to answer any of these. Pay attention to the rules that they have given you as well. Next question: it's a two by two table.
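When the observed and expected frequencies are both given, the statistic is just a sum over paired cells. The slide's full table is not in this transcript, so this sketch (with the hypothetical name `statistic_from_frequencies`) reuses the race-by-gender observed and expected values; the only requirement is that both lists list the cells in the same order.

```python
def statistic_from_frequencies(observed, expected):
    """Sum (O - E)^2 / E when both frequency lists are already given."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of cells")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Cells listed in the same order in both lists (race-by-gender example).
observed = [40, 30, 32, 48, 48, 52]
expected = [33.6, 36.4, 38.4, 41.6, 48, 52]
print(round(statistic_from_frequencies(observed, expected), 4))  # 4.3956
```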
Also pay attention to the information given, and know how to write the hypothesis statements, how to find the degrees of freedom, how to find the critical value, and what an observed frequency and an expected frequency are. You need to know the difference: the observed frequencies are the ones they give you, and the expected frequencies are the ones you calculate as the row total multiplied by the column total divided by the grand total, which is the sample size. You need to know those. This one is a two by four table; I'm not going to repeat it. And this is another two by two table; it's almost a repeat. That concludes today's session. By now you should know how to do a chi-square test for a contingency table, also called the chi-square test for independence. Are there any questions, comments, queries, anything you are still unsure of? Practice, practice, practice. If there are no questions or comments, thank you for coming. I will see you next week for our final study unit, which is regression and correlation. Thank you. Thank you, Lizzie. Thank you very much. Thank you and bye. Bye, everyone.