Welcome to session four. Maybe at the end of the session we can talk about your exam dates and how we're going to proceed going forward. If you're writing on the 10th of September, or earlier in that week, we need to get you prepared for the exam, and we haven't covered as much as we should because the sessions are bi-weekly. We can check with Jack whether we can have a workshop on a Saturday, a longer session where we do the exam prep. I will have to check that. Anyway, let's start with today's session, which looks at the chi-square test using a contingency table. The last session for August will be on the 31st, when we will look at basic probabilities. I think I can combine basic probability and normal probabilities in that session; we'll see how far we get on the day. Do you have any question, comment or query before I start? In the absence of any, we can start with the session. So we're going to look at the chi-square test for a contingency table. In the previous sessions we were looking at hypothesis testing. The chi-square test is also a hypothesis test: we test two nominal variables to see whether they are related. So a chi-square test is used to investigate whether two categorical variables, usually nominal but they can also be ordinal, are related or not. Because we're looking at two categorical variables, we can create a summary table that has rows and columns, and that table is called a contingency table. What you should know about the chi-square distribution, when we make a decision, is that it is a positively skewed distribution: its long tail is in the upper tail area, on the right.
So with chi-square, the region of rejection will be in the upper tail, on the positive side, which is why we call it a positively skewed distribution. Even though we only use the right side of the distribution when we make a decision, a chi-square test is always a non-directional test: the alternative hypothesis only says that the variables are related, not in which direction. So like with any hypothesis test, we always need to know how to state the null hypothesis and the alternative hypothesis. The null hypothesis always states that the two variables are independent, meaning they are not related, they have no relationship. The alternative states that the two variables are dependent, meaning they are related. In terms of a contingency table, you will have variable one on the rows, with its two categories, and variable two on the columns, with its two categories. This one we call a two-by-two contingency table because it's got two rows and two columns. We can also get a two-by-three, which means two rows and three columns, or a three-by-two, or a three-by-three, meaning three categories for the row variable and three categories for the column variable. If they didn't calculate the totals for you, you can add the values down each column to get the column totals, add the values across each row to get the row totals, and add those totals to get the grand total. We need the totals because the chi-square calculation uses what we call the expected values, and what they give you are only the observed values. So you need to calculate the expected values, because the formula for the chi-square test statistic is a sum.
Written with the summation sign, it is the sum of the observed value minus the expected value, squared, divided by the expected value: chi-square = Σ (O − E)² / E. That is the chi-square test statistic that you will use to make a decision. To calculate the expected values you need the grand total, because the expected value for each cell of a contingency table uses the row total, the column total and the grand total. I'm going to show you later on. When we make a decision, we can use either the critical value or the p-value. If we use the critical value, we compare the calculated test statistic with the critical value, which is the value you find in the critical value table. If your test statistic is bigger than your critical value, you reject the null hypothesis. To find the critical value, we use alpha and the degrees of freedom. The degrees of freedom are the number of rows minus one, times the number of columns minus one. That is the number of rows, not the row total; you just count how many rows there are. On a two-by-two contingency table, that is two minus one times two minus one, which is one. Once you have alpha and the degrees of freedom, you go to the critical value table and find your critical value. Then you compare the test statistic that you calculated with the critical value and make a decision. With the p-value, a statistical tool calculates it for you, and if the p-value is less than your alpha, you reject the null hypothesis. Those are the decision rules, using the critical value or using the p-value. So you have two options that you can use.
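The two decision rules and the degrees-of-freedom formula can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this sketch, and the small lookup table holds the standard chi-square critical values for alpha = 0.05, which you would normally read off the printed table.

```python
def degrees_of_freedom(n_rows: int, n_cols: int) -> int:
    """(number of rows - 1) * (number of columns - 1); counts, not totals."""
    return (n_rows - 1) * (n_cols - 1)


def reject_by_critical_value(test_statistic: float, critical_value: float) -> bool:
    """Reject the null hypothesis when the statistic exceeds the critical value."""
    return test_statistic > critical_value


def reject_by_p_value(p_value: float, alpha: float) -> bool:
    """Reject the null hypothesis when the p-value is less than alpha."""
    return p_value < alpha


# Standard chi-square critical values for alpha = 0.05, keyed by degrees of freedom.
CRITICAL_VALUES_ALPHA_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}

df = degrees_of_freedom(2, 2)            # a two-by-two table
critical = CRITICAL_VALUES_ALPHA_05[df]  # value looked up for df = 1
print(df, critical)
```

Both rules lead to the same decision for the same data; the critical value route only needs the printed table, while the p-value route needs software.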
Okay, so with the chi-square test, you will be given a frequency table, which is your contingency table with your two categorical variables and the observed values inside it. If there are no totals, you calculate the totals. Then you need to calculate the expected frequencies. For the expected frequency we use the formula: row total multiplied by column total, divided by the grand total, which is your n. You do it for every row and column combination, so the i and the j in the formula represent the row number and the column number. To make a decision, we need the degrees of freedom, and we know that the degrees of freedom are r minus 1 times c minus 1, which is the number of rows minus 1 times the number of columns minus 1. Then you need to calculate the test statistic. Our chi-square stat, the calculated test statistic, is given by the sum of the observed minus the expected, squared, divided by the expected. So for every individual observed value, you subtract the corresponding expected value, square the answer, and divide by the expected value. Adding those up gives you the test statistic. Once you have the test statistic, you are ready to make a decision. If your test statistic exceeds the critical value, we reject the null hypothesis. Or we can use the p-value: if the p-value is less than alpha, we reject the null hypothesis. So let's look at an example. Usually when you do a survey, you ask people to complete a questionnaire with yes-or-no type of responses. So let's say we have a question in the questionnaire which asks people, do you like a television program?
And they can select either like or dislike, and probably in your demographic information you would have asked them what their gender is, and they can select male or female. We can take these two questions and create a hypothesis test. Then we can test for the relationship between gender and the response to the question, do you like the television program? The first step is to get the observed frequencies, which are the actual data that you receive from the responses. Of the people who answered the question, 50 like the television program and 55 dislike it. Of those who like the television program, 36 are male and 14 are female. Among all the people who answered the questionnaire, 66 are male, regardless of whether they like the program or not, and 39 are female. And all the responses together are 105. Stating the null hypothesis, we state that there is no relationship between gender and the choice of television program. The alternative says there is a relationship. So your null hypothesis always states that there is no relationship, and we want to disprove that. Your alternative states that there is a relationship. It's the opposite of the other one. Now we need to find the expected frequencies. Since we have our totals, remember, we need the row totals, the column totals and the grand total. Our expected frequency is the row total multiplied by the column total, divided by the grand total, which is your n. And that is what we're going to do. Let's calculate it for 36, which is in row one and column one. Or I can say it's for like and male.
I will use the row total, which for this row is 50, times the column total, which is 66, divided by the grand total, which is 105. I brought my calculator here with me, and since I'm not sharing my entire screen, I can quickly calculate it. So it is 50 multiplied by 66, divided by 105. The expected value is 31.42857, which we can round to two decimals, 31.43. And that's how you calculate your expected frequency. That gives you the expected value for 36. For the expected value for 14, we still do row total times column total divided by the grand total. The row total is still 50, so we say 50 times 39 divided by 105. For dislike you do the same: for 30, it will be 55 times 66 divided by 105, and for 25, it is 55 times 39 divided by 105. That's how you find the expected value for each observed value. Once you have completed your expected value calculations, we're ready to calculate the test statistic. We can also find the degrees of freedom. Remember, the degrees of freedom use the number of rows, there are one, two rows, and the number of columns, there are one, two columns. So it will be two minus one times two minus one. Two minus one is one, two minus one is one, and one times one is one. Our degrees of freedom is one. Now for the test statistic: remember we have our observed values, and every observed value has a corresponding expected value. We need to calculate the chi-square stat, which is the sum of the observed minus the expected, squared, divided by the expected.
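The expected-frequency step for the television example can be sketched in Python. The observed counts are the ones from the example (36, 14, 30, 25); the function and variable names are only illustrative.

```python
# Observed counts: rows are like / dislike, columns are male / female.
observed = [
    [36, 14],
    [30, 25],
]


def expected_frequencies(table):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)  # grand total
    return [[r * c / n for c in col_totals] for r in row_totals]


expected = expected_frequencies(observed)
for row in expected:
    print([round(e, 2) for e in row])
# [31.43, 18.57]
# [34.57, 20.43]
```

Each row of expected values adds up to the same row total as the observed values (50 and 55), which is a quick way to check the arithmetic by hand.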
So I can create a table like this, where I write each observed value next to its corresponding expected frequency. If we go back, remember 31.43 corresponds to the actual value of 36, 18.57 corresponds to 14, 34.57 corresponds to 30, and 20.43 corresponds to 25. So now I can write it like this, where I have my 36 and my 31.43. Because the equation says observed minus expected, I can calculate that. So 36 minus 31.43 gives me 4.57. 14 minus 18.57 gives me minus 4.57. 30 minus 34.57 gives me minus 4.57. 25 minus 20.43 gives me 4.57. So I'm done with what is inside the bracket. Next is the square: I need to square all these answers, and the square is the same as multiplying the number by itself. Oh, sorry, my mistake here, we're not done yet, because for each of them you also need to divide by the expected value. So it is 4.57 times 4.57, divided by the expected value of 31.43, and that is the answer we're looking for, which gives about 0.67. And you do the same for the rest. Minus 4.57 times minus 4.57, divided by 18.57, gives you about 1.13. For the third one, dividing by 34.57 gives about 0.61, and for the last one, dividing by 20.43 gives about 1.03. That is everything except the summation. The summation means adding, so we need to add that value plus that value plus that value plus that value.
And when you add all of the values, you get the sum of the observed minus the expected, squared, divided by the expected, which is 3.44. Because if you add 0.67 plus 1.13 plus 0.61 plus 1.03, it gives you 3.44. And that is your chi-square test statistic. Now we go to the chi-square table. Remember, we found our degrees of freedom to be equal to one. So if they gave us an alpha of 0.05, a five percent level of significance, we go down the degrees of freedom column and find one, and along the top we find the level of significance, our alpha value. Where they meet is the critical value, and the critical value here is 3.841. Now we use the chi-square test statistic that we got, which was 3.44. If I draw a chi-square distribution, I can mark my region of rejection based on my critical value of 3.841. Taking my chi-square test statistic, which is 3.44, I can locate where it is. Remember, for anything that falls in the shaded area, we reject the null hypothesis; for anything that falls in the white area, we do not reject the null hypothesis. 3.44 falls in the white area, so we do not reject the null hypothesis. In other words, since the chi-square test statistic is less than the critical value, we do not reject the null hypothesis. If they had given us the p-value, we would compare the p-value with the alpha level of significance to make the decision. And in conclusion, because we are not rejecting the null hypothesis, we can say there is no significant relationship between liking the television program and gender. And that's how you do the chi-square test.
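The whole worked example can be checked with a short Python sketch. It uses the observed counts from the example and the critical value 3.841 from the table; the names are illustrative. Carrying the expected values at full precision gives a statistic of about 3.42 rather than the 3.44 obtained by rounding each term on the slide, and the decision is the same. The p-value line relies on a shortcut that holds only when the degrees of freedom equal one, which is an addition to what was shown in the session.

```python
import math

# Observed counts: rows are like / dislike, columns are male / female.
observed = [
    [36, 14],
    [30, 25],
]


def chi_square_statistic(table):
    """Sum over all cells of (observed - expected)^2 / expected."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected frequency
            statistic += (o - e) ** 2 / e
    return statistic


statistic = chi_square_statistic(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)
critical_value = 3.841  # table value for alpha = 0.05, df = 1

# For df = 1 only, the upper-tail probability equals erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(statistic / 2))

reject_null = statistic > critical_value
print(round(statistic, 2), df, round(p_value, 3), reject_null)
```

The statistic is below 3.841 and the p-value is above 0.05, so both decision rules agree: do not reject the null hypothesis.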
Any questions before we start looking at typical questions that you get in your exam? Any question? Am I alone? Please remember to complete the register. Okay. So without any questions, we can continue. Andrea, have you been able to see what I have presented, or not? Okay. There are no questions. Let me just check whether there are messages in the chat; I want to double check that I'm not missing anything. Guys, for the exercises, are you going to be typing, so you're not going to be talking to me the whole time? Then let me also put on my phone so that I can read your chats while I am in presentation mode. All right. So the first question. A researcher wants to establish whether the type of employment category that is filled by employees of a particular company is significantly related to their gender. The employees can be categorized as manager, human resources, administrative, maintenance, and information technology worker. And the genders are male and female. Which of the following will be the most appropriate test to use? Is it number one, the t-test for two independent samples? Number two, is it the Pearson correlation test statistic? Number three, is it the chi-square test statistic? Which option? I'd say number three because it's categories. Yes, because you've got two categorical variables: one is about their role, the type of employee, and the other is gender. So that will be option number three. Which of the following is the appropriate formula for the chi-square test? Is it number one, number two, or number three? Remember, you can also post your answer in the chat if you don't want to talk to me. That way I can also see whether you understand what I have shared with you. Number one. Yes, that is number one. That's correct. Because number two is to test for the difference between the sample means.
And that is for one sample, for one sample or one population. And number three, is the test of, is your correlation R? Is your correlation coefficient formula? Exercise three, a contingency table represents, is it one, the distribution of the frequencies for a variable? Two, a frequency count for each of a number of possible outcome of an experiment? Number three, the frequency counts if each observed, or if each outcome measured on two nominal scale variables when they are cross classified. One, two, or three. Number one. Number one says a contingency table is a distribution of frequency for a variable. So if it's one variable, when I only have gender, because gender is one of a variable, doesn't say frequencies, sorry, doesn't say variables, which means they should be at least two. So number one is not correct. Now you're left with two or three. Frequency count of each number. Okay. Number three. Number three, and that is correct because it says it's a frequency count of two nominal variables when they are cross classified. That is a contingency table. Number two, it says it's a frequency count of each number of possible outcome, and that will just give you a discrete table, which will only have one variable. Exercise four, which of the following sets are appropriate for determining whether a relationship exists between two variables if both are measured on a nominal scale of measurement? Is it one, a T test for two independent samples? Two, is it a testing? Is testing the significance of the Pearson correlation coefficient? Or three, the Chi square test? Number two. I'll go for three. I'm thinking the word relationship. Number one. I hear number one, number two, number three. So let's start with number one. So number one says the T test for independent. Number one will be testing for the difference between two independent sample or two groups. Yeah. 
For that we use numerical information: we could test the scores of the males against the scores of the females, or something like that. That is what the t-test is used for. Number two, testing the significance of the Pearson correlation, also needs two numerical variables, one of which is your independent variable and the other your dependent variable. Remember that from the previous sessions. If we read the statement before we even come to the chi-square test: the chi-square test tests whether a relationship exists. Number two also tests a relationship. So the difference between number two and number three is that number two tests the relationship between numerical variables and number three tests the relationship between categorical variables, while number one tests whether there is a difference between groups. This question is about a relationship. So when reading the question you need to identify whether it tests the relationship between two variables. And if it does, are those two variables numerical or categorical? Here they give you a nominal scale, and a nominal scale is part of your categorical variables. What are the scales for numerical variables? I'm sorry about my handwriting. I hope by now you are used to it. For numerical variables the scales are ratio and interval. For categorical variables it's either nominal or ordinal. I'm so sorry, I must go back to grade one to learn how to write. Okay. So the answer here is option number three, because we're dealing with a nominal scale of measurement. Next question. The chi-square test is used to compare which aspects of the data for the two samples? Number one, the distribution of the data as classified in terms of a variable.
Two, the sample means of a variable for each sample. Three, the variance of the variable as measured for each sample. One, two or three. Remember you can post your answer if you are not sure or if you don't want to say it out loud. So is it one, two or three. You're saying three. Three is incorrect. Because here it talks about the variance. If we want to test for the variance, we use the F test and that will test your variance one divided by your variance two. And that is when you're testing or you want to compare your variance. So number three is incorrect. Now you are left with one or two. Is it one or is it two? I think it's one. One is the correct answer because number two, you can use your T test. Test this for the sample means. So either is it for group or is it for independent groups or dependent groups. So number one is the correct one. As long as it talks about chi-square test, always remember it needs to be data that is in categorical format. And there needs to be two of them. So this one says the data is classified. The only data that can be put into categories or can be classified is categorical data. Numerical data. We can classify it, but we don't say classified because numerical data are measured. So only number one is the correct answer. Exercise six. A number of psychiatric patients are classified by gender, male or female, and into one of four categories as schizophrenic, severely depressed, bipolar disorder and others. Which of the following is suitable for representing counts or frequencies of persons or persons which falls into possible subcategories? Can we use a contingency table, a scatter plot, a histogram or a spreadsheet? One, two, three or four. Isn't it a contingency table? Yes, it is a contingency table because a contingency table, we can use it to visualize two categorical variables which are classified as gender. 
So on the rows we can put gender, on the column we can put the other four categories of mental health issues or psychiatric illnesses or a scatter plot visualizes numerical values, numerical values and you need two of them, X and Y. A histogram is a visualization which also does the numerical values which you take one or you need to have at least one variable, one numerical variable and you put it into boundaries or class width and create a histogram and that is your histogram. A scatter plot, remember it is this graph. A contingency table is your category one and category two. A spreadsheet, if you think of a spreadsheet you can just think about Excel. I'm not even going to explain what that is. So can you say a contingency table should always remember that it has rows and columns? Yes, so for example, they don't talk about rows and columns. The key things you need to always remember is categories. Categories are classifications. So you classify and you are given two variables. So variable one which is a categorical variable and variable two which is those four categories which are your, maybe let's not call this contingency table of ours categories. Let's call it, let's say this is gender and, oh sorry, I should have used gender on the site. My bad. So on the rows is your gender. So this will be gender on the rows which will have male or female. And then at the top you will have the four categories which is schizophrenic, severely depressed and you have bipolar and others. So you see that is your frequency table which is your contingency table. Then they also mention things like counts. Remember here you can put here counts which is one or two or four and six and eight and thirteen and fourteen. Those are counts. Counts and frequency are one and the same thing. So frequencies count these frequencies. And that is that. So in your mind you should also have this visualization to say oh a scatterplot you will have your X and your Y variable. 
When the values of X increase, the values of Y increase, for a positive relationship. A histogram is for when you have one variable. Let's say exam marks: then you can say those who received 50 percent to 60 percent, those who received 60 percent to 70 percent, those who received between 70 and 80, and so forth. And that is how you differentiate between the visualizations. Okay. Pardon? I heard someone. Okay. Is there a question? Is there a question? Hello, can you hear me? Yes. I just want to know, the contingency table, doesn't it hold data? Yes, it does. Remember, these are your questions, so it shows how many people selected each answer in your questionnaire. Or let's say it's not even a questionnaire. Let's say, since you're doing IOP, this is psychology research. So let's say you work in a research center somewhere and have a record of your clients or your patients that come for consultation. You will have on their file what their gender is, and you will also have on their file what type of illness or issue they came to consult on. So at the end of the month you take your information and you count, of those who are females, how many of them have schizophrenia? You count them and you find that there are only two of them. And then you do the same for males. How many of them have schizophrenia? You find that it's one. How many of them are severely depressed? You find that there are 10 females and three males. How many of them have bipolar? Maybe more males have bipolar and two females have bipolar. And for the others, maybe 20 and 10 have other illnesses. That is the data that you have. You just classify it and summarize it. So at the end you will know how many patients you are helping in a month. And you can have the totals, because this will be those who have schizophrenia, regardless of whether they are male or female, which will be your column total.
And those who have severe depression, there will be 13, and there will be 17 with bipolar, and 30 for those who have other illnesses. And you can also find the totals for male and female, regardless of what illnesses they have. So that will be 34, and this will be 29. And the grand total will be how many patients you have in a month, which is 63. So that is the contingency table. Okay, I just need clarification, ma'am. Let's say for example you are at work. Can you use the contingency table instead of the spreadsheet? Yes, because you can create a contingency table using your spreadsheet as well. So now we're going into another topic, but it's fine because we have another hour. No, don't worry, don't apologize for it. You asked and I shall answer the question. I just want to go to a spreadsheet. Okay, for some reason my presentation doesn't want to move. It's fine, it's okay, because I just needed confirmation. No, it's fine. I will show you just now. I just need to end the slideshow, because it doesn't want to move. Let's assume this is your record of patients. I'm just going to type a few entries. Now, for every patient I'm just going to put a 1, because this is the number of times they came in. So you have your file. This is your file, your records per day or whatever the period is. This column is the illness, this one is the gender, and this one is the count, because usually on an Excel spreadsheet you can count them. So you just take this data; your contingency table is a pivot table. You go to Insert and then you insert a pivot table. I also want to make sure that this table is on the same sheet, so I go to existing worksheet, click inside the location box and click where I want this contingency table to be. And it's going to create the pivot, so I can take my illness. Or maybe I should have used the same names as before.
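The pivot-table idea in this demo, counting records by two categorical columns, can also be sketched in Python. The records below are made up purely for illustration (they are not the data typed in the demo), and the function name is hypothetical.

```python
from collections import Counter

# Hypothetical patient records as (gender, illness) pairs, one per visit.
records = [
    ("female", "A"), ("female", "B"), ("male", "A"),
    ("male", "C"), ("female", "A"), ("male", "B"),
    ("male", "A"),
]


def contingency_table(rows):
    """Count how often each (row category, column category) pair occurs."""
    counts = Counter(rows)
    row_labels = sorted({gender for gender, _ in rows})
    col_labels = sorted({illness for _, illness in rows})
    # Counter returns 0 for combinations that never occur.
    table = [[counts[(r, c)] for c in col_labels] for r in row_labels]
    return row_labels, col_labels, table


row_labels, col_labels, table = contingency_table(records)
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand_total = sum(row_totals)

print("gender", col_labels, "total")
for label, row, total in zip(row_labels, table, row_totals):
    print(label, row, total)
print("total", col_totals, grand_total)
```

As with the spreadsheet pivot, changing a record and re-running the count updates the table, and the row totals, column totals and grand total come straight from the counts.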
So there is my illness and there is your gender. At work you will have many, many other columns. You will have other details, you will have the doctor that is helping, something like that. So there will be a lot of information. Maybe I shouldn't have put the table there, because then my values here are squashed. So let's make it bigger. I just want to make this bigger. Okay, so on a contingency table, I can use the count, because I put a 1 for every record, but I can use any other value that I want and put it in the values area. And there is your contingency table, and it shows how many of them have A, how many have D, how many have C. So if I change one of these values, let's say we make this one A and reduce the others, I can just refresh this table and you will see. My laptop is very slow today. I can just refresh, and now it looks much better. Let's remove G and make it F. Let's remove F and make it G. So there is your contingency table. It just summarizes the information. Okay, thank you so much. All right. Any other question? Thank you very much for the demo. At least I'll be able to use mine. Oh, yeah. You can also do that. To those who are part of the class today, please mute if you're not saying anything. Thank you. Number seven. A researcher studying possible sex-linked inheritance of three psychiatric disorders, denoted by A, B, and C, tabulated the gender, male or female, of 100 psychiatric patients against their diagnosis. And that is your contingency table. What they didn't do here is calculate the totals, but it's not a problem. So this is your contingency table, which has the three types of diagnosis and the gender, male or female. The question is, which research design did the researcher use? Number one, did the researcher use a correlational design? Number two, did they use a two-sample group design?
Number three, did they use a three-sample group design? Is it one, two, or three? Think about what we are discussing today as well. I think it's a two-sample group design. Then it is one. Okay, so you need to ask yourself this question. Remember, they're talking about sample designs. A two-sample design means there is one group and then there is another group: sample one and sample two. A three-sample design means sample one, sample two and sample three. So it cannot be either of those, because here we have only one sample of a hundred psychiatric patients. We don't have n equals a hundred for sample one, n equals a hundred for sample two and n equals a hundred for sample three. We only have one sample. So options two and three won't be correct, and the correct answer is one. Yes. Also, when you see a contingency table and you are not doing probabilities, it means you are doing a correlational study. Remember what correlation is: correlation is a study of relationships. You want to check whether the variables are related or not, and whether that relationship is weak or strong. That's correlation. Remember, you can do a correlational design for numerical values, which will use the Pearson correlation; for categorical data we use the chi-square test. So if I understand this correctly, does it mean that we are trying to see if there's a relationship between the sex-linked inheritance and the psychiatric disorder? Nope. Remember, the sex-linked inheritance refers to your three psychiatric disorders.
You want to see if there is a relationship between gender and those psychiatric disorders. That is this contingency table, the A, B, C and the male and female. That's the relationship you want to see. Oh, so that means we want to see if maybe males are more prone to these psychiatric disorders. Yes, which is a sex-linked inherited psychiatric disorder. Okay, now I see. Thank you. All right. I like it when you see, when you understand. Okay, so number one is the only correct answer.
Exercise eight. Oh, I thought it was the same question, but no, they're different. A researcher studying a possible sex-linked inheritance of three psychiatric disorders tabulated the gender of 100 psychiatric patients against their diagnosis. And that is our contingency table. The question is, what are the requirements with regard to the statistical test to be performed? One, is a directional statistical test required? Two, is a non-directional statistical test required? Three, is no statistical test required? For those who joined the session late, I'm going to depend on those who joined earlier, because we did cover this at the start. So which option is the correct answer? Before you answer that, because I can see that the majority of you didn't start with us, let me not be unfair. Let's go back to the first slide that we started with, where we were explaining what the chi-square test is. You can read all of it, but the one that is most important is the last bullet: it's non-directional. Are you satisfied with what you read? Okay, we're going back to the question. The question was asking, if we're looking for a test of a relationship between two categorical variables, what kind of test statistic can we perform? Is it a directional test statistic, no test statistic, or a non-directional test statistic? Number two, a non-directional test statistic.
Number two. Even though, like I said, we use a one-tail area when we make a decision, a chi-square test is a non-directional test.
Next one. I think in your module they like gender. Representing the gender of members of parliament versus the political party to which they belong is best done in the form of a scatterplot, a contingency table, or a two sample group design? It will be number two, definitely. I'm not going to go into the scatterplot because we covered it: it's for numerical variables, your independent versus your dependent. Two sample groups would mean that two different samples were selected, whereas here we're talking about two categorical variables: political party affiliation and gender.
A researcher wants to establish whether a relationship exists between people's religious affiliation and whether they are in favor of or against the death penalty, yes or no. Which of the following would be the most appropriate test to use? Will it be one, a t-test for two independent samples; two, the chi-square test; three, the Pearson correlation test; or four, the t-test for independent samples? Remember, you need to be able to identify things from the paragraph that you just read. Today we're dealing with one specific area, but in the exam you will have multiple things to deal with. So you need to be able to go to the question or the statement and identify what you are given before you even answer the question. So is it one, two, three, or four? You said number? Number two. Yes, it goes with the relationship that exists between religious affiliation and whether they favor the death penalty: two categorical variables. Number one deals with two numerical variables from the same population group, where you select two samples from that population group.
Number three deals with a correlation test, which is for numerical variables, and you are not given numerical variables. You're not given scores or numbers there, or means, or things like that. Number four, the t-test for two independent samples, is the same as number one, so I'm not going to go there again. You just need to make sure that you understand the information given: relationship. If they had said whether there is a difference between the numbers, then you would know that you're going to be doing the means or the t-tests and so forth.
Sally wonders whether a relationship exists between a person's length and their leadership ability. Now, here you need to be very careful, because remember I spoke about numerical variables and so forth. Do not jump, because you saw length, to assuming that there is a numerical variable. It can be length in a categorical manner, described as whether you are tall or short or medium. That is also length. She collects data from a sample of 95 people, classifying them as short or tall, and as leaders, followers, and those she could not classify. And she creates a contingency table, so we can clearly see here is your person's length, which will be tall or short, and the cross-classification with leadership ability. So this will be your leadership ability. And it's categorical data, because it says classifying them. If the frequency data is evenly distributed through the categories, with no proportional differences between tall and short people as far as leadership ability goes, what would you expect the number of people who can be classified as short leaders to be? Sorry, I forgot to read the whole thing: short leaders. So it means you need to go and calculate the expected value for short leaders. That is the cell for short and leader, where the observed value is 32. Remember the expected frequency. So for short leaders, we'll need row total times column total, divided by N.
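As a minimal sketch of that formula (not something shown in the session), the expected frequency of one cell can be computed in Python. The observed counts are the ones read out for Sally's table in this exercise; the dictionary layout and the function name `expected_frequency` are illustrative assumptions.

```python
# Observed counts from Sally's 3-by-2 table, as read out in the exercise.
# Rows are leadership ability, columns are length (tall / short).
observed = {
    "leader":       {"tall": 12, "short": 32},
    "follower":     {"tall": 22, "short": 14},
    "unclassified": {"tall": 9,  "short": 6},
}


def expected_frequency(table, row, col):
    """Expected count for one cell: row total * column total / grand total."""
    row_total = sum(table[row].values())                         # e.g. 12 + 32 = 44
    col_total = sum(cells[col] for cells in table.values())      # e.g. 32 + 14 + 6 = 52
    grand_total = sum(sum(cells.values()) for cells in table.values())  # 95
    return row_total * col_total / grand_total


# Expected number of short leaders: 44 * 52 / 95
print(round(expected_frequency(observed, "leader", "short"), 2))  # 24.08
```

Only the one row total, the one column total, and the grand total are needed for a single cell, which is why it is enough in the exam to add up just those.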
So how do I know which one is my row total and which is my column total, since they don't give the totals? You just need to quickly go and calculate your totals. You can calculate the totals for the whole table, but I wouldn't in the exam. I'd just concentrate on the question that they asked. Since I'm looking at 32, I'll just calculate the total for that row and the total for that column. But I also need the grand total, so let's quickly complete the whole table. So 12 plus 22 plus 9: that will be 43. And 32 plus 14 plus 6: that will be 52. So the grand total will be 95. Oh, they told us there are 95. I should have thought about it. So I just need to calculate the total for leader as well: 12 plus 32 is 44. I don't have to complete the whole table because I've got the values I need. I just need 44 and 52. So coming in here: row total 44, multiplied by my column total, which is 52, divided by 95. Have you calculated? I can just go and use my calculator. Let's use the fraction. We have 44 multiplied by 52, divide by 95: 24.0842, or 24.08 if we round it off to two decimals. The answer is option three. How did you get the 44? 12 plus 32. Since we need to calculate the row total, 12 plus 32 gives you 44. 22 plus 14 will give us 36. 9 plus 6 gives 15. And if I add those, that will give me 95. Happy?
Same table as the previous question. The question now says: to determine whether a relationship exists between length and leadership ability, Sally has to calculate the appropriate test statistic. Oh, now you need to go and calculate the expected values, all of them. Let's use our previous values. I am so lazy, so I'm going to just copy and replace. Oh, I cannot replace. I need to write them in: 44, 36, 15, 95. What did we get here? 12. 52.
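For reference, the whole calculation for this exercise can be sketched in a few lines of Python. This is an illustrative sketch, not the Excel template used in the session; the function name `chi_square_statistic` and the list-of-rows layout are assumptions. It keeps the expected values unrounded, so the result can differ in the second decimal from a hand calculation that rounds each expected value first.

```python
# Observed counts, one row per leadership category: [tall, short]
observed = [
    [12, 32],  # leader
    [22, 14],  # follower
    [9, 6],    # unclassified
]


def chi_square_statistic(table):
    """Return (expected table, chi-square statistic) for a contingency table."""
    row_totals = [sum(row) for row in table]          # 44, 36, 15
    col_totals = [sum(col) for col in zip(*table)]    # 43, 52
    n = sum(row_totals)                               # 95
    # Expected count for each cell: row total * column total / grand total
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    # Sum of (observed - expected)^2 / expected over every cell
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    return expected, statistic


expected, chi2 = chi_square_statistic(observed)
for row in expected:
    print([round(e, 2) for e in row])
print(round(chi2, 2))  # 10.71
```

With the expected values rounded to two decimals before squaring, the same sum comes to about 10.72, so either way the result falls in the same answer category.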
And here we had 43. So we need to calculate. We did calculate the expected value for this one, for 32, and we found that it was 24.08. So now let's calculate for all of them. For 12, we say 43 times 44, divide by 95. I don't have a calculator other than the one online, so you have to assist with some of the values. I can calculate some, but it will take me forever because I will have to toggle between the two screens. So let's see: 43 times 44, divide by 95. Writing it down, for 12 it is 19.92. Now, if I leave this screen I lose everything. Next, for 22, which is 43 times 36 divide by 95, it's 16.29. And then I just want to multiply 43 times 15, so for 9 it will be 6.79. Now let's do it for 6. Since I still have all the values here, I just need to change my 43 to 52: 52 times 15, divide by 95. Everything will be divided by 95. So for 6 it's 8.21. And let's do it for 14. For 14, all I need to do is change the 15 to 36: 52 times 36, divide by 95, is 19.71. Okay, so I can quickly write in the values that I just calculated: 19.92, and this one is 16.29, and 6.79. All the red ones are my expected values. For this one it's 19.71, and 8.21. So now the question says: in which one of the following categories will the result fall? They say we need to calculate the appropriate test statistic. To calculate it, remember our chi-square test statistic will be the sum of your observed minus your expected, squared, divided by your expected. So it means I'm going to take my observed values, which are my black ones, and say: 12 minus 19.92, squared, divide by 19.92; plus 22 minus 16.29, squared, divide by 16.29; plus 9 minus 6.79, squared, divide by 6.79; plus 32 minus 24.08, squared, divide by 24.08; plus 14 minus 19.71, squared, divide by 19.71; plus 6 minus 8.21, squared, divide by 8.21. I'm not sure if you are able to see that 8.21. Okay, so I just need
to work it out. The worst part is that my calculator cannot work out all of that at the same time. No, there won't be time in the exam to do this, and they won't give you questions like this, but I think for your assignments you are expected to know how to calculate it, because I think this one comes from your tutorial letter 101. Okay, so let's do the calculations. Let me get my other pen and my calculator. I can do three terms at a time, so I'll do the first three, then the second three, and add them together. The first thing I need to do is the fraction, then open bracket: 12 minus 19.92, close bracket, squared, divide by 19.92. I just continue: plus, open bracket, 22 minus 16.29, close bracket, squared, divide by 16.29. And I just use my arrows: plus, fraction, open bracket, 9 minus 6.79, close bracket, squared, go down, 6.79. Let's take a chance and see if I can add another one: plus, open bracket, 32 minus 24.08, close bracket. I will also show you on the Excel sheet. No, I cannot do that. I must delete that one because it will not allow me to go beyond that, so I can only do three at the same time. So, equals. I can write the answer for the first three, and I'm going to write the whole number: 5.869702903. I don't want to drop any decimals. Then I must do the next three. It is time-consuming to do all the calculations. Open bracket, 32 minus 24.08, close bracket, squared, go down, 24.08, go to the side; plus, open bracket, 14 minus 19.71, close bracket, squared, go down, 19.71; and that's the last one, plus, open bracket, 6 minus 8.21, close bracket, squared, go down, 8.21. Equals 4.8 something. Then I must add the other one that I got, so plus 5.869702903, which I am typing now. Equals 10 point something. That's our chi-square test statistic. Let me write it down before I forget: 10.72. Let's go here and say the answer is 10.72. Now I'm going to show you my entire screen. Everybody now at UNICEF will see my screen and what
I have on my computer, secrets and all. Okay, I just wanted to show you that I've got a template as well, which has different calculations in Excel. I can email it to you, and I'll also post it where the notes are. We need to go back to our presentation: our table is a three by two, so I need to use the three by two template. You must look at the labels: it says three by three and two by three, and there are more at the bottom on the side as well, a two by two and a three by two. Since ours is a three by two, all I need to do is put in the values. So this is tall and that is short. I'm not going to put the whole names there. It was leader, follower, and unclassified, so I'm just going to use L, follower, and unclassified. I'm too lazy to type the whole thing, so I can just copy them and paste them there. The same with the columns, because I just need to make sure that I have everything the same way. On our table, I'm just going to replace the values with all the black ones from here: that's 12, that's 22, and that's 9, and you will see that the calculations are happening right there. Here it's not 24, it's 32, then 14 and 6. And you've got the same values. If you look at our values, 19.92, 16.29, these are our expected values, and here it calculated each one of the terms. The formula that I use here is just to take 12 minus 19.92, multiply it again by itself, and divide by the 19.92, because I divide by the expected value. Instead of that I could also have used the power function, because it's only a power of two. It will give you the same answer anyway. So our answer here is 10.71, which is practically the same as the 10.72 we got with the manual calculation; the small difference comes from rounding the expected values by hand. And the manual way took us forever, three years, to get it done. So, in which one of the following categories will the result fall? Is it below zero, is it between zero and two, is it between two and four? The result, our 10.72, is above four, and that's how you will answer that question. I'm not sure
how many more exercises we have. Let's just look at them. We have 20 minutes, and there are not so many, so within these 20 minutes we will be done.
The next exercise. The chi-square test statistic is used to compare: one, the frequency distribution of the observed data with the frequency distribution of the data that is expected if the null is true; two, the variance, and I'm not going to read the whole sentence; three, the covariance of x and y. It's one. It is one because, remember, I said no variance and no covariance. Covariance is for the coefficient of correlation.
A researcher wants to establish whether the type of employment category that is filled by employees of a particular company, and those are the categories of employment, is at all influenced by their gender, male or female. Which will be the most appropriate test to use? Is it one, two, or three? Great, it's three. I'm not even going to bother explaining why the other two are not the right ones.
A number of psychiatric patients are classified into four categories. I think we looked at something like this, and it's one.
Number 15. I think we also looked at something similar, but this one is different. A contingency table indicates: one, the distribution of frequencies for a variable; two, a cross-classification of two nominal variables; three, a plot of the relationship between two variables. One, two, or three? Number two. It's number two because number three says a plot, and a contingency table is not a plot, it's a table. And number one says a distribution of frequencies for a variable, but we're working with two variables.
A contingency table is used to summarize the relationship between two variables measured on which scale? One, on a nominal scale. We can also use the ordinal scale, but it's very rare that you use a contingency table for it, because with ordinal scales usually it's those "rate our services" questions, which say agree or disagree. Usually they will not give you a proper test for those, so we usually like to use the
nominal variables for a contingency table.
Which of the following is the appropriate formula for the chi-square test? Is it number one, number two, number three, or number four? Four. It is number four, even if it's not visible enough. This one tests the difference between two samples, this one tests one sample, and this one is for the coefficient of correlation. So the only correct answer is number four, which gives you the sum of your observed minus your expected, squared, divided by your expected.
Exercise 18. What is the expected frequency of observations in cell AY? Maybe I shouldn't have written it so close to the table. What you need to do is calculate the row totals and the column totals and then the grand total, because they didn't give us the grand total here. Then you calculate the expected value of AY by using the row total times the column total, divided by the grand total. So let's calculate. The row total for A is six plus four, which is ten. Every total is ten, so the grand total will be 20. So how do we calculate the expected frequency for AY? Ten times ten divided by 20, which is 100 over 20, which equals five, which is option number two. So when they ask you about expected frequencies and they give you a contingency table with no totals, you quickly calculate the totals, and you need to know that the formula is this one. I guess they will give you the formulas when you go and write the exam. Even if you're writing online, they should supply you with sufficient information to enable you to write your exam without any hindrances.
Okay, that completes today's session. Any comment before I wrap up? We have 10 minutes. Any comments or questions? For me it's fine. Just the Excel, and those tables that you say you'll be posting there, because I think that's going to help a lot. So with that, it concludes our session
for today. Just to recap on what we did: we looked at the chi-square test for independence. You need to make sure that you know how to state your null hypothesis and your alternative. Remember, your null hypothesis for the chi-square test states that there is no relationship between the two nominal categorical variables, or that they are independent. Your alternative will state that there is a relationship, or that they are dependent. So, null: no relationship; alternative: relationship. Then you need to know how to make a decision, whether by using a p-value or the test statistic. I didn't see any question in your past exam papers that asks you to make a decision, and as you can see I've scanned through and used different past exam papers. So you just need to know how to build a chi-square test and what the characteristics are that make up the chi-square test, which means you need to know that you use a contingency table, and you need to know what formula you use to calculate the test statistic. Sorry, my voice is going. Then you also need to know how to calculate the expected frequency, because I think in the exam they might ask you to calculate it, since it's easy and quicker to calculate. You just need to look at the table; if the table does not have totals, create the totals and then use the formula to calculate the expected frequency. You also need to know how to identify what is given in the question or the statement, in order for you to know which option you need to choose, especially when you're looking at different types of tests, because you will get the same questions but for other tests. Let's say they gave you two numerical variables: you need to know that those are two numerical variables and not categorical variables. And what else do you need to know? That's all that you need to know, and then you should be good. And I think in your exam paper
this is only one question, or one or two questions. So that concludes today's session. Before we leave, for those who joined late, just to give you an update: Kim sent a notice on the group with the preliminary exam date, which says the 10th of September or something like that. So it is early September, which is too early, because we haven't covered probabilities, which will be next. I was betting on only doing the basic probabilities next, and hoping that in the first week of September we could do the normal probability and then, two weeks after that, the discrete. But now it means we need to push and look at the other skills that you require in order to be able to write your exam. So we'll have to have an exam preparation session, which would be a workshop, and it might not be a two-hour workshop; it might be extended to a three-hour workshop. I'm hoping that it can be on a Saturday, but I will communicate via WhatsApp to arrange that, because during the week I am fully booked and there is nothing I can do. Your module rotates with another module, which is a second-level pure statistics module, so I cannot cancel that one either, because I also don't know when they are writing their exam, and they also meet bi-weekly. So it means I need to find a special day for you guys. Even if it might be two consecutive Saturdays, I would not mind, but I will communicate with you, because on Saturdays early in the morning I've got another stats class for all my first-level modules. So I'm very, very sorry for the inconvenience, but we will get there. You will be ready to go and write the exam and give them a hundred percent. Okay, so any questions or comments, quickly, before we part ways? Thank you so much for the help. We're looking forward to getting our distinctions because of your help. You really are good. We appreciate your time. I do
consideration about you. That really will happen. Thank you so much. Yeah, if there are no questions, then have a lovely evening. See you after two weeks. Bye.