Welcome to this week's session. I'm not going to cover a lot, because today's session is a combination of two things, but it builds on the session we had the previous time we met. We're still going to look at hypothesis testing, now for when we compare two populations or two samples. By the end of the session, you should know how to do hypothesis testing for two independent populations and also for two related populations, and we're doing hypothesis testing of the mean for all of today's examples. I do have a few activities, but because the session might run long, you won't have any activities at the end of the session. I think we might repeat some of the activities the next time we meet, or we will build on them. Okay, so today I just want to make sure that you understand the concepts, so that when you come across questions on hypothesis testing for the mean with two populations, you are able to identify whether you have two independent groups or two related groups. We're going to touch on those two scenarios. Sorry, I'm just charging my laptop. Okay, so in terms of the two-sample tests that we're going to be looking at, like I already explained, you need to be able to identify whether you have two independent samples, which means you have two groups. It can be two categories: boys and girls, males and females, single and married, South Africans and Zimbabweans, things like that. Those two are independent. One has no bearing on what the other is doing, so they are not related to one another. Or you have a two-sample test with related samples, which means the samples come from the same source. These are situations where you have the before and the after, or the control and the treatment. For example, you write a test, then you get a lesson, and then you write another test.
So it's the before-and-after type of test that we're going to be looking at. It means one has an influence on what happens to the other. One is related to the other; one depends on the other thing happening as well. So you need to be able to look at the question and identify which it is: are the samples related or are they independent? In terms of the hypothesis testing, there are several other things that you also need to identify in the statement and in the question, namely the assumptions that are made for the hypothesis test. Remember, with every hypothesis test the goal is to test whether there are differences between the groups. So if there are two population groups, then we are testing whether there is a difference between population group one and population group two. The other thing is to make sure that you understand the assumptions. Are the population standard deviations of those two populations known? Are they also equal? If they are known and they are equal, you need to know what kind of test you're going to be doing. If they are unknown and you assume them not to be equal, you need to know what type of test applies. There are different tests that you can apply, so it's not the same for everything. And when we look at the difference between the two populations, always remember your point estimates, which are the values that you use when you calculate the test statistic. When you state the null hypothesis and the alternative hypothesis, you use your population parameters. When you calculate your test statistic, you use your point estimates, the sample statistics, which tell you whether there is a difference between the samples. So what do we mean by independent samples?
Independent samples are samples that are unrelated to one another. They are independent of each other. One does not influence the other; one has no bearing on what happens to the other. Now, when you do the hypothesis testing, like I already explained in the beginning, there are some assumptions that need to be met before you can carry out your test. Number one, the samples need to be randomly and independently drawn for you to be able to do a hypothesis test for independent samples. Your populations need to be normally distributed. If they are not normally distributed, then there are other tests, which we call non-parametric tests, that you can do, but those are not part of your syllabus. If the populations are not normally distributed, then at least your sample size needs to be large, greater than 30. So you need to work with a large sample size. Then your population variances: if they are unknown but assumed to be equal, there is a particular type of analysis that you do. So let's look at that. If your population standard deviations are unknown but assumed to be equal, then we're going to use what we call a pooled variance, which is denoted by Sp squared. You will see how we calculate the pooled-variance t-test. If your population standard deviations are unknown and not assumed to be equal, meaning that the standard deviations of the two populations you are working with are assumed not to be equal, then we use what we call a separate-variance t-test. Now, usually in your studies you are going to be using the pooled-variance t-test. You're not going to use the separate-variance t-test as often. You will see that when we do a lot of the other activities as well. So, hypothesis testing for the mean where the population standard deviations are unknown and not assumed to be equal: like we said, the assumptions are the same.
For that one, the samples need to be drawn independently. The populations need to be normally distributed, or the sample sizes need to be bigger than 30. And your population variances are unknown and cannot be assumed to be equal. That will also be one of the tests that you can do. This is just a slight repetition of what I've just said. Now, how do we state the hypothesis statements? Remember, the previous time we looked at the six steps of hypothesis testing, and we looked at how to state the null hypothesis and the alternative hypothesis. Those same concepts are still applicable here. Now, when you read the statement, sometimes it gives you a hint about the type of sign that you need to put in your hypotheses, and whether you're doing a two-tailed test or a one-tailed test, or what we call a directional test or a non-directional test, which is a two-tailed test. Take words such as decrease. If the statement they gave you says one population is decreasing, then you know that we're talking about the lower-tail test. It means your null hypothesis will state that population mean one is greater than or equal to population mean two. Remember, in your null hypothesis there's always an equal sign. Your alternative will state that population mean one is less than population mean two. Alternatively, you can write the statements like this: your null hypothesis will say the difference between population mean one and population mean two is greater than or equal to zero, and your alternative will state that the difference between population mean one and population mean two is less than zero. That is for decrease, decline, less than, below, all those kinds of words. So immediately when you hear words like that, know that your alternative statement will have a less than.
Remember also, the sign that you put in your alternative hypothesis helps you to understand how you make a decision, where to find the critical value, how to make a conclusion, and also how to find your Z critical values or your t critical values and so on. For an upper tail, the statements will read increase, more than, greater than, all those words. They refer to a one-directional upper-tail test. For a two-tail test, the statement might read there is a difference, there is a change, they are the same, things like that. So when you find words like difference, change, same, not the same, not equal, then you know that you are doing a two-tail test, which is a non-directional test, because it has two regions of rejection. Your null hypothesis will say the means are equal and the alternative will say the means are not equal, or you can say the difference between the two means is equal to zero and the alternative will say the difference between the two means is not equal to zero. How we make a decision is based on the alternative. So if it's decreasing, remember, lower tail, less than, then your region of rejection will always be on the left-hand side. The critical value will normally be negative, but whatever its sign, the region of rejection is to the left of it. To make a decision, you calculate your t test statistic, and if it's less than the critical value on the negative side, you reject the null hypothesis. For the upper tail, where it is increasing or greater than, if your test statistic is greater than your critical value, you reject the null hypothesis. For a two-tail test, you have two regions of rejection.
If the test statistic value you find falls in either one of the regions of rejection, you reject your null hypothesis; otherwise you do not reject your null hypothesis. Remember the six steps? Always, always follow the six steps, because in your exam or your assignment the questions might be asking you about just one of the steps of the hypothesis test. So you need to know the order, because the steps build on each other. The first one is that you need to know how to state your null hypothesis and your alternative hypothesis. Always remember, when you state your null hypothesis and your alternative, we use the population parameters. Also always remember that in your null hypothesis there is always an equality sign. So it is less than or equal, greater than or equal, or sometimes it's just an equal. Your alternative will never, ever have an equal sign in it. It will be either less than, greater than, or not equal. It will never have a less than or equal, a greater than or equal, or an equal in it. Next, you need to know how to make your decision. Making your decision is based on several things. You need to know your level of significance, which is your alpha value. You need to know your critical value. You need to know whether you're doing a Z test or a t test. All those things are very, very important. And remember, when you're looking at an unknown population standard deviation, we use the t test. For a known population standard deviation, we use the Z test. You always need to remember that as well. Step number three, you gather the data and calculate your test statistic. If we're doing a t test, you need to know how to calculate the t test statistic, and you need to substitute the values correctly. Remember, we use the difference because we're talking about independent samples: the difference between the sample means.
So remember, we use the point estimate, which is the difference between your sample means, divided by the standard error. Here we're going to use the pooled variance. We're going to look at that formula just now, but you need to know how to calculate it. Then lastly, you need to know how to make a decision. Making a decision, remember, is based on your region of rejection, which is set by the critical value, and on your test statistic. And always, when you make a conclusion, you refer back to your null hypothesis instead of your alternative statement. So your conclusion is always made from the statement in your null hypothesis. Let's look at how we do a hypothesis test for independent samples where we do a t test. Always remember the six steps. The first step is to state the null hypothesis. Your null hypothesis will say there is no difference. Your null hypothesis always says there is no difference, which means the two means are equal. You can write it as such, or you can write it as your H naught: your mean treatment minus your mean control is equal to zero. You can write it like that; it doesn't have to be mean equals mean. Your alternative will say there is a difference. Your alternative always says there is a difference, because we want to test the claim that there is a difference against there being no difference. So in the alternative, the mean treatment is greater than the mean control. This notation is as if we are doing the before and the after; I'm just using it as an example. Number two, define the decision method. We know that we're going to be using the t test statistic, so that will be how we make a decision. We're going to go to the critical values of t, and we're going to calculate the test statistic of t. When we go and find the critical values of t, we're going to use the degrees of freedom and the level of significance.
But you don't have to worry about knowing how to look up those points, because usually they don't give you the t table. Then you need to gather and compute. It means you need to identify the values, or the facts, that you will need for the calculation. The test statistic is sample mean one minus sample mean two, minus the hypothesised difference of the population means. Now, under the null hypothesis that difference is always equal to zero, so we don't even have to write it. Then you divide by the square root of sample variance one divided by sample size one, plus sample variance two divided by sample size two. So usually the test statistic will be: t stat is equal to x bar one minus x bar two, divided by the square root of S1 squared divided by N1 plus S2 squared divided by N2. That's how we write it, because the null hypothesis says the mean difference is equal to zero, so you don't have to write the longer formula. Okay, making a decision. We know we use the region of rejection, which the critical value defines. If it's a one-tail test on the lower side, the region will be on the left-hand side. If it's on the upper side, it will be on the right-hand side. If it's a two-tail test, a non-directional test, it will have two regions of rejection. If the test statistic that we calculated in step number three falls within a region of rejection, we reject the null hypothesis. Let's look at an example. We are interested in whether the type of movie someone sees at the theater affects their mood when they leave. We decided to ask people about their mood as they leave one of two movies. The first group watched a comedy and the second group watched a horror. Group one had a sample size of 20, and group two, the group that watched the horror, also had a sample size of 20. So both of them had 20.
Our data was recorded so that a higher score indicated a more positive mood. It means that after they watched the movie, they gave a score based on how they felt, so the higher the score, the more positive the mood. The sample size we already mentioned: there were 20 people in each group. We calculated the sample means of the scores and found that for group one the mean score was 10.65 and for group two it was 6.15. The standard deviation, which measures the variation between the scores, was 3.20 for group one and 3.18 for group two. We have good reason to believe that people leaving the comedy will be in a better mood, so we use a one-tail test at alpha of 0,05 to test this hypothesis. So there are a couple of things they have given us. They have given us the facts, or the statistics, which we're going to be using. They also told us that we need to use a one-tail test. So already they are telling us we're doing a one-tail test, and they're talking about a better mood, and a better mood means greater than. That is another fact given in the statement that we can gather. They are also giving us the level of significance, which is very important when we go and find the critical value and so on. So let's test this hypothesis that people leaving a comedy will be in a better mood than people leaving a horror movie. The first step of everything is to state your null hypothesis and your alternative hypothesis. So we state the null hypothesis: the population mean of those from the comedy is not different from that of those from the horror. So there is no difference between the population mean for the comedy and for the horror. The alternative would state that there is a difference, but remember what we said there. Oh, sorry, let's go back. Better mood means greater than.
So how we state that statement is that the population mean of those from the comedy is greater than the population mean of those from the horror movie. We use a greater-than sign instead of the not-equal sign. We're not going to just say there is a difference; we're going to say one is greater than the other. So you must note the difference. If it was not equal, we would have just said that the means of the two groups are different. But because it's not just different, it's greater than, we say the difference between the mean of those who watched the comedy and the mean of those who watched the horror is greater than zero. We can then go to step number two and determine what kind of test we're going to be doing, and we're going to be doing a t test. We also identify the facts, we write the facts down, and we went and found the critical value. Remember, like I said, in your studies you don't have to worry about finding the critical value. They will give it to you; they will not ask you to go and find it, because they don't give you the tables. So we have our critical value, which helps us define our region of rejection. Because the sign is greater than, our region of rejection will be on the positive side. The critical value is 1,686, and our region of rejection is everything to the right of it. For anything that falls on the right side, we reject the null hypothesis. We went on and calculated the test statistic, which is the sample mean for the comedy minus the sample mean for the horror, so 10.65 minus 6.15, divided by the standard error. Remember, we said we use the pooled variance; because the two sample sizes are equal here, the formula we wrote down gives the same answer. Under the square root we have the variance of sample one, which is the standard deviation of 3.20 squared, divided by the sample size of sample one, plus the variance of sample two, 3.18 squared, divided by the sample size of sample two.
And when you calculate that, you get 4.461. After we have calculated the test statistic, we are ready to move on to step number four, which is to make a decision. We draw our picture and mark where our critical value is. For anything that falls in the red shaded area, we reject the null hypothesis at alpha of 0,05; for anything in the white area, we do not reject the null hypothesis. Our 4,461 falls in the shaded area. Therefore we make the decision to reject the null hypothesis at alpha of 0,05, and we conclude that the average mood after a comedy is better than the mood after the horror movie. And that's how you carry out the hypothesis testing process. Okay, so are there any questions before I move on? On this one we used only the critical value and the test statistic. Sometimes you can also use what we call the P value. When you use the P value, remember the decision rule always says: if your P value is less than your level of significance, which is your alpha value, you reject the null hypothesis. And whether you make the decision based on the critical value and the test statistic or on the P value and the alpha value, you need to get to the same conclusion. Some statistical software is able to calculate the P value and give it to you. For example, Excel is one of the simplest tools that you can find and use to do this kind of test. You can plug in the values and calculate the P value, and here it comes out as a very small P value, less than 0,001. The rule says if the P value is less than alpha, we reject the null hypothesis. Our P value is much smaller than our alpha value of 0,05. Therefore, we reach the same conclusion: we reject the null hypothesis at 0,05 and conclude that the average mood after the comedy is better than the mood after the horror movie.
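The arithmetic of the movie example can be checked with a short calculation. The following is a minimal Python sketch and not part of the session's slides: the function names `separate_variance_t` and `pooled_variance_t` are made up for illustration, and the critical value 1.686 is simply copied from the example rather than looked up in a table. It also checks that, because both groups have 20 people, the pooled-variance form and the separate-variance form give the same test statistic.

```python
import math


def separate_variance_t(mean1, mean2, sd1, sd2, n1, n2):
    """t statistic with s1^2/n1 + s2^2/n2 under the square root."""
    standard_error = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    return (mean1 - mean2) / standard_error


def pooled_variance_t(mean1, mean2, sd1, sd2, n1, n2):
    """t statistic using the pooled variance Sp^2 (equal variances assumed)."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error


# Values quoted in the example: comedy is group one, horror is group two.
comedy_mean, comedy_sd, comedy_n = 10.65, 3.20, 20
horror_mean, horror_sd, horror_n = 6.15, 3.18, 20
critical_value = 1.686  # upper-tail critical value as given in the example

t_separate = separate_variance_t(
    comedy_mean, horror_mean, comedy_sd, horror_sd, comedy_n, horror_n
)
t_pooled = pooled_variance_t(
    comedy_mean, horror_mean, comedy_sd, horror_sd, comedy_n, horror_n
)

# Upper-tail test: reject H0 when the test statistic exceeds the critical value.
reject_null = t_separate > critical_value

print(f"t stat (separate variance): {t_separate:.3f}")
print(f"t stat (pooled variance):   {t_pooled:.3f}")
print("Decision:", "reject H0" if reject_null else "do not reject H0")
```

With unequal sample sizes the two functions would generally give different values, which is why the equal-variance assumption matters when choosing between them.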
So you can still reach the same conclusion. With the P value, you need to be very careful as well, so I'm going to black out the screen. With the P value, if it's one direction, let's call it a directional test, then your P value is the value you see. Remember, the direction comes from your alternative statement. If you're dealing with a non-directional test, which is a two-tail test, then your P value will be two times the one-tail P value. What does that mean? Say, for example, they tell you that you have a two-tail test and the P value is equal to 0,25. I'm just going to say that that is your P value. And they ask, what is the one-tail test P value? If the question says, we've done a test and the P value for a two-tail test is 0,25, what will be the one-tail test P value? That is easy. Because the P value of a two-tail test, which is a non-directional test, is two times the one-tail P value, the two-tail value is made up of two one-tail areas. So you just divide 0,25 by 2, and that gives you 0,125. Remember, for a non-directional test there is this area and that area: this one is 0,125 and that one is 0,125, and together they make up a P value of 0,25. Always remember that, because they like to trick you with questions like this in the exam. Be mindful of that. Okay, I'm going to go back to my presentation. Now let's look at this activity. Because we're going to run out of time, next Tuesday we're going to do hypothesis testing for related samples, so that we have more time for that. So let's look at the following exercises. Consider the following statistics regarding post-training attitude scores. We have two groups, group one and group two, and they recorded the mean and the standard deviation for each.
The first thing you need to notice in the statement is what they said right there. They said, consider the following statistics regarding the post-training attitude scores. Please give me one second. I need to open for my daughter; she just came back from school. Apologies for that. The hint here is the word statistics. Here they're not talking about the population or anything; the word just gives you the hint of what to follow. So they say, the following statistics. What you need to remember from the first session we had is that the measures that come from a sample are called statistics and the measures from a population are called parameters. Remember that, right? And remember, statistics are also called point estimates. Those are the things that you need to constantly remind yourself of, because the terms can be used interchangeably. So they gave you the mean and the standard deviation. The question here is, what are the values in the table called? Are they called population parameters? Are they called sample statistics? Are they called test statistics? Is it option one, two or three? You see how tricky they can make the questions look? That's a question for you. Is it option one, two or three? No one wants to try? Okay, so based on the information I just gave you, Karen, would the answer be two? The answer will be two, because they told you the following statistics regarding post-training attitude scores are as follows, and then they just ask you what they are. So you just need to look at the mean and the standard deviation. These are what we call measures: a measure of central location and a measure of variation. Remember, the mean is a measure of central location or central tendency. The standard deviation is your measure of dispersion or variation. And the measures that come from a sample are called statistics.
So sample statistics are those measures, the values in the table. Exercise two: we're still following the same question, right? It says, consider the following statistics regarding the post-training attitude scores from group one and group two. You have the mean and the standard deviation. Group one has a sample size of 20 and group two has a sample size of 20 as well. And they ask, what is the value of your test statistic? So always remember what the test statistic is. The test statistic is that formula that you need to calculate: your t stat. On top is your sample mean, which is denoted by x bar, and the standard deviation is denoted by an s. Also remember the symbols. It is the sample mean for one minus the sample mean for two, divided by the square root of your variances over the sample sizes. Now, your variance is your s squared, right? It is your standard deviation squared. So it's s1 squared divided by n1, and I must put the subscript, plus s2 squared divided by n2. Then we just substitute. Remember, these will be your x bar 1 and x bar 2, this will be s1, and this will be s2, because this one is for group one and this one is for group two. Always remember that. Now, the difference in means is 21.65 minus 20.40, divided by the square root of 2.99 squared divided by 20, because they told us n is 20 for group one, plus 3.05 squared divided by 20. Now, usually I have a Casio calculator on my laptop to show you, but if you don't have a Casio calculator, you have to calculate this step by step. What I mean by step by step is that you calculate the top part and write down that answer, and then you calculate the bottom part. Do the 2.99 squared and divide by 20 to get an answer, then do the 3.05 squared and divide by 20 to get an answer, then add them together, and then take the square root. When you do that process, do not round off.
Keep all the digits as you write the numbers down, and once you have the number for the square root, take the top number and divide it by the number from the square root. Don't try to do everything all at once. If you want to do everything at once, use brackets. Unfortunately not today, but maybe on Tuesday, I will show you how to use your Casio calculator if you don't know how to, or what to do if you are using another calculator that is not a Casio with a fraction function. But for today, I will answer this question for you. It's 21.65 minus 20.40, and because I'm using a Casio with the fraction functionality, it makes my life easier. Then 2.99 squared divided by 20, plus 3.05 squared divided by 20. I would also advise you, if you've never bought a calculator, to try to find a Casio calculator. They sell them at Pick n Pay, at Shoprite, at Checkers, at CNA, at all the bookshops. Don't buy one that is too expensive. At Shoprite, I think it's somewhere between 272 and 325 now, but sometimes they have it on special. And if you are not sure about it, you can even ask before you go and buy. Now, the answer from my calculator. I'm not sure if you will be able to see it; it might not be visible. It's 1,309. I'm going to keep only three decimals, because I'm already at the end of the calculation. Looking at the information I have in front of me, I could also keep just one decimal, because I can look at the options and see what they give me. The option is 1,3. So whether I round this to 1,31 or to 1,3, the answer is option number three. But you need to know how to calculate this, because these are some of the questions that come from your past exam papers and your tutorial letters. They will expect you to know how to do these calculations, but we can work through them together as well. We are left with two minutes before the end of the session, like I said.
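The step-by-step calculator routine for this exercise can be mirrored in a few lines of Python. This is only an illustrative sketch: the variable names are invented here, and the numbers are the ones quoted in the exercise. Each intermediate result is kept at full precision, and rounding happens only at the very end, as advised above.

```python
import math

# Statistics quoted in the exercise for the two groups.
mean1, sd1, n1 = 21.65, 2.99, 20
mean2, sd2, n2 = 20.40, 3.05, 20

# Top part: difference between the two sample means.
top = mean1 - mean2

# Bottom part, one piece at a time, with no rounding along the way.
variance_over_n1 = sd1**2 / n1
variance_over_n2 = sd2**2 / n2
bottom = math.sqrt(variance_over_n1 + variance_over_n2)

# Divide the top by the bottom, then round only the final answer.
t_stat = top / bottom
print(round(t_stat, 3))  # 1.309
```

Rounded to one decimal this is 1.3, which matches the option chosen in the session.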
Let's look at the last question. I think it is the last question, and the rest of the activities we will do on Tuesday, just to wrap up the hypothesis testing. Suppose the two-tailed p-value for the t-test of the difference between the two means in the previous question is 0,19. Now remember, we had this type of example previously. The key words are two-tailed p-value. Remember, a two-tailed p-value has the two regions of rejection combined within it. So they say the p-value there was 0,19. If alpha is set at 0,10, what is the decision regarding the null hypothesis? On this one, they are asking you to make a decision. Now always go back to what the rule says. What is the decision rule? The decision rule always says: if the p-value is less than alpha, reject the null hypothesis. Now, you know that this is your p-value. Okay. Your p-value is 0,19 and your alpha value is 0,10. Which one is which? Is the p-value greater than or less than alpha? Your p-value of 0,19 is greater than your alpha value of 0,10. Knowing that, we do not reject the null hypothesis. That is the decision. Not the rule, the decision. So now, which option will be the right option? Number one says do not reject the null hypothesis because 0,19 is greater than 0,10. Number two says accept the alternative because 0,19 is greater than 0,1. Number three says reject the null hypothesis because 0,095 is less than 0,1. Always remember that we state the decision in terms of the null hypothesis, never the alternative hypothesis. So by the process of elimination, number two is not correct. Even though it looks like it could be correct, because they used the alternative to state the decision, it is not acceptable.
And number three, you can see, is using a different p-value, 0,095 instead of the 0,19 that was given, so the only correct answer is number one. It is also the same thing that I just said when I worked out the decision: do not reject the null hypothesis because 0,19 is greater than 0,10. But I started by first looking at the p-value and comparing it to the alpha value, and then I made my decision. Easy, isn't it? Easy stuff. Are there any questions? We are at the end of the hour, even though we started late, and I don't want to start a new section when we are left with a few seconds. I hope that all of this has at least cleared up some of the things that you were unsure of, and that you are now aware of what is happening. Are there any questions? Any comments? If there are no questions or comments, then we can conclude today's session. Let me just scroll until I get to the end. For those who are not aware, I am not part of UNISA; I come from Pambili Analytics. My role is just to support you in your studies by offering you these free online sessions. At Pambili Analytics, our mission is to bridge the gap when it comes to data literacy, statistics literacy and numeracy, but more specifically analytical skills. We offer all these sessions, but we don't only do that; we offer a range of services, including consulting as well as skills development. If you want one-on-one discussions or assistance, we do offer instructor-led one-on-one training, either virtually or face-to-face. We are running a special that you can sign up for. It's 150 per hour. By June, we're going to revert to the normal price. I think we've been running the special for almost three months now. If you are interested in programming or in learning how to do research, we also have self-led training sessions.
On research methodology, research design and guided learning, you can take those courses and equip yourself. Remember, you can subscribe to my YouTube channel. You can subscribe, like, and share with others. There are plenty of videos that are free of charge, which you can binge-watch and use as you prepare for your assignments or your exam. Otherwise, if you want the recordings of these free online sessions, you can only get access to them if you sign up for a membership on YouTube, and those are the membership options. I would like to end the session by giving a shout-out to our loyalists for the last month, who are Bambo Koji and Izal. Thank you for being part of the session today. If you need to get hold of us, these are our contact details. Other than that, adios. Enjoy the rest of your weekend and see you Tuesday. Bye. Thank you. Goodbye.