Okay, let's start. Today we are going to look at hypothesis testing for two samples that are dependent, that is, two related groups. By the end of the session you should be able to conduct a hypothesis test for two related populations. The two samples are related, so the values in one sample depend on the values in the other. This means that when you read the statement you are given, you need to identify whether the variable of interest they want you to test comes from independent or dependent samples. How do you identify a dependent case? Usually it is a before-and-after situation: before you joined the organization and after you joined it, or you get tested, something is done, and then you get retested. Whenever you see a before and an after, you have a dependent case. We will also do some activities at the end. Just to recap: we can test two samples, and last time we looked at independent samples, where you have two separate groups, such as male and female, or primary school and high school. Today we are looking at related samples from the same population: the group before the treatment and the same group after the treatment. So how do we do that? For related samples we test whether the two related populations differ, and we use the mean, because most of the tests in this module are tests about means. This means the samples need to be paired, or matched.
The measures are repeated: you measure before, and then you measure again after. Because we have a before and an after for the same sample, we first calculate the difference for each pair, so that each pair gives us a single value. Working with the differences also eliminates the variation among the subjects themselves. There are assumptions that need to be met: the populations should be normally distributed (strictly, it is the differences that need to be approximately normal), or, if they are not normally distributed, the sample size has to be large, which is commonly taken to mean at least 30. To calculate the test statistic, which helps us make the decision, there are a couple of things we need to know how to calculate. First, we calculate the differences. Then we calculate the average of those differences, the mean difference, which is the point estimate of the population mean difference; we need it because the test statistic is the point estimate minus the hypothesized mean difference. We also need the sample standard deviation of the differences, S_D, from which we get the standard error. To get S_D, for every difference you subtract the mean difference and square the result, add up all those squared values, divide the total by n minus one, and then take the square root. Here n is the number of pairs, which is your sample size. Okay, so to calculate the test statistic we use the following formula.
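For reference, the quantities just described can be written out as below. The symbols (D_i, \bar{D}, S_D, \mu_D, t_{STAT}) are one common textbook notation chosen here for illustration, and the difference is taken as after minus before to match the complaints example later in the session.

```latex
D_i = X_{\text{after},\,i} - X_{\text{before},\,i}, \qquad
\bar{D} = \frac{1}{n}\sum_{i=1}^{n} D_i, \qquad
S_D = \sqrt{\frac{\sum_{i=1}^{n}\left(D_i - \bar{D}\right)^2}{n-1}}

t_{STAT} = \frac{\bar{D} - \mu_D}{S_D/\sqrt{n}}, \qquad \text{df} = n - 1

\text{Reject } H_0 \text{ if: } \;
t_{STAT} < -t_{\alpha} \ (H_1\!: \mu_D < 0), \quad
t_{STAT} > t_{\alpha} \ (H_1\!: \mu_D > 0), \quad
|t_{STAT}| > t_{\alpha/2} \ (H_1\!: \mu_D \neq 0)
```

The denominator S_D/\sqrt{n} is the standard error; S_D on its own is the standard deviation of the differences.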
We use a t test because we assume the population standard deviation is not known. The test statistic is the sample mean difference minus the hypothesized population mean difference, divided by the standard error, which is the standard deviation of the differences divided by the square root of n. Remember, that standard deviation is the square root of the sum of the squared deviations of each difference from the mean difference, divided by n minus one. Once we have the test statistic, we find the critical value from the t table using the level of significance, alpha, and n minus one degrees of freedom, where n is the number of pairs, your sample size (for a two-tailed test, alpha is split between the two tails, so you look up alpha divided by two). Making the decision for paired samples works exactly the same as for any other hypothesis test. In your hypothesis statement the mean difference will be greater than, less than, greater than or equal to, less than or equal to, or equal to the hypothesized value, and from the alternative hypothesis you need to identify which test you are doing: a lower-tail test, which is one-tailed (directional) with less than in the alternative; an upper-tail test, also one-tailed, with greater than in the alternative; or a two-tailed (non-directional) test, which has two regions of rejection. The other thing to be mindful of is the key words in the statement. For less than, they will use words like decrease, less or fewer. For the upper tail, they will use words like increase, greater than, bigger than or superior, which tell you it is a greater than.
For a two-tailed, non-directional test, they will use words like change or different; as long as they don't give you a direction, less or greater, you know it is a two-tailed test. The region of rejection is based on the sign in your alternative hypothesis, but your decision and conclusion are stated in terms of the null hypothesis, not the alternative: you either reject or do not reject the null hypothesis. Let's look at an example. Assume you sent your salespeople to a customer service training workshop and you want to determine whether the training made any difference in decreasing the number of complaints. You collected the following data: for each salesperson, the number of complaints received before attending the training and the number received after attending it. Then you calculate the difference between before and after; whether you use after minus before or before minus after is up to you, as long as you are consistent. I used after minus before, and I get a difference for each salesperson. Then I calculate the other measures I need, starting with the mean difference, which is the sum of all the differences divided by how many there are. The sum of the differences is -21, and there are five salespeople, so the mean difference is -21 divided by 5, which is -4.2. Then I calculate the standard deviation, which is the square root of the sum of each difference minus the mean difference, squared, divided by n minus one.
So for every salesperson I take the difference minus the mean difference; for the first one that is -2 minus (-4.2), and I square the answer. I do that for all of them and add them up. Then I divide that total by n minus one, which is 5 minus 1 = 4, and take the square root, and the answer is 5.67. You don't have to know how to do this calculation by hand, because I don't think they will expect you to; they will give you the values. I'm just demonstrating how we got to 5.67. So we have a mean difference of -4.2 and a standard deviation of 5.67. Now we state the null and the alternative hypotheses. In the null hypothesis, the population mean difference is compared with zero: if the training has no effect, the mean difference between the before and the after populations is zero. Looking at the information we had previously, they wanted to know whether there is a decrease in the number of complaints. So there we already have our sign: they want to check whether there is a decrease, that is, fewer complaints. Because this is the claim the researcher wants to test, it goes in the alternative hypothesis. It cannot go in the null hypothesis, because the null hypothesis always contains an equal sign and less than does not have an equal sign. So it goes in the alternative.
So our alternative states that the population mean difference is less than zero, and the null hypothesis states that it is greater than or equal to zero. The additional information they give you is the level of significance, 0.10, which is our alpha. We have calculated the mean difference, so now we find the critical value: t with alpha = 0.10 and 4 degrees of freedom is 1.533, and because this is a lower-tail test the critical value is -1.533. We calculate the test statistic by substituting the values into the formula: the mean difference, -4.2, minus the hypothesized population mean difference, which is stated in the hypothesis statement as zero, divided by the standard error, which is the standard deviation, 5.67, divided by the square root of 5. The answer we get is -1.66. Next we determine the region of rejection from the critical value. Our sign says less than, so the rejection region is on the negative side: we put a negative sign on the critical value of 1.533 and shade the area to its left. Anything that falls in the shaded (yellow) area we reject; anything that falls in the white area we do not reject. Our test statistic of -1.66 falls in the shaded area, because -1.66 is less than -1.533, so the decision is to reject the null hypothesis and conclude that there is evidence of a significant decrease in the mean number of complaints: the before and the after show a difference. Next, let's look at how to do a paired two-sample t test for means in Excel.
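A minimal sketch of the same steps in Python, using only the standard library. The helper names (`paired_summary`, `paired_t`, `reject_null`) are hypothetical, the critical value 1.533 is the table value quoted in the example rather than something the code looks up, and only the summary values from the session (-4.2, 5.67, n = 5) are used, because the individual before/after counts are not listed here.

```python
import math


def paired_summary(before, after):
    """Differences (after - before), their mean, and their sample SD."""
    if len(before) != len(after):
        raise ValueError("each subject needs a before and an after value")
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)                          # number of pairs
    d_bar = sum(diffs) / n                  # mean difference (point estimate)
    ss = sum((d - d_bar) ** 2 for d in diffs)
    s_d = math.sqrt(ss / (n - 1))           # sample SD of the differences
    return d_bar, s_d, n


def paired_t(d_bar, s_d, n, mu_d=0.0):
    """t statistic: (D-bar - mu_D) / (S_D / sqrt(n)), with n - 1 df."""
    return (d_bar - mu_d) / (s_d / math.sqrt(n))


def reject_null(t_stat, t_crit, tail):
    """Critical-value rule. t_crit is the positive table value:
    t_alpha for a one-tailed test, t_(alpha/2) for a two-tailed test."""
    if tail == "lower":      # H1: mu_D < 0
        return t_stat < -t_crit
    if tail == "upper":      # H1: mu_D > 0
        return t_stat > t_crit
    if tail == "two":        # H1: mu_D != 0
        return abs(t_stat) > t_crit
    raise ValueError("tail must be 'lower', 'upper' or 'two'")


# Complaints example, from the summary values quoted in the session.
t_stat = paired_t(d_bar=-4.2, s_d=5.67, n=5)
print(round(t_stat, 2))                       # -1.66
print(reject_null(t_stat, 1.533, "lower"))    # True, so reject H0
```

If you define the difference the other way round (before minus after), the mean difference becomes +4.2, the statistic becomes +1.66 and the matching alternative is "greater than zero"; the conclusion is the same, which is why the sign reported by software has to be read together with how the difference was defined.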
Remember our data: we had the five salespeople, before and after. We put that data into Excel. Excel has a Data Analysis tool on the Data tab, but if you don't see it you first need to activate the Analysis ToolPak add-in. We use it (the t-Test: Paired Two Sample for Means option) to calculate the test statistic, and this is the output you get. It gives you the measures for both the pre and the post samples: the mean, which is a measure of central tendency or location, the variance, which is a measure of variation, and the number of observations for each, which you can see is five. It also shows the hypothesized mean difference, which comes from your null hypothesis statement, and the degrees of freedom; remember, that is n minus one, 5 - 1 = 4, and you can see the Excel output gives you that too. Then it calculates the test statistic. What you need to be aware of is that the sign of the test statistic in Excel depends on the order in which you enter the two ranges: Excel works with the first variable minus the second, so with the before data entered first the test statistic comes back positive. Make sure the sign matches how you defined the difference (after minus before) and the direction of your alternative hypothesis, because if you use the positive value as it is against a negative critical value, you will wrongly not reject the null hypothesis. Excel also gives you the critical value for a one-tailed test and the critical value for a two-tailed test. Since we are doing a one-tailed test, we use the one-tail critical value, and because the sign says less than, we put a negative in front of it. The other thing you can get from the output, which I
haven't touched yet, is the p-value. That is very important, because most of the time your exam or assignment will ask you about the p-value. For a one-tailed test the p-value here is 0.0866, but for a two-tailed test the p-value is double: if I take this value and multiply it by two, I get the two-tail p-value, 0.1732, because the two regions of rejection share the area. If I'm given a two-tail p-value and I want the one-tail value, I divide by two, to split the area between the two tails. So always be mindful of this: if you are given the one-tail p-value and they ask for the two-tail p-value, multiply by two; if you are given the two-tail p-value and they ask for the one-tail p-value, divide by two. That concludes my part, and now it's your turn to answer some questions. Are there any questions? Is anything missing from my explanation? Was it clear, was it difficult? Or we can just dive into the activities. Okay, let's look at the activities. Question 1: a matched-pairs t test should be used when you are (1) testing a two-tailed hypothesis; (2) comparing means on a measurement from before and after a specific event; (3) comparing two variables which come from the same group; or (4) comparing two means on a variable where the data were drawn from the same population. Which one is it: 1, 2, 3 or 4? You can also write in the chat if you don't want to speak. Think about what we just discussed today; I said a lot about dependent groups. Okay, what would the answer be? A student answers: option two, because it's before and
after. Yes, it's option two. Question 2: two samples may be regarded as independent when... This is what we discussed last time; I just want to see whether you can differentiate what we learned today from what we learned last time. Dependent means there is some relation between the two samples; independent means there is no relation between them, so neither one affects the other. So would it be option one? Yes, it is option one. For the next one, think about everything we discussed today, and identify the key words in the question. A sample of 70 people is tested for assertiveness before and after a workshop in which they are given assertiveness training. Which of the following is the most appropriate formula for comparing the mean assertiveness score before the training with the one after the training? Think about the formula we used today. Is it 1, 2 or 3? Maybe what is confusing you is that the formula shows zero rather than the mean difference mu_D; that doesn't matter, because mu_D is equal to zero under the null hypothesis anyway. The formula we used today is the mean difference minus mu_D (with D as a subscript), divided by the standard deviation of the differences over the square root of n. So is it 1, 2 or 3? The key thing to look at is the before and after, so it is option three. Let's recap on this one: the first formula is for one sample, the hypothesis test for a single mean, which I think we did in April; the second is for independent samples, two separate groups; and the third is for dependent samples, the same group measured twice. You need to be able to tell which one is which. Remember, with two independent groups it will be something like male and female.
For one group, it is just one population: they give you the mean and the standard deviation and you calculate. Let's look at question 3; I think this is the second last. To test the efficacy of a workshop aimed at improving people's interpersonal skills, a researcher applies a scale which rates the interpersonal skills of 20 participants before and after their participation in the workshop. Scores on the rating scale for the general population have a mean of 5 and a standard deviation of 1.5. Which one of the following is the most appropriate way to express the null hypothesis for this analysis? What is very important here is the before and after. I can tell you for sure that we never state a hypothesis using the point estimate, because the point estimate belongs to the sample, not the population, so that option is not correct. And because we are dealing with two groups, we cannot state it the way we would for one group, so that one is not right either. We are left with two statements: option three says the population mean of one is equal to the population mean of the other, and option four says the population mean of one is not equal to the population mean of the other. Now think about it: a null hypothesis always contains an equal sign, so it will be equal, less than or equal, or greater than or equal. Can option four be the answer? No. So option three is the correct answer, because a null hypothesis can never contain a not-equal sign. The only signs you can have in a null hypothesis are equal, less than or equal, and greater than or equal, so when you see less than, not equal or greater than in a null hypothesis, you know it is not stated correctly. Let's go to question 4, which is our last question, and then we will call it a day. A researcher wants to test the following hypotheses: the null hypothesis states that the mean of population one is equal to the mean of population two, and the alternative states that the mean of population one is not equal to the mean of population two. Before I read the entire statement, let me ask: looking at this sign, is this a directional or a non-directional test? It is non-directional. Okay. On the basis of the data provided, the output from a computer program indicates that a t value of 1.72 was found. Like I said previously, they will not expect you to calculate the t value or to find the p-value; they will most likely give them to you. The p-value for a two-tailed test was given as 0.056. That it is a two-tailed p-value is very important to note, because we are doing a non-directional, two-tailed test here, given the not-equal sign. What should the researcher do to evaluate the results at a level of significance of alpha = 0.05? This question is most likely checking whether you understand how to use the p-value. Option 1 says divide the p-value by two before comparing it with the level of significance; option 2 says multiply the p-value by two before comparing it with alpha; option 3 says divide alpha by two before comparing p to the level of significance; and option 4 says compare the p-value as given. What you need to ask yourself is: from the information given, can you make a decision? Definitely yes: you can make a decision without doing any calculation, just by comparing the p-value with alpha, and the reason I'm saying
that is because the statement asks what the researcher should do to evaluate the results at the given level of significance, which means: what does the researcher have to do to make a decision? And how do we make a decision? Our alternative hypothesis says this is a two-tailed, non-directional test, and the p-value given to us is a two-tailed p-value, so there is nothing more we need to do: we just compare the p-value with the level of significance and make a decision. Why? Because the decision rule for the p-value approach says that if the p-value is less than alpha, we reject the null hypothesis. They told us the hypotheses are non-directional (two-tailed), and they gave us the two-tailed p-value, so we take the p-value as given, 0.056, and compare it with our level of significance, 0.05. We can see that the p-value is greater than the level of significance, so we do not reject the null hypothesis. But that is not what they are asking you; they are asking for the step I just described, comparing the two values, so option four is the correct answer. Now here's the catch. Let me change only one piece of the statement: suppose they had said the p-value for a one-tailed test was given, and, swapping the values, that it is 0.038. Let's assume that is the statement
that they have given you. When you think about it, you go back to your alternative hypothesis, and it says not equal, so it is non-directional, which means a two-tailed test. Given that it is a two-tailed test, what does the researcher need to do to make a decision? The decision is made by looking at the p-value, and because this is a two-tailed test, you need the two-tailed p-value. Remember, I said that to get a two-tailed p-value you multiply the one-tailed p-value by two; then you come back and evaluate. So which option would that be? It would have been option number two, because option two says multiply the p-value by two before comparing it with alpha. So if you are given a one-tailed p-value, you multiply it by two to get the two-tailed p-value, and then compare the p-value with the level of significance. That is one scenario. Let me give you another scenario. I'm going to put everything back, but this time, instead of the not-equal sign, let's assume the alternative says less than. If your alternative is less than, it is a directional, one-tailed test. They have given you all the same information, and they tell you that the two-tailed p-value is 0.056. What should the researcher do to evaluate the results at this level of significance? The researcher has the information required to make a decision, but cannot use it as it is, because it is a two-tailed p-value. To compare the p-value with the level of significance, the researcher first needs the one-tailed p-value, and because the given one is two-tailed, they're going to
take 0.056 and divide it by two, as I explained with the Excel example, which gives 0.028 (halving like this gives the one-tailed p-value only when the test statistic falls on the side that the alternative predicts). That gives you the one-tailed p-value, because you want to make a decision for a one-tailed test; then you compare it with alpha, and since 0.028 is less than 0.05 you would reject. In that scenario the appropriate answer would have been option number one. So be very careful when you read the questions: evaluate what you are given, make sure you understand the question, look at the options and work out what is required to answer it, because, as you can see, with different scenarios you could have gotten option one, two or four as the answer; it's all in the details. Coming back to our actual question: the alternative is two-tailed (non-directional), and the p-value given is already a two-tailed p-value, so we take the p-value as it is, compare it with the level of significance, and make a conclusion. That means option four. Are there any questions? If there are no questions, we can end the session here. Just to introduce myself again, in case some of you are new: I do this from Pambili Analytics, and our aim and mission is to close the gap in literacy in data, numeracy, statistics and research, especially analytical skills. We offer a range of services you can benefit from, especially our interactive lessons: we offer instructor-led online training, similar to what we are doing right now. At the moment we are running a special until the end of this month, and in June we revert to our normal rate, which is 350 per hour. If you want access to the recording and the
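The reasoning in question 4 and its two variants can be sketched in a few lines of Python. This is a minimal illustration, not a standard library routine: `p_for_decision` and `decide` are hypothetical helper names, the decision rule is "reject H0 when p < alpha" as stated above, and the `stat_matches_h1` flag encodes the caveat that halving a two-tailed p-value only gives the one-tailed p-value when the statistic lies on the side the alternative predicts.

```python
def p_for_decision(p_given, given_tails, test_tails, stat_matches_h1=True):
    """Convert a reported p-value into the one the hypotheses call for.

    given_tails / test_tails: 1 or 2.
    stat_matches_h1: whether the test statistic lies on the side that a
    one-tailed alternative predicts (only used when halving).
    """
    if given_tails == test_tails:
        return p_given                       # compare as given (option 4)
    if given_tails == 1 and test_tails == 2:
        return min(1.0, 2 * p_given)         # multiply by two (option 2)
    if given_tails == 2 and test_tails == 1:
        half = p_given / 2                   # divide by two (option 1)
        return half if stat_matches_h1 else 1 - half
    raise ValueError("tails must be 1 or 2")


def decide(p, alpha):
    """p-value decision rule: reject H0 when p < alpha."""
    return "reject H0" if p < alpha else "do not reject H0"


alpha = 0.05
scenarios = [
    ("Question 4: two-tail p, two-tailed H1", 0.056, 2, 2),
    ("Variant: one-tail p, two-tailed H1", 0.038, 1, 2),
    ("Variant: two-tail p, one-tailed H1", 0.056, 2, 1),
]
for label, p, given, test in scenarios:
    p_use = p_for_decision(p, given, test)
    print(f"{label}: p = {p_use:.3f}, {decide(p_use, alpha)}")
```

At alpha = 0.05 the three scenarios give p-values of 0.056 (do not reject), 0.076 (do not reject) and 0.028 (reject), the last assuming the sample result points in the direction the one-tailed alternative predicts.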
notes, you have to sign up and join the channel as a member. Before you sign up, look very carefully at the membership you are joining: only the memberships from Loyalist up to Promoter give you the real perks. Otherwise, you can subscribe to the channel, share it, like our videos and comment on them, so that we can improve the content as we go along. Other than that, thank you for coming. If you need to get hold of us, those are our contact details. Enjoy the rest of the evening, and I will see you on Saturday. Bye.