Welcome to your sixth session of statistical inference literacies. Today we're going to look at how we use the different non-parametric tests. Today is the 17th, and we're dealing with the non-parametric tests. On the 24th we will look at simple linear regression and correlation, and the last session, on the 31st, will cover time series. Then we will have concluded all the content for STA 1502 and we can look at exam preparation as well. So there's not a lot left to go, only the last two sessions and then we are done. If by next week there are still not a lot of people joining, then I don't think we will have the exam preparation session; the 31st of May will be the last session that I host, and if you need any help we can discuss that outside of the sessions. Okay, any question, comment or query before we start with today's session? Nothing.

So the first one we're going to look at is the Wilcoxon rank sum test for the difference between two medians. The things you will need to complete all of this successfully are your statistical tables, your formulas and a calculator. By the end of the session you will learn how to use the Wilcoxon rank sum test for two population medians, and also how to perform the Wilcoxon signed rank test for comparing two paired samples.

With the Wilcoxon rank sum test, there are several things you need to be aware of. It tests two independent population medians. The populations need not be normally distributed, because it is a distribution-free procedure. We use it when only rank data are available, and we must use the normal approximation if either of the sample sizes is larger than 10.
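As a rough companion to the procedure this session walks through, the following Python sketch pools two samples, gives tied values the average of their positions, and sums the ranks of the smaller sample to get T1. It is only an illustration under stated assumptions: the function names `average_ranks` and `rank_sum` are made up for this note and are not part of the STA 1502 material, and the data are the factory capacity rates used in this session's worked example.

```python
def average_ranks(values):
    """Rank values from 1 to n; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = ((pos + 1) + (end + 1)) / 2  # average of the 1-based positions
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def rank_sum(sample_a, sample_b):
    """Return (n1, n2, T1, T2), where n1 and T1 belong to the smaller sample."""
    if len(sample_a) <= len(sample_b):
        small, large = list(sample_a), list(sample_b)
    else:
        small, large = list(sample_b), list(sample_a)
    n1, n2 = len(small), len(large)
    ranks = average_ranks(small + large)
    t1 = sum(ranks[:n1])
    t2 = sum(ranks[n1:])
    n = n1 + n2
    # Check from the session: T1 + T2 must equal n(n + 1)/2.
    if t1 + t2 != n * (n + 1) / 2:
        raise ValueError("rank sums do not add up to n(n + 1)/2")
    return n1, n2, t1, t2


factory_a = [71, 82, 77, 94, 88]
factory_b = [85, 82, 92, 97]
print(rank_sum(factory_a, factory_b))  # (4, 5, 24.5, 20.5)
```

The statistic that matters for the table lookup is the third value, T1, which here comes from factory B because it is the smaller sample.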
Some of the things we need to be aware of: we can use the table method when both sample size n1 and sample size n2 are 10 or less. We assign ranks to the combined n1 + n2 sample observations. If the sample sizes are unequal, n1 refers to the smaller sample. The smallest value gets rank 1 and the largest value gets rank n, where n = n1 + n2. When there are ties, let's say two values are the same, 30 and 30, and they follow one another in positions 3 and 4, then we give each of them the average rank, so both get 3.5. Then we take the sum of the ranks for each sample: you have sample 1 and sample 2, and you label the rank sums T1 and T2. Our test statistic is T1, the sum of the ranks from the smaller sample.

You must always remember that n1 represents the smaller sample. So if we have two samples, let's say the first has size 11 and the second has size 9... just give me a second, hold on. Sorry about that, it was my mother and I thought it was something urgent; I must apologize. So what I'm saying is, if the first sample size is 11 and the second is 9, then according to the Wilcoxon test we need to reverse them, because n1 needs to be the smaller sample size. So the 9 will be our n1 and the 11 will be our n2, and that's how you assign your sample sizes. Then, when you calculate the ranks from those values, T1, the sum of the ranks that you calculate from sample n1, that will be our
test statistic, and that is the value we use to make a decision. To check the ranks we use this formula: the sum of the rankings must satisfy T1 + T2 = n(n + 1)/2, where n = n1 + n2. That is, the ranks for T1 plus the ranks for T2 should equal sample size 1 plus sample size 2, times sample size 1 plus sample size 2 plus 1, divided by 2. Writing n for n1 + n2 means you don't have to repeat n1 + n2 twice. The two sides need to be the same: if I add the ranks, I should get the same value as from the sample sizes on the other side. And my electricity is back.

When we go and make a decision, because it's hypothesis testing, there are a couple of things you need to be aware of. Our medians will be for population one and population two, so you will either calculate the medians or they will be given to you. Remember, like I said, 1502 is the second module after 1501, so most of the basic things you need, you learned in 1501 and then you come and apply them in 1502. For those doing 1501 and 1502 at the same time, it means you're learning these concepts at the same time, which makes things a little more difficult, because some of these concepts you learn later in the course and some at the beginning. So just make sure you adjust how you learn the content relating to 1501 and the content relating to 1502.

In terms of calculating the test statistic, we know that we take the sum of the ranks from the smaller sample, and that is our T1. Now, if we want to decide whether or not to reject the null hypothesis, we need to do a hypothesis test, and the first step of hypothesis testing is to state your null and alternative hypotheses, so that
you are able to determine where your region of rejection will be when you go and find the critical value. In terms of stating your null hypothesis, if it's a two-tailed test you will say the median for population one is the same as the median for population two, and the alternative will state that they are not the same. So in the null hypothesis you use an equal sign, M1 = M2, and in your alternative hypothesis you state that M1 is not equal to M2, which means there is some difference. Then you need to go to the table to find the critical values.

When it is a two-sided or two-tailed test, there are two regions of rejection, so you're going to find the upper T1 critical value and the lower T1 critical value. There will be two critical values in the table, and we'll look at the table so that you can see the values I am referring to. Once you have the critical values, they create two regions of rejection. As you can see here, if T1 is below the lower limit you reject the null hypothesis, and if it's above the upper limit you reject the null hypothesis; otherwise, if it falls between those two values, you do not reject the null hypothesis.

For a left-tail test, your alternative hypothesis will state that median one is less than median two, and you will only have one region of rejection, because you only use the lower limit. Sorry, I'm getting switched over because of the load shedding. For a right-tail test, the alternative will be that the median of population one is greater than the median of population two, and again you only have one region of rejection, this time using the upper limit.

The other thing I forgot to mention is: please make sure that you complete the register. I can see that Jacques has posted it.

Okay, so let's look at an example of how we apply all the knowledge we just learned. Sample data are collected on the capacity rate
and this is the percentage of capacity, for two factories, and we need to test whether the median operating rates for the two factories are the same. So that is the hypothesis we need to test: are the medians of factory A and factory B the same? We are given the rates for factory A, 71, 82, 77, 94 and 88, and for factory B, 85, 82, 92 and 97. We need to test for the equality of the population medians at 0.05, and because they said equality, it means we are doing a two-tailed test.

The first thing we do is put the capacities in order so that we are able to rank them. So we put the factory A and factory B values together: 71 is in position one and 77 is in position two. Then we have 82 and 82. They are the same, so we need to give both of them the average rank. Because they are in positions three and four, we take three plus four and divide by two, which gives us 3.5, so they both get a rank of 3.5. Going on, 85 is in position five, 88 in position six, 92 in position seven, 94 in position eight and 97 in position nine. Once we are done ranking them, we add all the ranks for factory A and all the ranks for factory B.

Then we select our test statistic, so we also need to determine which one will be our n1. Remember, n1 is the sample with the smaller size. Factory A has five values and factory B has four values, so factory B will be our n1 and factory A will be our n2. Our T1 is the sum of the ranks from n1, which is factory B: 3.5 + 5 + 7 + 9 = 24.5. So that is our test statistic. We now have T1 = 24.5, n1 = 4 and n2 = 5, and we need to test this at alpha of 0.05. So now we
need to go and do our hypothesis testing, but before we do that, let's find the critical value. For the critical value we use our alpha and the sample sizes. Our alpha value is 0.05, and the table is broken down into one-tail and two-tail alpha values: this column is a one-tail 0.025 and a two-tail 0.05. Because we're testing equality, we're doing a two-tailed test, so we go to the two-tail alpha value of 0.05. Then we look at n1: you will find the value of n1 at the top and the value of n2 down the left side. Our n1 is four and our n2 is five, and where they meet we have our lower limit TL and our upper limit TU. We're going to use 12 and 28 as our critical values, our limits, so that they create two regions of rejection.

Okay, so now let's do the hypothesis testing. Let me write down the values: we had 12 and 28. The first step of hypothesis testing is to state the null hypothesis and the alternative hypothesis. Our null hypothesis is that the median of population one, or factory one, is the same as the median of factory two, and the alternative states that they are not the same. What else are we given? We are given the value of alpha, we are given the sample sizes, and we found the region of rejection: our lower limit is 12 and our upper limit is 28. So we take our T1, which we found was 24.5, and we look at where it falls: in the rejection area or outside of it? It falls between 12 and 28, so it falls in the do-not-reject area. Therefore we do not reject the null hypothesis at alpha of 0.05, and we can conclude that there is not enough evidence to prove that the
medians are not equal. And that's how we do the hypothesis testing.

Let's look at how the questions are asked in the exam or in your assignment. Given the test statistic T1 equal to 42, so here they have given us the rank sum of the smaller sample, with n1 equal to 6, and T2 equal to 36 with n2 equal to 9, use the Wilcoxon rank sum test to determine at the 5 percent level of significance whether the location of population one is to the right of the location of population two. Which one of the following statements is incorrect?

So what is the question asking us? They want us to test the hypothesis that population one is to the right of population two, which means greater than, so it's a one-tailed test. The null hypothesis states that the median of population one is equal to the median of population two, and our alternative, H1, says the median of population one is greater than the median of population two. That is the hypothesis we want to test. They also gave us alpha of 0.05, so when we use the table we only go to the one-tail columns, because this is a one-tailed test; we'll go to one-tail 0.05. They also gave us n1 = 6 and T1 = 42, and n2 = 9 and T2 = 36. So T1 is our test statistic, because it comes from the sample of size n1 = 6, which is smaller than nine. The other key thing they gave you here was to tell you that you need to use the Wilcoxon rank sum test. But we need to find which statement is incorrect. So, after analyzing what I'm given and understanding that the question is asking me to identify the incorrect statement, now let's go
and answer the question. Option one states that in the null hypothesis the population locations are the same. Because it is hypothesis testing, the null hypothesis always carries the equal sign, so the null hypothesis is correct. Option two, the alternative hypothesis, states that the location of population one is to the right, and that is what the question said, so that statement is correct. Number three gives the critical values for the region of rejection as a lower limit TL of 33 and an upper limit TU of 63, so we need to go and find the critical values. If you have tables in front of you, open them; I'm just going to stop sharing from that side and share my entire screen so I can show you the table.

So these are the tables, and we're going to look for Wilcoxon. There we go. Here we need to look for one-tail 0.05, and there is the table we are looking for; I can make it bigger so you can see it. We're looking for alpha of 0.05, and we know that our n1 is six and our n2 is nine, so we go to the top and look for six, and down the side we look for nine. Where both meet, that is where we want to be: 33 and 63. Okay, so the critical values in number three are correct. Number four says the test statistic is 42, and we said our test statistic is 42, so that is correct.

Now the conclusion: the two population locations are not the same and we reject the null hypothesis. Is that true? Let's test that. Our critical values are 33 and 63, and because this is a right-tail test we would reject the null hypothesis only if T1 falls in the upper tail, beyond the upper limit of 63. So let's see where our 42 falls: it falls between 33 and 63, so we do not reject. So the conclusion that the population locations are not the same and we reject the null hypothesis, that will
be the incorrect statement. And that's how you do the hypothesis testing and answer the questions when asked. Any question, comment or query before we move on to the last bit? There are no questions.

Now let's look at the Wilcoxon rank sum test for large samples. So far we looked at small samples; now we're going to look at large samples. Sorry, my slide is bringing everything up all at once. For large samples, the test statistic T1 is approximately normal with mean mu of T1 and standard deviation sigma of T1. Since we are approximating, we calculate the mean as n1 times (n + 1) divided by 2, where, remember, n is n1 + n2. The standard deviation is given by the square root of n1 times n2 times (n + 1) divided by 12, again with n equal to n1 + n2.

We must use the normal approximation if either n1 or n2 is greater than 10. Previously the sample sizes had to be 10 or less for the table; if one is greater than 10, then we use this approximation, which means we're going to be using the z table. Assign n1 to the smaller of the two sample sizes, so the same criterion we used previously still applies. We can also use the normal approximation for small samples. The test statistic is different from the rank sum one: with small samples we use T1 and the rank sum table, and with large samples we use the z test, so we find the critical values in the z table. Our z test statistic is given by T1 minus the mean of T1, divided by the standard deviation of T1, which is the same as T1 minus your
n1 times (n + 1) divided by 2, everything divided by the square root of n1 times n2 times (n + 1) divided by 12. Remember, for the standard deviation we divide by 12, not 2. That is our test statistic: Z = (T1 - n1(n + 1)/2) / sqrt(n1 n2 (n + 1)/12). Like I said, we're going to use the z table to find the critical value, and we will look at that when we get to the exercise.

Now let's look at an example. Use the setting of the prior example, the first one we did, where n1 = 4, n2 = 5, the level of significance alpha is 0.05, and the test statistic was T1 = 24.5. Even though our n1 and n2 are less than 10, we can still use the normal approximation, so we're going to use z here. For the sample value T1, the test statistic, we know we got 24.5. Now we need to calculate the mean and the standard deviation so that we are able to approximate. The mean: n1 was 4, so it's n1 times (n + 1) divided by 2, where n is 4 plus 5, which is 9; so 4 times 10 divided by 2, which gives us 20. Our standard deviation is the square root of n1, which is 4, times n2, which is 5, times 9 plus 1, divided by 12, which gives us 4.082.

To calculate the test statistic we just substitute the values: T1 was 24.5, which we calculated previously as the rank sum, the mean is 20, which we just calculated, and the standard deviation is 4.082. The answer we get is Z = (24.5 - 20)/4.082 = 1.10. Our z test statistic is not greater than the critical value of 1.96. We still need to go and find that critical value at alpha of 0.05, so to find the critical value let me just go
show you how you find the critical value. To find the critical value we need to go to the z table; you can also use this one as your z table. Are we doing a one-tail or a two-tail test? We're doing a two-tail test, and since it's two-tailed, we find the critical value under alpha divided by 2. So if our alpha is 0.05 and it is a two-sided test, we find the z value for 0.025, that is alpha divided by 2, and that is our critical value. You can find it that way or you can use your table; I'm going to give you both options for finding the z values.

Where are my tables now? So here is the table. You go to the negative side of the table, which is where you find most of these values. We're looking for 0.0250, since there are four decimals. Let's see, 0.0250, there it is. You go across and it relates to 1.9, ignoring the negative sign, and you go up and take the last digit, which is 6, so it's 1.96. You can use this method to find the z value, or you can use the table at the end.

You just need to be very careful when you use the table at the bottom, especially when you are given a one-tail test. Even though this row says alpha divided by 2, you can also read it in terms of the alpha values. For example, if it's one tail and alpha is 0.1, you just go to the one-tail 0.1 and read off where they cross, and for a two-tail test with alpha of 0.1 you use 0.05, which is this one. So just make sure you pay attention to that.

Okay, so our critical value is 1.96, and our critical value
is greater than the z test statistic, so we do not reject the null hypothesis, because the value of 1.10 is below 1.96. Therefore there is not sufficient evidence that the medians are not equal. And that's how you do the hypothesis testing.

Let's look at an example, and then we're going to look at the last type of Wilcoxon non-parametric test. Independent random samples are selected from two populations, and the data are shown in the table below. Here we have sample one, which has all those values, and sample two, which has all those values. You can see that sample one has fewer values than sample two, so our n1 here will be sample one. Use the Wilcoxon rank sum test to determine whether the data provide sufficient evidence to indicate a left shift in the location of the probability distribution of the sample. The test uses alpha of 0.05. The test statistic z, and here is the thing you also need to pay attention to when you look at the question, it says the test statistic z for a one-tail test is...

So the first thing we need to do is order the data from lowest to highest and then rank them. I'm going to do that with sample one and sample two, and I'm just going to put a line in between so that we can have both of them. The first value is from sample two, which has four, the lowest value, so we start with four. I'm going to cross it out so that I don't get confused. The next one is five and five, so there are two fives. What else is there? Five, six, seven, eight: we have two eights, so we'll have an eight here and an eight there as well. The next one is nine, and both of them are from sample two, so there will be two nines. What else do we have? Nine and ten: we have two tens, so I'll start
there, a ten there and a ten there, so these are the tens. And we have twelves, right? Ten, eleven, twelve: two twelves, so we have a twelve there and a twelve there. What else? Eighteen on this side. What else? Fifteen, and sixteen and sixteen.

So now we need to put ranks to them. I'm just going to add columns for this as well, rank one and rank two. Okay, let's rank them. This will be one, two, three, and then there is a tie in positions four and five, so that will be 4.5 and 4.5, because we take the average of the two. Oh, sure, I need to split them across the two columns, not have them on the same side. Okay, my bad, let's change this around: this will be two and this will be one, because I'm starting on this side, and that will be 4.5. We continue: the next two are tied in positions six and seven, so it will be 6.5 and 6.5. Then the next two are tied... no, man, I'm not counting right. Those are positions eight and nine, so this should be 8.5 and 8.5, and then the next tie is in positions ten and eleven, so this should be 10.5 and 10.5, and then it will be 12, then 13, then 14.

Let's just count that again. How many values are here? Two, four, six, eight, 10, 12, 14, so there should be 14 of them. So the ranks are 1, 2, 3, then 4.5 twice, 6.5 twice, 8.5 twice, 10.5 twice, and then 12, 13 and 14. There we go, we have all of the values, so we just need to add them now. Which one is the smaller sample? It was sample one, right, which is this side. So we add all of these: 4.5 plus 8.5 plus 10.5 plus 12 plus 13 plus 14, and adding the ranks gives 62.5. Do you also
get that, 62.5? So we now know that our T1 is 62.5. But that is not the end, because we are going to calculate z, and we know the formula: z is equal to T1 minus the mean of T1, divided by the standard deviation of T1. We can calculate the mean of T1 because it is given by n1 times (n + 1) divided by 2. So let's calculate that. In sample one there are one, two, three, four, five, six values, so n1 is six, and we have already counted that there are 14 altogether, because sample two has eight and six plus eight is 14. So it is 6 times (14 + 1) divided by 2: 15 times six is 90, divided by two you get 45. So now we have our mean too, and we can substitute into the formula: 62.5 minus 45.

Now we need to calculate our sigma, the standard deviation of T1. Let's see if I still remember the formula: it's given by the square root of n1 times n2 times (n + 1) divided by 12. So that's the square root of six times eight times (14 + 1) divided by 12. Let me open my calculator... inside the square root I get 60, and the square root of 60 is 7.7-something. We need to keep all the decimals; we can round off when we're done. Oh gosh, it requires me to put in the license. I should have dealt with that before the session. So it's 7.745967, and you divide by that. I just want to stop sharing for a moment and put in the license; I should be done in a second. Okay, I can share my screen again, because now I have my calculator open. So it's 62.5 subtract 45, divided by 7.745967, and if I press equals I get my test statistic. So the answer should be the option equal to that, which is option number one.
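The calculator steps for this exercise can be retraced with a short Python sketch of the normal approximation. This is only an illustration: the function name `rank_sum_z` is hypothetical, and the inputs are the values from the session (T1 = 62.5 with n1 = 6 and n2 = 8, and the earlier factory example with T1 = 24.5, n1 = 4, n2 = 5).

```python
import math


def rank_sum_z(t1, n1, n2):
    """Z statistic for the rank sum T1 of the smaller sample (normal approximation)."""
    n = n1 + n2
    mean_t1 = n1 * (n + 1) / 2               # mean of T1
    sd_t1 = math.sqrt(n1 * n2 * (n + 1) / 12)  # standard deviation of T1 (divide by 12)
    return (t1 - mean_t1) / sd_t1


# Exercise from the session: mean 45, standard deviation sqrt(60).
print(round(rank_sum_z(62.5, 6, 8), 4))  # 2.2592

# Factory example from earlier in the session: mean 20, standard deviation about 4.082.
print(round(rank_sum_z(24.5, 4, 5), 2))  # 1.1
```

Keeping the full square root and rounding only at the end, as done here, avoids the small differences you get when the standard deviation is rounded first.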
Did you also get the same? Yes, that will be option number one, okay, which is 2.2592. And that's how you answer a question relating to the Wilcoxon rank sum test. I'm not going to ask you to do the next one; we're going to look at how we answer questions relating to the other test, and if we have enough time later on we can come back to question 20, which is our exercise two.

So let's now look at the Wilcoxon signed rank test. We have already covered the two Wilcoxon rank sum cases, when the sample sizes are small and when the sample sizes are large; now we're going to look at how we do a signed rank test. As before, you need the statistical tables, the formulas and the calculator, and with the signed ranks we're also going to introduce another table.

The Wilcoxon signed rank test is a non-parametric test for two related populations, and these are the steps you follow to answer the questions or do the hypothesis testing. Step one: for each of the sample items we compute the difference, because the values come from two samples, so we find the difference between the two measurements. It's like doing a test on before and after and then checking the differences between the two. When we go on to calculate the test statistic, we ignore the plus and minus signs to find the absolute values of the differences. We omit any differences that are equal to zero. Then we assign ranks from one to n, and if there are ties we give them the average rank. So we still do the same thing we have been doing, but now we're only working with one column and not two columns; we just assign the ranks based on that column of differences. Then we need to reassign
back the signs, the plus and the minus, so that we can find the signed ranks, and after that we can compute the Wilcoxon test statistic, our T, as the sum of the positive ranks. So the Wilcoxon signed rank test statistic is the sum of only the positive ranks, which means we add all the positive ranks together. To find out whether or not we reject the null hypothesis, we need the critical value, and for that we go to table seven, which is the critical values of the Wilcoxon signed rank test.

The other thing we need is to be able to calculate the expected value for the Wilcoxon signed ranks, not the rank sum, the signed ranks. The expected value is given by n times (n + 1) divided by 4. Here we assume that the sample size is greater than 20, so that T is approximately normally distributed. So that is our expected value, and our standard deviation is the square root of n times (n + 1) times (2n + 1), divided by 24, double the 12 we divided by before.

To calculate the test statistic we use the z test, and z is given by T, your sum of the positive ranks, minus the expected value, which is your mean, divided by the standard deviation of the ranks. We are testing for a difference in the pairs: the null hypothesis states that the two population locations are the same, and the alternative states that they are different. In that case we find the critical value in the z table.

So with the signed ranks there are two ways you can make a decision. So,
based on the sum of the positive ranks, you use table seven to find the critical values and use those to make a decision. When your sample size is greater than 20, which is big enough to approximate with the normal distribution, you use the z test, which means your critical values come from the z table, and for that you can use table eight.

Okay, so let's look at this example from one of the past exam papers. It says a Wilcoxon signed rank sum test is performed, yes, and the test statistic is calculated as T = 91. So they have already calculated the differences, and they have the answer that the sum of the positive ranks is 91. They say there are 18 observation pairs, of which three have zero differences, and the two-tailed test is performed at the five percent level of significance. So they have calculated T, which in this instance is the sum of the positive ranks, and they have given us our n, the sample size: there were 18 observations, so n is 18. They also tell us that three values have a zero difference, so we ignore those ones anyway. It is a two-tailed test and it is performed at alpha of 0.05, so we have all the information we require.

Choose the correct option: that is the question, so we need to find the correct answer. The first one says the critical cut-off values are, and we need to pay attention to this, an upper limit of 90 and a lower limit of 30, and number two gives the cut-off values in the same way. Let's go and find out which is correct. Remember we need to use table seven, so let's go to the table. There's table seven. We're doing a
two-tailed test and the alpha is 0.05, so we look under two-tailed, 0.05. We have found our table. What else do we know? We also know that our n is 18, and at n of 18 the two values are 40 and 131. So the lower limit is 40 and the upper limit is 131.

Going back, let's see which of the two options is correct. The first one says the upper limit is 90, but the upper limit is 131, so that is incorrect; it says the lower limit is 30, but the lower limit is 40, so that is incorrect too. The second one has the upper limit of 131 and the lower limit of 40, so that one is the correct one.

Therefore it is not even necessary for us to look at the other statements, but we can look at them anyway and ask: are we rejecting the null hypothesis or not? Because T = 91 lies between 40 and 131, we do not reject (unless the question was asking which one is incorrect, or something like that). The next option says the null hypothesis is rejected. No, we do not reject the null hypothesis, because T falls between the limits, so that one is not correct. The one after that is about whether the null hypothesis will or will not be rejected, but we do not word it that way; we say we do not reject the null hypothesis, so that won't be right. "The test results are inconclusive": no, because we do have all the information we need, so that is not correct either. So the only correct answer is number two.

Let's look at the next exercise. A paired difference experiment with n = 30 pairs yielded T+ = 354. Use the Wilcoxon signed rank sum test to determine whether we can infer at the 5% level of significance that the two populations differ. Which one of the following statements is incorrect? So now let's go back to our
statement so that I can understand exactly what is given. A paired difference experiment with n of 30: they have given us the sample size of 30. It yielded a positive rank sum of 354, so they have given us the sum of the positive ranks. We know that we are doing a signed rank sum test, and we need to do it at alpha of 0.05. We need to test whether the two populations differ, so we are going to be doing a two-tailed test, because they just said "differ"; they didn't say less than or greater than or anything like that. It is a two-tailed test because of the word "differ".

Okay, now we can answer the question: which one of the following statements is incorrect? Number one states that the null hypothesis is that the two populations are the same. Is that correct? Always remember that the null hypothesis always contains the equality, or it says they are the same, so that is correct. We are looking for the incorrect one.

Number two is the alternative hypothesis. Remember, the test says they differ, so we just need to check that it says something like "they are not equal" or "they are different". It states that the location of population one is different from the location of population two. Because it uses the word "different", meaning they differ, it is correct.

Number three says the critical value is 1.96. Remember we are using the z test here, because n of 30 is greater than 20, so we need the z critical value at alpha over 2, which is z of 0.05 over 2, which is the same as z of 0.025. You can go to Table 8, which has all the values we need, and it gives us 1.96. We did do this before; I am just showing you again in case you forgot. So 1.96 is correct. So now,
what are we left with? We are left with option four and option five, to determine which of them is incorrect. Option four says the rejection region is where the absolute value of the test statistic is greater than the critical value at alpha of 0.05, which equals 1.96. Yes: we have two regions of rejection because we are doing a two-tailed test, so on this side it is positive 1.96 and on this side it is negative 1.96. On the positive side, the "greater than" side, it is positive 1.96; on the negative side, the "less than" side, it is negative 1.96. So this statement is correct as well.

Which leaves us with option five. It says the p-value is 0.0062. So we need to go and calculate what the p-value is, and you calculate the p-value after you have calculated your z statistic. So let's see whether we are able to calculate z. The z statistic is given by T+ minus the expected value, divided by the standard deviation. We do have some of that information, because we know that T+ is 354. How do we get the expected value? Remember the formula is n times (n + 1) divided by 4. Our n is 30, so it is 30 times (30 + 1) divided by 4. Do you have an answer? 30 times 31 divided by 4 is 232.5.

Now we need to calculate the standard deviation. How do we do that? It is the square root of
n times (n + 1) times (2n + 1), divided by 24, which is the square root of 30 times (30 + 1) times (2 times 30 + 1), divided by 24. 60 plus 1 is 61, so it is 61 times 31 times 30, which is 56730; divided by 24, that is 2363.75. Taking the square root of that, what do you get? 48.618.

Sorry, this is not our z value yet; this is our standard deviation, which in full is 48.618412, and we keep all the decimals when we substitute. We still need to calculate our z. Our z is 354 minus 232.5, divided by 48.618412, and the answer we get is 2.4991. For the z table we keep only two decimals, so we can say it is 2.50.

So we need to go to the z table and look up this z value so that we can find the probability, the p-value, and we need to be very careful because it is a two-sided test. Because z is positive, we go to the positive side of the table. Remember the z table with the negative values that I showed you; now we come to this side because our z value is 2.50. On the left we look for 2.5, and at the top we look for 0.00, because the top gives the last digit. 2.5 is here, and 0.00 is the first column, and the value there is 0.9938.

Now we need to be very careful here: we need to say 1 minus 0.9938. Let's go back to our question to find the p-value. We know that they said the p-value was that; however, since the value here
is positive and we are doing a two-tailed test, we find the p-value by taking 1 minus the value we find on the table, and then multiplying by 2, because with a two-tailed test there are two sides. The value on the table was 0.9938. Because z is positive, we need to subtract it from 1. Why? The table gives us the larger portion of the area, below z, whereas we want the smaller portion, which is the tail on the other side of z. That is 1 minus 0.9938, which is 0.0062. If it were only a one-tailed test, 0.0062 would have been correct, but because we are doing a two-tailed test we have to multiply by 2: 2 times 0.0062 gives a p-value of 0.012. Which makes number five the incorrect statement. Yay!

And we are almost at the end. What we didn't cover was number 20. There is no other exercise after this except the ones that I am giving you to do on your own; if the session were two hours we would also have covered the rest of the exercises. We do have other exercises that I have included in the handout as well, so you can go through them, and if you need any help you can chat with me on WhatsApp and I can help you answer some of these questions. Here is the first question; the second question is about the signed rank sum as well; the third one is about the Wilcoxon rank sum test; and then there is the signed rank sum test with matched pairs, the before and the after.

Okay, with the last few minutes let's recap what we have just gone through. We have learned three ways
of carrying out the Wilcoxon non-parametric tests. With the Wilcoxon rank sum test there are two ways. The first is when the sample sizes are small: we find the ranks, and the test statistic is the rank sum of the sample with the smaller sample size. Then we find the critical values in the table, where you get the upper and the lower limit, in order to determine where your regions of rejection are.

The second way with the Wilcoxon rank sum test is when you have larger sample sizes. There we use the normal approximation, and therefore the z test, so you need to know how to calculate the mean and the standard deviation as well as the z test statistic. We follow the same process as before: the sample test statistic is the rank sum of the smaller sample, which is your T1, and you substitute it into the z formula together with the mean and the standard deviation you calculated from those formulas. You also need to find the critical value, but here you find it on the z table. You can find it in two ways: either by using Table 8, or by using the cumulative standardized normal distribution table, which has the positive and the negative side; you find alpha divided by 2, or alpha, inside the table and read off the critical values outside, which are your z values. That is the Wilcoxon rank sum test.

In terms of the signed rank sum test, you need to find the differences, then assign the ranks while disregarding the negative and positive signs, just assigning the ranks, and then putting
back the negative and the positive signs and calculating the rank sums. While you are doing that, wherever there is a difference of zero you need to omit it, and only then calculate the rank sums. You can then make a decision in two ways. One is based on the test statistic, the T that you found, which is the rank sum of the positive values: you go to Table 7 and find the critical values, for which you just need your alpha value and the value of your n, and you make your decision based on that. Otherwise you can use the approximation to the normal distribution with the z test, which is your sum of positive ranks minus the expected value, divided by the standard deviation. That means you need to know the formulas for the expected value and the standard deviation in order to calculate the z value, and to make a decision you need to find the critical values on the z table, Table 8. Please pay attention when you find your critical values: if it is a two-tailed test you divide your alpha by 2; if it is a one-tailed test you do not divide your alpha by 2.

With that said, that concludes today's session. If there are any questions or comments, the platform is yours. Thank you, and enjoy the rest of your evening. I will see you next week, unless they cancel the session; I won't know. Please make sure that you complete the register. Thank you.
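
For anyone who wants to check the signed rank sum steps from this session on a computer, here is a minimal Python sketch. It is an illustration only, not part of the course material: the function names `signed_rank_t_plus` and `normal_approximation` are made up for this example, the small paired data set is hypothetical, and the normal probability is computed with `math.erf` instead of being read from the z table, so the p-value can differ slightly from the table answer (the table uses z rounded to 2.50).

```python
import math


def signed_rank_t_plus(first, second):
    """Return (n, T+): n counts the non-zero differences, T+ sums the positive ranks."""
    # Differences between the pairs; pairs with a zero difference are omitted.
    diffs = [a - b for a, b in zip(first, second) if a != b]
    # Rank by absolute difference, ignoring the sign for now.
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over the run of tied absolute differences.
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        # Tied values share the average of the ranks they occupy.
        average_rank = (i + 1 + j + 1) / 2
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    # Put the signs back and add up only the positive ranks.
    t_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    return len(ordered), t_plus


def normal_approximation(n, t_plus):
    """Return (mean, sd, z, two-tailed p) for the signed rank sum T+."""
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (t_plus - mean) / sd
    # Standard normal cumulative probability of |z|, via the error function.
    phi = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_two_tailed = 2 * (1 - phi)
    return mean, sd, z, p_two_tailed


# Hypothetical paired data: one zero difference (dropped) and one tie in |difference|.
print(signed_rank_t_plus([10, 12, 9, 15, 11], [8, 12, 11, 10, 10]))

# The exercise from the session: n = 30 pairs, T+ = 354.
mean, sd, z, p = normal_approximation(30, 354)
print(round(mean, 1), round(sd, 3), round(z, 2), round(p, 3))
```

With the session's numbers this gives a mean of 232.5, a standard deviation of about 48.618, a z of about 2.50 and a two-tailed p-value of about 0.012, in line with the hand calculation. For small samples you would still compare T+ with the Table 7 critical values rather than use the normal approximation.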