In the previous lecture, I showed how the signed ranks, that is, the ranks of the absolute values |x_i|, can be used to construct a test for the nonparametric location problem. Let us now consider the quantities called Walsh averages: W_ij = (x_i + x_j)/2, defined for 1 ≤ i ≤ j ≤ n as the average of the two observations x_i and x_j. If T+ denotes the Wilcoxon signed rank statistic defined earlier, then T+ is actually the number of positive Walsh averages. Let me call this statement 1, and let us look at a proof by induction.

Take n = 1, so only one observation x_1 is there. Then T+ = 1 if x_1 > 0 and T+ = 0 if x_1 < 0. And here only one Walsh average exists, W_11 = x_1: if it is positive then T+ = 1, and if it is negative then T+ = 0. So statement 1 is satisfied for n = 1.

Suppose statement 1 is satisfied for n - 1 observations; now we add x_n, which means we must consider |x_n|. Without loss of generality we take |x_n| to be the largest: if it is not, we can consider a permutation of the observations in which it becomes the largest, and since statement 1 is assumed for n - 1, it holds under any such permutation. Write T_n+ with a subscript just to denote that the statistic is based on n observations. If x_n < 0, then T_n+ = T_{n-1}+, because a negative observation does not contribute (T+ is the sum of the ranks of the positive observations). If x_n > 0, then since |x_n| is assumed largest its rank is n, so that rank is added and T_n+ = T_{n-1}+ + n.

Now, what are the new Walsh averages obtained after adding x_n? They are W_1n = (x_1 + x_n)/2, W_2n = (x_2 + x_n)/2, and so on up to W_{n-1,n} = (x_{n-1} + x_n)/2, together with W_nn = x_n. If x_n < 0, then since |x_n| is the largest in absolute value, whatever x_1, ..., x_{n-1} may be, each of W_1n, ..., W_{n-1,n} is negative, and of course W_nn is negative; so the number of positive Walsh averages remains T_{n-1}+. On the other hand, if x_n > 0, then since it is the largest in magnitude it makes W_1n, W_2n, ..., W_{n-1,n} all positive, whatever the signs of the other observations, and W_nn is positive too; so the number of positive Walsh averages becomes T_{n-1}+ + n. Thus in both cases T+ equals the number of positive Walsh averages, so statement 1 holds for n. We can state this as a theorem, which we have now proved by mathematical induction.
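As a quick numerical check of this theorem, here is a minimal Python sketch; the function names and the shifted normal sample are my own illustration, not from the lecture. It computes T+ once from the signed ranks and once as the count of positive Walsh averages, and the two always agree.

```python
import numpy as np
from itertools import combinations_with_replacement

def t_plus_signed_ranks(x):
    # T+ = sum of the ranks of |x_i| over the indices with x_i > 0.
    ranks = np.argsort(np.argsort(np.abs(x))) + 1
    return int(ranks[x > 0].sum())

def t_plus_walsh(x):
    # T+ = number of positive Walsh averages W_ij = (x_i + x_j)/2, i <= j.
    return sum(1 for i, j in combinations_with_replacement(range(len(x)), 2)
               if (x[i] + x[j]) / 2 > 0)

rng = np.random.default_rng(0)
x = rng.normal(loc=0.5, scale=1.0, size=10)     # continuous, so no ties or zeros
print(t_plus_signed_ranks(x), t_plus_walsh(x))  # the two counts agree
```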
Now, based on this, let us define the indicator d_ij = 1 if the Walsh average W_ij is positive, and d_ij = 0 if it is negative. The case W_ij = 0 we do not have to consider: by the assumption of continuity of the random variables, P(W_ij = 0) = 0. Then T+, the number of positive Walsh averages, is T+ = Σ_{1 ≤ i ≤ j ≤ n} d_ij.

Let p_1 = P_θ(x_i > 0) and p_2 = P_θ(x_i + x_j > 0) for i ≠ j, the probabilities when the true median is θ. In terms of p_1 and p_2 we can write E_θ(T+) = Σ_{1 ≤ i ≤ j ≤ n} E(d_ij); separating the terms with j = i from those with i < j, this equals Σ_i P_θ(x_i > 0) + Σ_{i<j} P_θ(x_i + x_j > 0), because d_ij was defined to take the values 1 and 0 directly according to whether the average is positive or negative. So E_θ(T+) = n p_1 + (n(n-1)/2) p_2. Thus under the alternative, when the median is some other value θ, the expectation of T+ is expressed in terms of these probabilities.

Similarly, for the variance we need the expectation of the square: E_θ((T+)²) = E[(Σ_{i ≤ j} d_ij)²], which we can write as E[(Σ_{i=1}^n d_ii + Σ_{i<j} d_ij)²]. Let us expand. The squared terms give Σ_{i ≤ j} d_ij², and since d_ij is an indicator, d_ij² = d_ij. The cross-product terms are of several types: two diagonal terms, d_ii d_jj with i ≠ j; a diagonal term and an off-diagonal term with a common index, such as d_ii d_ij; a diagonal term and an off-diagonal term with no common index, d_kk d_ij with k different from i and j; two off-diagonal terms sharing one index, such as d_ij d_ik or d_ij d_kj; and two off-diagonal terms with all four indices different, d_ij d_kl. These are all the terms that arise when we square. Now let us apply the expectation to each type of term.
Taking expectations type by type and counting the number of terms of each type, we get, all under the assumption that the median is θ,

E_θ((T+)²) = n E(d_ii) + (n(n-1)/2) E(d_ij) + n(n-1) E(d_ii d_jj) + 2n(n-1) E(d_ii d_ij) + n(n-1)(n-2) E(d_kk d_ij) + n(n-1)(n-2) E(d_ij d_ik) + (n(n-1)(n-2)(n-3)/4) E(d_ij d_kl),

where in each expectation the indices shown are distinct. Some additional probabilities beyond p_1 and p_2 now come in, because some of these expectations involve joint probabilities. Define p_3 = P_θ(x_i + x_j > 0, x_i > 0) and p_4 = P_θ(x_i + x_j > 0, x_i + x_k > 0) for distinct i, j, k. By independence, E(d_ii d_jj) = p_1², E(d_kk d_ij) = p_1 p_2, and E(d_ij d_kl) = p_2² when the index sets are disjoint. So

E_θ((T+)²) = n p_1 + (n(n-1)/2) p_2 + n(n-1) p_1² + 2n(n-1) p_3 + n(n-1)(n-2) p_1 p_2 + n(n-1)(n-2) p_4 + (n(n-1)(n-2)(n-3)/4) p_2².

Once we have E_θ((T+)²), and E_θ(T+) is already there, we have the variance of T+ as well: Var_θ(T+) = E_θ((T+)²) - (E_θ(T+))², which simplifies to n p_1(1-p_1) + (n(n-1)/2) p_2(1-p_2) + 2n(n-1)(p_3 - p_1 p_2) + n(n-1)(n-2)(p_4 - p_2²). So even when the median is some value θ, we have been able to derive the moments of T+.

Now look at the nature of the statistics we have used: they are built from the ranks of the |x_i| and from the indicators u(x_i), where u(t) = 1 if t > 0 and u(t) = 0 otherwise. These are functions of the x_i's and of the |x_i|'s, and various choices are possible, so we can consider a general scoring function; the result is called a general linear rank statistic. In general we take scores a_1 ≤ a_2 ≤ ... ≤ a_n and define S = Σ_{i=1}^n a(R_i+) u(x_i), where R_i+ is the rank of |x_i|. As examples: if a_i = 1 then S is the sign test statistic, and these are called sign scores; if a_i = i then S is the Wilcoxon signed rank statistic, and these are called Wilcoxon scores. The a_i's are called the score function. There are some others also: for example, we may choose a_i in terms of the inverse cdf of the standard normal, a_i = Φ⁻¹(1/2 + i/(2(n+1))); these are called normal scores. And there are the Fraser normal scores, a_i = E(|Z|_(i)), where |Z|_(i) is the i-th order statistic among |Z_1|, |Z_2|, ..., |Z_n| and the Z_i's are iid N(0, 1). That is, if we consider a random sample from a standard normal distribution, take the absolute values, and look at the i-th smallest of them, its expected value defines the Fraser normal score a_i.
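To see these score functions side by side, here is a minimal Python sketch. The helper names are mine, and the Fraser scores, being expected values of order statistics, are approximated by simulation here rather than computed exactly.

```python
import numpy as np
from scipy.stats import norm

def sign_scores(n):
    return np.ones(n)                           # a_i = 1

def wilcoxon_scores(n):
    return np.arange(1, n + 1, dtype=float)     # a_i = i

def normal_scores(n):
    i = np.arange(1, n + 1)
    return norm.ppf(0.5 + i / (2 * (n + 1)))    # a_i = Phi^{-1}(1/2 + i/(2(n+1)))

def fraser_scores(n, n_sim=100_000, seed=0):
    # a_i = E|Z|_(i), approximated by averaging sorted |Z| samples.
    rng = np.random.default_rng(seed)
    return np.sort(np.abs(rng.standard_normal((n_sim, n))), axis=1).mean(axis=0)

def linear_signed_rank_statistic(x, scores):
    a = scores(len(x))
    r = np.argsort(np.argsort(np.abs(x))) + 1   # R_i+ = rank of |x_i|
    return float(a[r - 1] @ (x > 0))            # S = sum_i a(R_i+) u(x_i)
```

With sign_scores this reproduces the sign statistic, and with wilcoxon_scores it reproduces T+.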
Since the null distribution of u(x_i) is known, we can look at the mean, variance, and so on of S in this general setting. Under H0, E[u(x_i)] = 1/2. Moreover, S = Σ_i a(R_i+) u(x_i) can be written as S = Σ_j a_j u(x_(j)), where x_(j) denotes the observation whose absolute value has rank j. We have already proved that the ranks R_i+ and the signs of the x_i are independent, so E[a(R_i+) u(x_i)] = E[a(R_i+)] E[u(x_i)] = (1/2) E[a(R_i+)]; and summing over all i, the identity Σ_i a(R_i+) = Σ_i a_i (all the score values appear exactly once) gives E(S) = (1/2) Σ_{i=1}^n a_i. In the cases where the scores are a permutation of the numbers 1 to n, Σ a_i = n(n+1)/2, so for the Wilcoxon scores E(S) = n(n+1)/4; for the sign scores a_i = 1, so Σ a_i = n and E(S) = n/2. Likewise, since under H0 the signs u(x_(j)) are iid Bernoulli(1/2) and independent of the ranks, Var(S) = Σ_{i=1}^n a_i² Var(u(x_i)) = (1/4) Σ_{i=1}^n a_i², using Var(u(x_i)) = 1/2 - (1/2)² = 1/4.

In general we can prove the following theorem: the distribution of S is symmetric under H0; more precisely, it is symmetric about its mean E(S) = (1/2) Σ a_i. For a proof, let us look at P(S = E(S) - s) = P(Σ_j a_j u(x_(j)) = (1/2) Σ a_i - s). Now the distributions of u(x_(j)) and 1 - u(x_(j)) are the same: what is u(x_(j))? It takes the value 1 with probability 1/2 and 0 with probability 1/2 when the null hypothesis is true, that is, when the median is 0; and 1 - u(x_(j)) also takes only the values 0 and 1, each with probability 1/2. So in the probability statement I can replace u(x_(j)) by 1 - u(x_(j)), giving P(Σ_j a_j (1 - u(x_(j))) = (1/2) Σ a_i - s). Taking Σ a_j to the other side, this equals P(Σ_j a_j u(x_(j)) = (1/2) Σ a_i + s) = P(S = E(S) + s). So the distribution of S is symmetric about E(S). If we are not under H0, the distribution need not be symmetric, because then the distributions of u(x_(j)) and 1 - u(x_(j)) are no longer the same; here, under the null hypothesis, the probabilities of each observation being positive or negative are both 1/2. Note that this is a general theory: I am not assuming a particular form of S, only general scores a_i.

We have seen that the asymptotic distributions of the Wilcoxon signed rank statistic and of the sign statistic are normal; there we were able to do the exact calculation. Here, with general a_i's, if we impose a certain condition on the scores then we again obtain a normal asymptotic distribution. This condition is called Noether's condition, named after Gottfried Noether. What is Noether's condition? It is that max_{1 ≤ j ≤ n} a_j² / Σ_{j=1}^n a_j² → 0 as n → ∞.
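Noether's condition is easy to inspect numerically; the following small sketch (my own illustration) evaluates the ratio for the Wilcoxon scores, where it equals 6n/((n+1)(2n+1)), roughly 3/n, and so tends to 0.

```python
import numpy as np

def noether_ratio(a):
    a = np.asarray(a, dtype=float)
    return (a ** 2).max() / (a ** 2).sum()

for n in (10, 100, 1000, 10_000):
    print(n, noether_ratio(np.arange(1, n + 1)))   # ~ 3/n for Wilcoxon scores
```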
Under Noether's condition we have the following result: (S - E(S)) / sqrt(Var(S)) converges in distribution to a standard normal as n → ∞ under H0. You can see from the expression above that the general linear rank statistic is a sum of independent terms, so we can apply Lyapunov's central limit theorem. Let us write W_j = a_j u(x_(j)); under the null distribution, W_j = a_j with probability 1/2 and W_j = 0 with probability 1/2. Then μ_j = E(W_j) = a_j/2, and for the variance, σ_j² = (1/2) a_j² - (1/4) a_j² = a_j²/4, so σ² = Σ_j σ_j² = (1/4) Σ_{j=1}^n a_j². We also need the third central moment: |W_j - a_j/2|³ equals (a_j/2)³ = a_j³/8 both when W_j = a_j and when W_j = 0, each case having probability 1/2, so E|W_j - μ_j|³ = a_j³/8 and ρ³ = (1/8) Σ_{j=1}^n a_j³. One comment here: I did not mention earlier what is required of the a_j's. In these examples they are nonnegative, 1 for the sign scores, j for the Wilcoxon scores, and so on, and in general we take the a_j's to be nonnegative; otherwise we would have to put absolute values here.

We have to consider the ratio ρ/σ, and it is more convenient to take the sixth power. Then (ρ/σ)⁶ = ((1/8) Σ a_j³)² / ((1/4) Σ a_j²)³, and since 8² = 4³ = 64, the constants cancel exactly, leaving (Σ a_j³)² / (Σ a_j²)³. Now we separate out one factor: Σ a_j³ ≤ (max_j a_j) Σ a_j², so (ρ/σ)⁶ ≤ max_{1 ≤ j ≤ n} a_j² · (Σ a_j²)² / (Σ a_j²)³, and after cancellation this is max_{1 ≤ j ≤ n} a_j² / Σ_j a_j². By Noether's condition this goes to 0 as n → ∞, so Lyapunov's condition holds. Therefore (S - E(S)) / sqrt(Var(S)), that is, (S - (1/2) Σ a_i) / sqrt((1/4) Σ a_i²), converges in distribution to Z ~ N(0, 1) as n → ∞.
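This asymptotic normality can also be seen empirically. Under H0 the signs are iid Bernoulli(1/2) and independent of the ranks, so S with Wilcoxon scores can be simulated directly; the sample size, replication count, and the Kolmogorov-Smirnov comparison below are my own choices for the sketch.

```python
import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(1)
n, reps = 200, 5000
a = np.arange(1, n + 1)                 # Wilcoxon scores
mean_s = a.sum() / 2                    # E(S)   = (1/2) sum a_i
sd_s = np.sqrt((a ** 2).sum() / 4)      # Var(S) = (1/4) sum a_i^2

# Under H0, S has the same distribution as a @ u with u_j iid Bernoulli(1/2).
u = rng.integers(0, 2, size=(reps, n))
z = (u @ a - mean_s) / sd_s
print(kstest(z, "norm"))                # small D statistic, large p-value
```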
So the general linear rank statistic is asymptotically normal under Noether's condition; for the sign statistic and the Wilcoxon signed rank statistic this was already established by exact calculation. I wrote two more scores, the normal scores and the Fraser normal scores; for those also one can check that the condition is satisfied.

Now, the procedures I have developed for the single-sample problem can in some cases be extended to two-sample problems. For example, if we have bivariate (paired) observations and we still want to compare the locations of the two components, we can take the differences: based on the distribution of the differences and the ranks of those differences, the same test statistics work again. So let us consider these extensions to various other cases; we can also look at confidence interval procedures and so on. Let (x_1, y_1), (x_2, y_2), ..., (x_n, y_n) be a sample from a bivariate population, and suppose we want to test the equality of the medians θ_x and θ_y of the x and y marginal populations. That means the hypothesis testing problem is θ_x - θ_y = 0 against θ_x - θ_y > 0, or < 0, or ≠ 0. Here we can consider d_i = x_i - y_i, and based on the d_i we can apply the Wilcoxon signed rank test or the sign test.

Another application is the confidence interval; we can construct confidence intervals as well. Suppose u_1, u_2, ..., u_{n*} are quantities computed from the sample; they need not be independent (for the Wilcoxon signed rank statistic, for instance, they are the n* = n(n+1)/2 Walsh averages). Let T_1 be the number of u's greater than 0, T_2 the number of u's less than 0, and T* = min(T_1, T_2). For the hypothesis θ = 0 against θ ≠ 0, we can reject H0 if T* ≤ c, where c is chosen so that P(T* ≤ c) = α under H0. Now, c can be determined from P(min(T_1, T_2) > c) = 1 - α, that is, P(T_1 > c, T_2 > c) = 1 - α. Here T_2 > c means that at least c + 1 of the u's are negative, i.e. u_(c+1) < 0, and T_1 > c means that at least c + 1 of them are positive, i.e. u_(n*-c) > 0; so with d = c + 1 this event is exactly u_(d) < 0 < u_(n*-d+1), and we have P(u_(d) < 0 < u_(n*-d+1)) = 1 - α. Now, if in place of 0 we put the true value θ, this becomes P(u_(d) < θ < u_(n*-d+1)) = 1 - α. So we are getting that (u_(d), u_(n*-d+1)) is a 100(1 - α)% confidence interval for θ.
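Here is a sketch of this interval for the Wilcoxon signed rank case, where the u's are the Walsh averages. The construction and helper names are my own: d is obtained from the exact null distribution of T+, computed via the generating function ∏_{i=1}^n (1 + t^i) / 2^n, and the code assumes α is large enough that a valid c ≥ 0 exists.

```python
import numpy as np
from itertools import combinations_with_replacement

def t_plus_null_pmf(n):
    # Null pmf of T+: coefficients of prod_{i=1}^n (1 + t^i), divided by 2^n.
    coeffs = np.array([1.0])
    for i in range(1, n + 1):
        new = np.zeros(len(coeffs) + i)
        new[:len(coeffs)] += coeffs
        new[i:] += coeffs
        coeffs = new
    return coeffs / 2.0 ** n

def walsh_ci(x, alpha=0.05):
    n = len(x)
    walsh = np.sort([(x[i] + x[j]) / 2
                     for i, j in combinations_with_replacement(range(n), 2)])
    cdf = np.cumsum(t_plus_null_pmf(n))
    # Largest c with P(T+ <= c) <= alpha/2; by the symmetry of the null
    # distribution of T+, this makes P(T* <= c) <= alpha.
    c = int(np.searchsorted(cdf, alpha / 2, side="right")) - 1
    d = c + 1
    return walsh[d - 1], walsh[len(walsh) - d]   # (u_(d), u_(n*-d+1)), 1-indexed

rng = np.random.default_rng(2)
print(walsh_ci(rng.normal(loc=0.5, scale=1.0, size=20)))
```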
We can also consider the point estimation problem. Let z_1, z_2, ..., z_n be a random sample from a location-parameter distribution F(x - θ), symmetric about θ, and let h(z_1, z_2, ..., z_n) be a test statistic for H0: θ = 0 against H1: θ > 0 for which we reject for large values of h. Suppose this h satisfies the following two conditions: first, h(z_1 + a, ..., z_n + a) is nondecreasing in a for each (z_1, z_2, ..., z_n); second, h(z_1, z_2, ..., z_n) has a symmetric distribution about some value μ under H0. Consider θ* = sup{θ : h(z_1 - θ, ..., z_n - θ) > μ} and θ** = inf{θ : h(z_1 - θ, ..., z_n - θ) < μ}, and define the estimator θ̂ = (θ* + θ**)/2.

For the sign test, for example, h(z_1, ..., z_n) is the number of positive z_i's, and h is symmetric about μ = n/2 under H0. So here θ* = sup{θ : Σ_{i=1}^n u(z_i - θ) > n/2}. If n is odd this gives θ* = z_((n+1)/2), the ((n+1)/2)-th order statistic, and similarly θ** = z_((n+1)/2); so θ̂ = z_((n+1)/2), and we are getting the sample median as the estimator of the median θ. If n is even, then calculating these quantities gives θ* = z_(n/2) and θ** = z_(n/2+1), so θ̂ = (z_(n/2) + z_(n/2+1))/2, which is again the sample median. Thus we obtain the sample median as an estimator for the population median.

So this linear rank statistic, or this score-function approach, is useful for deriving confidence intervals and point estimates, and it can help us in testing the equality of the medians in a bivariate (paired) problem also. These are various applications of general linear rank statistics.
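The following sketch implements the sign-test construction just described; the function names and the simulated sample are mine. As an aside not covered in the lecture: applying the same recipe to h = T+, which is symmetric about μ = n(n+1)/4 under H0, is known to give the median of the Walsh averages, the Hodges-Lehmann estimator, which ties back to the quantities used for the confidence interval above.

```python
import numpy as np
from itertools import combinations_with_replacement

def sign_test_estimate(z):
    # theta*  = sup{theta : #{z_i > theta} > n/2},
    # theta** = inf{theta : #{z_i > theta} < n/2}; theta_hat is their midpoint.
    z = np.sort(z)
    n = len(z)
    if n % 2 == 1:
        return float(z[(n - 1) // 2])              # z_((n+1)/2), the sample median
    return float(z[n // 2 - 1] + z[n // 2]) / 2    # (z_(n/2) + z_(n/2+1)) / 2

def hodges_lehmann_estimate(z):
    # Aside: the same construction with h = T+ gives the median Walsh average.
    walsh = [(z[i] + z[j]) / 2
             for i, j in combinations_with_replacement(range(len(z)), 2)]
    return float(np.median(walsh))

rng = np.random.default_rng(3)
z = rng.normal(loc=1.0, scale=1.0, size=15)
print(sign_test_estimate(z), hodges_lehmann_estimate(z))
```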
Then the next problem that comes is comparing the medians of two independent samples. In that case the above procedures cannot be used directly, so let us consider the two-sample problems separately. Our next topic in these nonparametric methods is therefore two-sample problems, or two-sample location problems; in fact, I have already introduced quantities like the u_i's and the order statistics u_(i), and we will see how such quantities are used here.

Let x_1, x_2, ..., x_m be a random sample from a continuous population with cdf F(x), and let y_1, y_2, ..., y_n be an independent random sample from a continuous population with cdf G(y). In general we want to test H0: F(x) = G(x) for all x against F(x) ≠ G(x) for some x. This is the general problem of equality of the two distributions, but in particular we can consider location equality problems, scale equality problems, and general alternatives. Let us consider the location problem here: H0 is F(x) = G(x) for all x, and under H1, G is a location shift of F, that is, G(x) = F(x - θ) for all x and some θ ≠ 0. This is interesting: if you consider the analogous problems for the normal distribution, they are directly problems about equality of the locations, that is, the means; here it becomes a statement about medians. If θ > 0, that is equivalent to saying that the median of F is smaller than the median of G; if θ < 0, then the median of F is larger than the median of G. So the hypothesis testing problem is equivalent to testing H0: θ = 0 against H1: θ > 0, or H2: θ < 0, or H3: θ ≠ 0. As you can see, it has come down to the earlier type of problem.

Now we will introduce a test statistic; it is called the Mann-Whitney-Wilcoxon statistic. When the null hypothesis is true, the two distributions are the same, and then x_1, ..., x_m, y_1, ..., y_n can actually be considered as one sample; and if it is one sample, then all arrangements of the m x's among the ranks 1, 2, ..., m + n are equally likely. So the probability of any particular arrangement of the m x's among the m + n observations is 1/C(m+n, m), which is the same as 1/C(m+n, n). We utilize this concept here: basically we count how many y_j's are less than the x_i's, how many y_j's are greater than the x_i's, and so on, and that will directly give us a handle on this testing problem. You can see that in parametric inference we would look at the means of the observations; here, if we looked at the means, the distribution would be extremely complicated, because we do not know the forms of F and G. Therefore we have to work with counts only, that is, the ranks, or how many observations are positive or negative, or how many exceed the others, because those probabilities can be calculated in terms of F and G, whereas we cannot find the distributions of sums or means of the observations. That is the difference between the methods of parametric and nonparametric inference: in parametric inference we go directly to the sufficient statistic, check whether it is complete, and base our inference on that; in the nonparametric case that is not possible, and therefore we work with the order statistics, the signs, and the ranks.

Define d_ij = 1 if y_j < x_i and d_ij = 0 if y_j > x_i, for all i = 1, ..., m and j = 1, ..., n, and then define U = Σ_{i=1}^m Σ_{j=1}^n d_ij. This U is known as the Mann-Whitney-Wilcoxon U statistic; it was introduced in 1947. Now, what are the possibilities here? It could happen that all the y_j's are greater than all the x_i's; in that case the value of U will be 0. If all the y_j's are less than all the x_i's, then all the d_ij's are 1 and U = mn. So the value of U varies from 0 to mn, and it tests the departure from equality of the medians. For example, if more of the y_j's are less than the x_i's, then U takes a higher value, which means the median of the y's is less than the median of the x's; that is equivalent to saying that the median of G is smaller than the median of F, which corresponds to θ < 0. Similarly, if U is smaller, then more of the y_j's are larger than the x_i's, which means the median of the y's may be tending to be higher than that of the x's; if that is so, we are led to θ > 0.
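A direct computation of U, as a minimal sketch; the sample sizes and the shift used here are illustrative choices of mine.

```python
import numpy as np

def mann_whitney_u(x, y):
    # U = number of pairs (i, j) with y_j < x_i.
    x, y = np.asarray(x), np.asarray(y)
    return int((y[None, :] < x[:, None]).sum())

rng = np.random.default_rng(4)
x = rng.normal(loc=0.0, scale=1.0, size=8)
y = rng.normal(loc=1.0, scale=1.0, size=10)   # theta > 0, so U tends to be small
print(mann_whitney_u(x, y), "out of", 8 * 10)
```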
And for θ ≠ 0, either very large or very small values of U give evidence against H0, so all three cases are covered here. So this statistic U tests the departure from θ = 0: if U is large, then the median of G is smaller than the median of F, that is, θ < 0; if U is small, then the median of G is larger than the median of F, that is, θ > 0. So we can consider the following tests: reject H0 at level α if U ≤ c_α, for alternative H1: θ > 0; reject if U ≥ c_{1-α}, for alternative H2: θ < 0; and reject if U ≤ c_{α/2} or U ≥ c_{1-α/2}, for alternative H3: θ ≠ 0. Here c_β is the largest value u such that P_{H0}(U ≤ u) ≤ β, and c_{1-β} is the smallest value u such that P_{H0}(U ≥ u) ≤ β, so that each test has level at most α. In the next lecture we will discuss the null distribution of U and how it is obtained, its mean and variance under the general hypothesis, the related statistic called the Wilcoxon rank sum statistic, the general rank statistic for the two-sample problem, and the asymptotic distributions of these. These are the various things that I will take up in the next lecture.
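To make the critical values concrete, here is a small sketch of my own that enumerates the exact null distribution of U for small m and n, using the fact stated above that every placement of the m x-ranks among 1, ..., m + n is equally likely, and then reads off c_α as defined above.

```python
import numpy as np
from itertools import combinations

def u_null_pmf(m, n):
    # Exact null pmf of U: enumerate the C(m+n, m) equally likely
    # placements of the x-ranks among 1, ..., m+n.
    counts = np.zeros(m * n + 1)
    for x_ranks in combinations(range(1, m + n + 1), m):
        # The k-th smallest x (k = 0, 1, ...) at overall rank r has
        # r - 1 - k of the y's below it; summing over the x's gives U.
        u = sum(r - 1 - k for k, r in enumerate(x_ranks))
        counts[u] += 1
    return counts / counts.sum()

m, n, alpha = 4, 5, 0.05
cdf = np.cumsum(u_null_pmf(m, n))
c_alpha = int(np.searchsorted(cdf, alpha, side="right")) - 1
print(c_alpha, cdf[c_alpha])   # largest u with P(U <= u) <= alpha
```

For m = 4 and n = 5 this gives c_α = 2, with P(U ≤ 2) = 4/126 ≈ 0.032.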