We were discussing the theory of general linear rank statistics in the previous class. We derived the exact distribution, and we also discussed how to find the asymptotic distribution of a general linear rank statistic. Now, using that theory, we will derive the asymptotic distribution of the Mann-Whitney U-statistic and the Wilcoxon rank-sum statistic for the two-sample problem, and show that they are in fact asymptotically normal. This is proved in the following theorem. We consider m and n tending to infinity in such a way that m/N → λ, where N = m + n. That means it is not the case that one of the sample sizes becomes abnormally large relative to the other; both grow in a fixed ratio, and of course λ lies strictly between 0 and 1. Then the standardized Wilcoxon statistic (W − E(W))/√Var(W) and the standardized Mann-Whitney statistic (U − E(U))/√Var(U) both have a limiting normal(0, 1) distribution under the null hypothesis θ = 0. To prove this, let us set up the definitions; let me rewrite them. Define T_ij = 1 if Y_j > X_i and T_ij = 0 if Y_j ≤ X_i. If we consider U* = ΣΣ T_ij, then U* = mn − U, because U was defined as ΣΣ D_ij, where D_ij = 1 when Y_j < X_i; that is the reverse indicator. We are assuming that ties do not occur. Then E(T_ij) = 1/2 under the null hypothesis, that is, when θ = 0. Let us consider W* = ΣΣ (T_ij − 1/2), the centered statistic based on the T_ij's; with this definition, E(W*) = 0 under H0. Now let us consider the conditional expectation of T_ij − 1/2 given X_k = x under the null hypothesis; it is P(Y_j > X_i | X_k = x) − 1/2. So we can write E(T_ij − 1/2 | X_k = x) = 0 if k ≠ i, and = P(Y > x) − 1/2 if k = i. Similarly, E(T_ij − 1/2 | Y_k = y) = 0 if k ≠ j, and = P(y > X) − 1/2 if k = j. Now, under the null hypothesis θ = 0, X and Y have the same distribution, with common cdf F. So if we consider E(ΣΣ (T_ij − 1/2) | X_k = x), you see that only the terms with i = k contribute, all other terms being 0; and how many such terms are there? One for each Y_j, so n of them. Hence this equals n(1 − F(x) − 1/2) = n(1/2 − F(x)), since P(Y > x) = 1 − P(Y ≤ x) = 1 − F(x). Similarly, E(ΣΣ (T_ij − 1/2) | Y_k = y) = m(F(y) − 1/2), because there the event is X < y, whose probability under the common cdf is F(y).
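To make the indicator construction concrete, here is a minimal Python sketch (illustrative, not from the lecture): it builds the matrix of T_ij's, checks the identity U* = mn − U when there are no ties, and verifies E(T_ij) = 1/2 under H0 by simulation. The normal choice for the common distribution F and the sample sizes are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 8, 10
x = rng.normal(size=m)   # X-sample from F (normal is an arbitrary choice)
y = rng.normal(size=n)   # Y-sample, same distribution under H0 (theta = 0)

T = (y[None, :] > x[:, None]).astype(int)   # T_ij = 1{Y_j > X_i}, shape (m, n)
D = (y[None, :] < x[:, None]).astype(int)   # D_ij = 1{Y_j < X_i}

U_star = T.sum()
U = D.sum()
assert U_star == m * n - U   # holds whenever there are no ties

# Monte Carlo check that E(T_ij) = 1/2 under H0:
reps = 2000
avg = np.mean([(rng.normal(size=n)[None, :] > rng.normal(size=m)[:, None]).mean()
               for _ in range(reps)])
print(avg)   # should be close to 0.5
```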
Now we write the projection of W* as V_p = n Σ_{i=1}^m (1/2 − F(X_i)) + m Σ_{j=1}^n (F(Y_j) − 1/2). Let us consider (√N/(mn)) V_p = (√N/m) Σ_{i=1}^m V_i + (√N/n) Σ_{j=1}^n V_j*. What are these V_i and V_j*? They are V_i = 1/2 − F(X_i) and V_j* = F(Y_j) − 1/2, and they are uniformly distributed on the interval (−1/2, 1/2), so they have mean 0 and variance 1/12. This is because of the probability integral transform: F(X_i) and F(Y_j) are uniformly distributed on (0, 1), so 1/2 − F(X_i) and F(Y_j) − 1/2 are uniformly distributed on (−1/2, 1/2). Both terms on the right-hand side are now sums of i.i.d. variables, so we can apply the central limit theorem to each; that is why I adjusted the terms in this way. Applying the central limit theorem to the first term, (√N/m) Σ_{i=1}^m V_i converges in distribution to (1/√λ) Z_1, where Z_1 follows normal(0, 1/12), since N/m → 1/λ. In a similar way the second term converges in distribution to (1/√(1−λ)) Z_2, and since the two sums are independent, the characteristic function of their sum converges to the characteristic function of (1/√λ) Z_1 + (1/√(1−λ)) Z_2, where Z_1 and Z_2 are independent normal(0, 1/12). So, applying the linearity property of the normal distribution, we get that (√N/(mn)) V_p converges in distribution to Z, which follows normal(0, 1/(12λ(1−λ))). We can also talk about the asymptotic variance: Var((√N/(mn)) V_p) → 1/(12λ(1−λ)), and Var((√N/(mn)) W*) = N(N+1)/(12mn), which converges to the same value, because W* is just a linear combination of these quantities. Now we use the projection theorem which I gave in the last class; let me just repeat it here: among statistics of the form Σ p_i(X_i), the mean squared error E(V − W)² is minimized by choosing p_i(x) = E(V | X_i = x); the minimizer V_p is the projection, and E(V − V_p)² = Var(V) − Var(V_p). We use this result here. By the projection theorem and the variance relation which I just showed you, we get that E[(√N/(mn)) W* − (√N/(mn)) V_p]² → 0. So if we use the theorem which I gave on limits — if W_n has a limiting distribution and E(U_n − W_n)² → 0, then U_n has the same limiting distribution — then (√N/(mn)) W* has the same limiting distribution as (√N/(mn)) V_p.
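The theorem's conclusion can be checked numerically. The following minimal sketch assumes standard normal samples (so H0 holds) and uses the standard null moments E(U) = mn/2 and Var(U) = mn(N+1)/12; the simulated standardized values should have mean near 0 and variance near 1.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 60, 40                              # m/N -> lambda = 0.6
N = m + n
mu = m * n / 2                             # E(U) under H0
sigma = np.sqrt(m * n * (N + 1) / 12.0)    # sqrt(Var(U)) under H0

reps = 5000
z = np.empty(reps)
for r in range(reps):
    x = rng.normal(size=m)
    y = rng.normal(size=n)
    U = (y[None, :] < x[:, None]).sum()    # U = #{(i, j) : Y_j < X_i}
    z[r] = (U - mu) / sigma

print(z.mean(), z.var())    # should be close to 0 and 1 respectively
```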
So now, if we write (U* − E(U*))/√Var(U*) and apply Slutsky's theorem along with the limiting normality of (√N/(mn)) W*, we get the result. Thus we have obtained the asymptotic distributions of the Wilcoxon rank-sum statistic and the Mann-Whitney statistic, and both are shown to be asymptotically normal. Now, a general linear rank statistic is written in the form S = Σ c_i a(R_i). We may consider some permutation of the scores, or of the regression constants; what then happens to the distribution? Our next result concerns the distribution of the permuted form. If (c'_1, c'_2, …, c'_N) is a fixed permutation of (c_1, c_2, …, c_N) and (a'(1), …, a'(N)) is a fixed permutation of (a(1), …, a(N)), then S = Σ_{i=1}^N c_i a(R_i) has the same distribution as S' = Σ_{i=1}^N c'_i a'(R_i), where the vector of ranks R has the uniform distribution over the set ℛ of all permutations of the numbers 1 to N. This is an interesting result, and it allows us to use the ranks freely: the way the data have been labelled does not matter when we consider the distribution of a linear rank statistic based on them. Let me give the proof. Since (c'_1, c'_2, …, c'_N) is a permutation, we can write c'_i = c_{α(i)} for some α = (α(1), α(2), …, α(N)) ∈ ℛ; basically, c'_i is the ith value of c under the permutation α. Similarly, we can write a'(i) = a(β(i)) for some permutation β of the numbers 1 to N. Let us define a function φ from ℛ to ℛ by φ(r) = β ∘ r ∘ α⁻¹, where α and β are as above. This is a composition of permutations: you can look at it like this, that r is an element of ℛ, that is, a permutation of the numbers 1 to N; on it we apply β from the left and α⁻¹ from the right. Here α and β are fixed, as we have mentioned; the result is being proved for fixed permutations, so we have already fixed α and β. Now take any r ∈ ℛ, arbitrary but fixed. Then S' evaluated at r is Σ_{i=1}^N c'_i a'(r_i) = Σ_{i=1}^N c_{α(i)} a(β(r_i)). Why is this so? Because a'(i) = a(β(i)), so writing r_i in place of i gives a'(r_i) = a(β(r_i)). Now I reindex the sum by replacing i with α⁻¹(i), that is, by taking the inverse transformation for the index; this gives Σ_{i=1}^N c_i a(β(r_{α⁻¹(i)})) = Σ_{i=1}^N c_i a(φ(r)_i). Hence S' = Σ_{i=1}^N c_i a(φ(r)_i), which is S with r replaced by φ(r). Let me repeat the argument: I am expressing S', which is Σ c'_i a'(r_i), in a form where c'_i has become c_i again and a'(r_i) has become a(φ(r)_i). Hence S will have the same distribution as S', which we denote by saying S and S' are equal in distribution. Note that φ is a one-to-one function, because r is transformed using the fixed permutations β and α⁻¹: for a given r, φ(r) is uniquely defined, and r is recoverable from φ(r).
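Here is a minimal sketch verifying the invariance result by brute-force enumeration for a small N; the particular scores and permutations are hypothetical choices for illustration.

```python
from collections import Counter
from itertools import permutations

N = 4
c = [1, 1, 0, 0]            # two-sample regression constants
a = [1, 3, 4, 10]           # arbitrary scores, chosen for illustration
c_perm = [0, 1, 1, 0]       # a fixed permutation of c
a_perm = [4, 10, 1, 3]      # a fixed permutation of a

def dist(cs, sc):
    # Distribution of sum cs[i] * sc[R_i - 1] over uniform rank vectors R.
    return Counter(sum(cs[i] * sc[r[i] - 1] for i in range(N))
                   for r in permutations(range(1, N + 1)))

assert dist(c, a) == dist(c_perm, a_perm)   # S and S' have the same law
print(dist(c, a))
```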
If that is so, then the original probabilities are preserved: φ maps ℛ one-to-one onto itself, so if R is uniform over ℛ then φ(R) is also uniform over ℛ, and whatever probability we attach to a particular value of S is attached to the same value of S'. As a corollary, suppose we count the ranks from the reverse side, that is, from the right instead of from the left; then the distributions must be the same. So we have the following result: S = Σ c_i a(R_i) has the same distribution as S' = Σ_{i=1}^N c_i a(N − R_i + 1), since (a(N), …, a(1)) is a fixed permutation of the scores. As a consequence we can prove another important theorem. Let R have the uniform distribution over ℛ, that is, we consider each permutation equally likely. If either a(i) + a(N − i + 1) is equal to a constant, say k, or c_i + c_{N−i+1} is a constant, then S = Σ c_i a(R_i) has a symmetric distribution about Nāc̄. We will take both cases: first when a(i) + a(N − i + 1) is a constant, and second when c_i + c_{N−i+1} is a constant. So suppose a(i) + a(N − i + 1) = k. Summing over i = 1 to N gives Σ [a(i) + a(N − i + 1)] = Nk; the left-hand side is 2Nā, so k = 2ā, and hence a(i) + a(N − i + 1) = 2ā. Let me call this relation (1). Now S = Σ c_i a(R_i) has the same distribution as S' = Σ c_i a(N − R_i + 1) by the corollary. As a consequence, consider P(S = Nāc̄ + s) = P(S' = Nāc̄ + s) = P(Σ c_i a(N − R_i + 1) = Nāc̄ + s). Using relation (1), we can replace a(N − R_i + 1) by 2ā − a(R_i), so this equals P(Σ c_i (2ā − a(R_i)) = Nāc̄ + s) = P(2ā Σ c_i − Σ c_i a(R_i) = Nāc̄ + s) = P(2Nāc̄ − S = Nāc̄ + s), since Σ c_i = Nc̄. Bringing 2Nāc̄ to the right-hand side and S to the left, this equals P(S = Nāc̄ − s). This proves that the distribution of S is symmetric about Nāc̄. So we have proved the theorem for the case when a(i) + a(N − i + 1) is a constant. Now let us take the second case: c_i + c_{N−i+1} = k. The proof that k = 2c̄ is the same, because summing over all values gives Nk on one side and 2Nc̄ on the other; therefore c_i + c_{N−i+1} = 2c̄. Now let us write S = Σ_{i=1}^N c_i a(R_i) = Σ_{i=1}^N c_{D_i} a(i), where D_i is the anti-rank: if the ith observation has rank R_i, then D_i is the index of the observation whose rank is i. Basically, I have replaced summation over observations by summation over ranks, and the coefficient corresponding to rank i is c_{D_i}.
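As a quick check of the first case, the following sketch uses the Wilcoxon scores a(i) = i, for which a(i) + a(N − i + 1) = N + 1 is constant, and verifies by enumeration that S is symmetric about Nāc̄; the small N and the two-sample c's are illustrative choices.

```python
from collections import Counter
from itertools import permutations

N, m = 5, 2
c = [1] * m + [0] * (N - m)
a = list(range(1, N + 1))               # a(i) = i, so a(i) + a(N-i+1) = N+1
a_bar = sum(a) / N
c_bar = sum(c) / N
center = N * a_bar * c_bar              # = m(N+1)/2 here

dist = Counter(sum(c[i] * a[r[i] - 1] for i in range(N))
               for r in permutations(range(1, N + 1)))

for s, cnt in dist.items():
    assert cnt == dist[2 * center - s]  # P(S = center + t) = P(S = center - t)
print(sorted(dist.items()))
```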
This will then have the same distribution as Σ_{i=1}^N c_{N − D_i + 1} a(i), by the corollary applied to the c's. So if we consider P(S = Nāc̄ + s), it is equal to P(Σ c_{N − D_i + 1} a(i) = Nāc̄ + s) = P(Σ (2c̄ − c_{D_i}) a(i) = Nāc̄ + s) = P(2Nāc̄ − Σ c_{D_i} a(i) = Nāc̄ + s); again taking the constant term to the other side, this becomes P(Σ c_{D_i} a(i) = Nāc̄ − s), which is the same as P(Σ c_i a(R_i) = Nāc̄ − s). So once again we have proved that the distribution of S is symmetric about Nāc̄. We can apply this result to various statistics, which can therefore be used in two-sample testing problems. Let me give some examples. One is the van der Waerden statistic. Here the scores are based on the quantile function of the standard normal distribution: a(i) = Φ⁻¹(i/(N+1)). If we consider the statistic Σ_{i=1}^N c_i Φ⁻¹(R_i/(N+1)), then, since c_i = 0 for i = m+1 to N, this equals X = Σ_{i=1}^m Φ⁻¹(R_i/(N+1)). Now E(X) = mā under the null hypothesis, and we can actually determine ā here. To do so we use the symmetry of the standard normal cdf to show that a(i) + a(N − i + 1) is constant. If I write Φ⁻¹(i/(N+1)) = x, this means i/(N+1) = Φ(x), which means 1 − i/(N+1) = (N − i + 1)/(N+1) = 1 − Φ(x) = Φ(−x). So −x = Φ⁻¹((N − i + 1)/(N+1)). What do we get then? Φ⁻¹(i/(N+1)) + Φ⁻¹((N − i + 1)/(N+1)) = 0, so the constant is actually 0. This means ā = 0, and therefore E(X) = mā = 0 under the null hypothesis. We can also write the expression for the variance: Var(X) = (mn/(N(N−1))) Σ_{i=1}^N [Φ⁻¹(i/(N+1))]². This kind of statistic is quite useful for two-sample testing problems. Let me also introduce the two-sample scale problem. Let X_1, X_2, …, X_m be a random sample from the cdf F(x), and let Y_1, Y_2, …, Y_n be another, independent random sample from the cdf G(y). Our null hypothesis is that the two distributions are identical, G(x) = F(x) for all x, and the alternative is G(x) = F(θx) for all x, where θ ≠ 1. This is the scale model, because I have introduced a scale parameter; when θ = 1 the two distributions are the same, and that is the null hypothesis. We may also consider it in terms of variability. Taking the means to be zero, V_Y = ∫ x² dG(x) = ∫ x² dF(θx) = (1/θ²) ∫ y² dF(y) = V_X/θ², substituting y = θx. So θ > 1 implies V_X > V_Y, and θ < 1 implies V_X < V_Y.
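A minimal sketch of the van der Waerden statistic using SciPy's normal quantile function; it checks the antisymmetry a(i) + a(N − i + 1) = 0 derived above. The normal data and sample sizes are illustrative choices of null distribution.

```python
import numpy as np
from scipy.stats import norm, rankdata

rng = np.random.default_rng(2)
m, n = 10, 12
N = m + n
x = rng.normal(size=m)
y = rng.normal(size=n)

i = np.arange(1, N + 1)
scores = norm.ppf(i / (N + 1))                 # a(i) = Phi^{-1}(i/(N+1))
assert np.allclose(scores + scores[::-1], 0)   # a(i) + a(N-i+1) = 0, so a_bar = 0

ranks = rankdata(np.concatenate([x, y]))       # ranks in the pooled sample
X_stat = scores[ranks[:m].astype(int) - 1].sum()
print(X_stat)                                  # E(X) = 0 under H0
```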
In some sense, then, we can say that this testing problem is equivalent to testing which distribution has more variability, the distribution of X or the distribution of Y. So we can consider the null hypothesis θ = 1, and the alternatives will be: θ < 1, that is, whether the variability of the X's is less than that of the Y's; θ > 1, that is, whether the variability of the X's is more than that of the Y's; or simply θ ≠ 1, that is, whether the variability of X differs from that of Y. All three alternatives can be considered here. Some of the two-sample statistics introduced for the scale problem are as follows; let me give a few of them. Here I take c_i = 1 for i = 1 to m and c_i = 0 for i = m+1 to N: when we pool the two samples, observations from the first sample are assigned the value 1 and those from the second the value 0. One is the Mood test statistic. In the Mood test we take the scores a(i) = (i − (N+1)/2)². You can easily understand what this represents: (N+1)/2 is the mean rank, so the score measures how much each rank differs from the mean rank; it is a measure of the variability of the ranks. So we consider the statistic based on that, the Mood statistic M = Σ_{i=1}^m (R_i − (N+1)/2)², where the R_i are the ranks of the X_i's in the pooled sample. If the X-ranks are close to the mean rank, the sample is well mixed up, that is, the X's and Y's are interspersed, and this indicates that θ is close to 1; whereas more variability in the X-ranks makes M large. We have E(M) = mā = m(N² − 1)/12, since Nā = Σ a(i) = N(N² − 1)/12, and Var(M) = mn(N+1)(N² − 4)/180. I am not giving the derivations here, but they can be done in an easy way. M small corresponds to less variability of the X's, that is θ < 1, and M large implies more variability, that is θ > 1. So M can be used for this test of hypothesis. We can also consider another statistic, associated with several authors: Freund, Ansari, Bradley, David and Barton. If you look at the Mood statistic, it takes the squared deviations; if instead of the squared deviations we take the absolute deviations, we get this statistic. These are all very natural choices for the score function. The Ansari-Bradley statistic is given by A = Σ_{i=1}^m |R_i − (N+1)/2|. Actually, there are several variations of this, depending on whether you take the deviations directly like this or in a modified form; this one is the Ansari-Bradley choice. Let us take the case when N is odd, say N = 2k − 1. Then Nā = Σ_{i=1}^N |i − (N+1)/2| = Σ_{i=1}^N |i − k| = Σ_{i=1}^{k−1} (k − i) + Σ_{i=k+1}^{2k−1} (i − k), where the term corresponding to i = k is 0.
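A minimal sketch of the Mood statistic, standardized with the moments quoted above; the scale-2 choice for the X-sample is an illustrative assumption under which M should tend to be large (θ > 1).

```python
import numpy as np
from scipy.stats import rankdata

rng = np.random.default_rng(3)
m, n = 15, 20
N = m + n
x = rng.normal(size=m, scale=2.0)   # more spread in X: expect theta > 1
y = rng.normal(size=n, scale=1.0)

ranks = rankdata(np.concatenate([x, y]))[:m]   # ranks of the X's
M = ((ranks - (N + 1) / 2) ** 2).sum()

EM = m * (N ** 2 - 1) / 12                     # E(M) under H0
VM = m * n * (N + 1) * (N ** 2 - 4) / 180      # Var(M) under H0
z = (M - EM) / np.sqrt(VM)                     # approx N(0, 1) under H0
print(M, EM, z)                                # z tends to be large here
```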
If we look at this, the first sum is k(k−1)/2 and the second gives the same thing, so Nā = k(k−1) = ((N+1)/2)·((N−1)/2) = (N² − 1)/4. If Nā is this, then we are able to get E(A) = m(N² − 1)/(4N); of course, this I have done for N odd. If N is even, say N = 2k, consider Nā = Σ_{i=1}^{2k} |i − (2k+1)/2|, where (2k+1)/2 is (N+1)/2. Again we split it into two parts: Σ_{i=1}^{k} ((2k+1)/2 − i) + Σ_{i=k+1}^{2k} (i − (2k+1)/2). Once again we can easily simplify these terms: each part equals k²/2, so Nā = k² = N²/4, and ā = N/4. So E(A) in this case becomes mN/4. Var(A) can be calculated similarly. The interpretation is the same as for the Mood statistic: if A is large, it means there is more variability in the X-data, that is θ > 1. So basically we say that large A indicates θ > 1 and small A indicates θ < 1. Now, some variations of this form have been given. See, here we are taking the absolute deviation and summing it directly; we may also consider a reverse form with scores a(i) = (N+1)/2 − |i − (N+1)/2|; you can see the logic behind it, it is just a little variation of the previous one. This form is actually called the Freund-Ansari form: F = Σ_{i=1}^m [(N+1)/2 − |R_i − (N+1)/2|] = m(N+1)/2 − A, that is, m(N+1)/2 minus the Ansari-Bradley statistic. So it is just the reverse one: small F indicates θ > 1 and large F indicates θ < 1, and E(F) = m(N+1)/2 − E(A), where E(A) I have already calculated: when N is odd it is m(N² − 1)/(4N), and when N is even it is mN/4. There is yet another variation of this, called the David-Barton variation. Here the scores are taken Ansari-Bradley-wise but shifted a little; basically it is an adjustment between the even and odd cases: a(i) = (N+2)/2 − [(N+1)/2 − |i − (N+1)/2|]. So B = Σ_{i=1}^m a(R_i) = m(N+2)/2 − F, where F is the Freund-Ansari statistic, because the bracketed term, taken with its minus sign, is exactly the Freund-Ansari choice. Therefore, once again, we can also write this as B = m(N+2)/2 − m(N+1)/2 + A in terms of the Ansari-Bradley statistic.
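The two score sums just computed can be verified directly; the loop below is a hypothetical check, not part of the lecture.

```python
import numpy as np

for N in (9, 10):
    i = np.arange(1, N + 1)
    a = np.abs(i - (N + 1) / 2)                     # Ansari-Bradley scores
    target = (N**2 - 1) / 4 if N % 2 else N**2 / 4  # odd N vs even N
    assert a.sum() == target                        # sum a(i) = N * a_bar
    print(N, a.sum())

# The Freund-Ansari form uses (N+1)/2 - |i - (N+1)/2| instead, so
# F = m(N+1)/2 - A and the two statistics order samples oppositely.
```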
Here a large value of B indicates θ > 1 and, of course, a small B will indicate θ < 1. Then there is the Siegel-Tukey choice. In the Siegel-Tukey choice we take a(i) = 2i if i is even and a(i) = 2i − 1 if i is odd, for 1 ≤ i ≤ N/2; and a(i) = 2(N − i + 1) if i is even and a(i) = 2(N − i + 1) − 1 if i is odd, for N/2 < i ≤ N. The statistic is S = Σ_{i=1}^m a(R_i); small values of S indicate θ > 1. We have one final comment here: the Mood, Freund-Ansari, Ansari-Bradley and David-Barton statistics are more sensitive to the one-sided alternatives θ > 1 or θ < 1, but the Siegel-Tukey statistic is more sensitive for θ ≠ 1. Let me briefly mention one more, called the Klotz normal-scores statistic. In this one we define a(i) = [Φ⁻¹(i/(N+1))]². You can compare it with the one I gave earlier: in the van der Waerden statistic we had Φ⁻¹(i/(N+1)), and here it is the square of that. So the Klotz statistic is given by K = Σ_{i=1}^m [Φ⁻¹(R_i/(N+1))]². I am not getting too much into the detailed working of this; of course it is slightly more complicated than for the van der Waerden statistic, because if I take these scores, then a(i) + a(N − i + 1) = 2[Φ⁻¹(i/(N+1))]², which is not a constant here. So this will require some further working out. In the next class I will discuss the Sukhatme two-sample test; I will derive its null distribution, and we will also introduce the concept of the consistency of a statistical test. You may have seen that for parametric tests we discuss the power of the test and consider the type I error and the type II error. But for these two-sample tests, since we do not have the form of the distribution, we are not using most powerful tests; that is, the usual Neyman-Pearson theory is not being applied here. Therefore the test functions are based on these linear rank statistics, and the exact distributions are quite complicated, so we consider the asymptotic properties of these tests. In the next lecture I will be discussing the asymptotic properties of the tests: firstly the Sukhatme test, and then other asymptotic properties.
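To close, here is a minimal sketch generating the Siegel-Tukey scores from the piecewise formula above and the Klotz scores; the function name siegel_tukey_scores is a hypothetical helper. It also shows numerically that a(i) + a(N − i + 1) is not constant for the Klotz scores, so the symmetry theorem does not apply to them.

```python
import numpy as np
from scipy.stats import norm

def siegel_tukey_scores(N):
    # Piecewise formula from the lecture: low scores at both extremes.
    a = np.empty(N, dtype=int)
    for i in range(1, N + 1):
        if i <= N / 2:
            a[i - 1] = 2 * i if i % 2 == 0 else 2 * i - 1
        else:
            a[i - 1] = 2 * (N - i + 1) if i % 2 == 0 else 2 * (N - i + 1) - 1
    return a

print(siegel_tukey_scores(8))    # [1, 4, 5, 8, 7, 6, 3, 2]

N = 8
i = np.arange(1, N + 1)
klotz = norm.ppf(i / (N + 1)) ** 2          # a(i) = [Phi^{-1}(i/(N+1))]^2
# a(i) + a(N-i+1) = 2 * [Phi^{-1}(i/(N+1))]^2, which varies with i:
print(np.round(klotz + klotz[::-1], 4))
```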