Friends, in the last class I introduced various tests for the single-sample location problem and also a two-sample location problem. Let me recapitulate. We have two distributions F and G, and we want to check whether one distribution is a location shift of the other. So, if we consider theta > 0, theta < 0 or theta ≠ 0, it means that the median of F is smaller than the median of G, larger than the median of G, or simply not equal to the median of G. For this we proposed a two-sample test based on observations X_1, ..., X_m from F and Y_1, ..., Y_n from G. We defined the Mann-Whitney Wilcoxon U statistic U = ΣΣ D_ij, where D_ij = 1 if Y_j < X_i and D_ij = 0 if Y_j > X_i. If U is large, then many of the X_i exceed the Y_j, and so the median of F will be larger than the median of G; if U is small, the reverse holds. On that basis we proposed the test. Now let us discuss the null distribution of U. Let r_{m,n}(u) denote the number of arrangements of the m X's and n Y's for which U = u. Then P0(U = u) = r_{m,n}(u) / C(m+n, m), since under the null hypothesis each of the C(m+n, m) arrangements is equally likely; as we know, the possible values of U are 0 to mn. The first fact we observe is that the distribution of U when theta = 0 is symmetric about its mean mn/2. To prove this we must show that P0(U = mn/2 + u) = P0(U = mn/2 - u) for all u. Consider any arrangement, call it A, of the m X's and n Y's which gives U = u. Now consider the conjugate arrangement A', in which the positions of the X's and Y's are interchanged, that is, the roles of the X's and Y's are reversed. The value of U for A' is mn - u, because in the definition of D_ij every pair that counted before no longer counts and vice versa. Since conjugation is a one-to-one correspondence between arrangements, r_{m,n}(u) = r_{m,n}(mn - u). Therefore P0(U = mn/2 - u) = r_{m,n}(mn/2 - u) / C(m+n, m) = r_{m,n}(mn - (mn/2 - u)) / C(m+n, m) = r_{m,n}(mn/2 + u) / C(m+n, m) = P0(U = mn/2 + u). So we have proved that the distribution of U is symmetric about mn/2.
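As an aside, the symmetry fact is easy to check computationally. Below is a minimal Python sketch, not part of the lecture; the function names are my own, and it assumes m and n small enough that all C(m+n, m) arrangements can be enumerated.

```python
# Minimal sketch: enumerate all arrangements of m X's and n Y's under H0
# and check that the null distribution of U is symmetric about mn/2.
from itertools import combinations

def u_statistic(x, y):
    # U = number of pairs (i, j) with Y_j < X_i
    return sum(1 for xi in x for yj in y if yj < xi)

def null_counts(m, n):
    # r_{m,n}(u): number of arrangements (choices of ranks for the X's)
    # among the m + n combined ranks that give U = u
    counts = {}
    ranks = range(1, m + n + 1)
    for x_ranks in combinations(ranks, m):
        y_ranks = [r for r in ranks if r not in x_ranks]
        u = u_statistic(x_ranks, y_ranks)
        counts[u] = counts.get(u, 0) + 1
    return counts

m, n = 3, 4
counts = null_counts(m, n)
# symmetry about mn/2: r_{m,n}(u) = r_{m,n}(mn - u)
assert all(counts[u] == counts[m * n - u] for u in counts)
```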
To derive the distribution of U in general: for given m and n, P0(U = u) = r_{m,n}(u) / C(m+n, m), but if we take any sizable values of m and n this is quite complicated, because at each stage the number of arrangements becomes very large. So we develop a recursion formula for evaluating the null probabilities of U. Writing P0(U_{m,n} = u) for the probability based on m observations from F and n observations from G, the claim is that P0(U_{m,n} = u) = [m/(m+n)] P0(U_{m-1,n} = u - n) + [n/(m+n)] P0(U_{m,n-1} = u). Let us look at the proof. P0(U_{m,n} = u) = r_{m,n}(u) / C(m+n, m) = [r_{m-1,n}(u - n) + r_{m,n-1}(u)] / C(m+n, m), because the largest of the m + n observations is either an X or a Y: if it is an X, it exceeds all n Y's and contributes n to U, so in the previous step the remaining m - 1 X's and n Y's must give the value u - n; if it is a Y, it contributes nothing, and the remaining m X's and n - 1 Y's must give u. Now adjust the two terms: write r_{m-1,n}(u - n) / C(m+n, m) as [r_{m-1,n}(u - n) / C(m+n-1, m-1)] times [C(m+n-1, m-1) / C(m+n, m)], and r_{m,n-1}(u) / C(m+n, m) as [r_{m,n-1}(u) / C(m+n-1, m)] times [C(m+n-1, m) / C(m+n, m)]. The two ratios of binomial coefficients simplify to m/(m+n) and n/(m+n) respectively, and we get the stated recursion. For starting the evaluation we look at U_{1,1}, which can take the two values 0 and 1: P0(U_{1,1} = 0) = P0(U_{1,1} = 1) = 1/2. Probabilities for higher m and n are then built up recursively.
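The recursion lends itself directly to computation. Here is a short illustrative sketch in Python, again not from the lecture, using exact rational arithmetic; the base cases for m = 0 or n = 0, where U must be 0, are my own addition and are consistent with the definition of U.

```python
# Illustrative sketch of the recursion
#   P0(U_{m,n} = u) = m/(m+n) P0(U_{m-1,n} = u-n) + n/(m+n) P0(U_{m,n-1} = u),
# computed with exact fractions.
from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def p0(u, m, n):
    if u < 0 or u > m * n:
        return Fraction(0)
    if m == 0 or n == 0:
        # no (X, Y) pairs remain, so U is necessarily 0 (assumed base case)
        return Fraction(1) if u == 0 else Fraction(0)
    return (Fraction(m, m + n) * p0(u - n, m - 1, n)
            + Fraction(n, m + n) * p0(u, m, n - 1))

# starting point from the lecture: U_{1,1} is 0 or 1, each with probability 1/2
assert p0(0, 1, 1) == p0(1, 1, 1) == Fraction(1, 2)
# probabilities over u = 0, ..., mn sum to 1 for any m, n
assert sum(p0(u, 4, 5) for u in range(21)) == 1
```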
Now let us look at the mean and variance of the U statistic under the general hypothesis, that is, under a true parameter value theta. These moments depend on probabilities of events such as Y_j < X_i, or joint events such as (Y_j < X_i, Y_k < X_i) and (Y_j < X_i, Y_j < X_h), so let us give them some notation: pi = P_theta(Y_j < X_i); pi_1 = P_theta(Y_j < X_i, Y_k < X_i) for j ≠ k, where the X is the same; and pi_2 = P_theta(Y_j < X_i, Y_j < X_h) for i ≠ h, where the Y is the same. Consider E(U) = ΣΣ E(D_ij), i = 1 to m, j = 1 to n. Now D_ij = 1 exactly when Y_j < X_i, which has probability pi, so E(U) = mn pi. Similarly, Var(U) = ΣΣ Var(D_ij) + Σ_{j≠k} Cov(D_ij, D_ik) + Σ_{i≠h} Cov(D_ij, D_hj) + Σ_{i≠h, j≠k} Cov(D_ij, D_hk). First, Var(D_ij) = pi(1 - pi)² + (1 - pi)pi² = pi(1 - pi). Let us look at the various covariance terms. For j ≠ k, Cov(D_ij, D_ik) = E(D_ij D_ik) - E(D_ij)E(D_ik), and the product of the individual expectations is pi². The product D_ij D_ik equals 1 exactly when Y_j < X_i and Y_k < X_i, whose probability we have denoted pi_1; so Cov(D_ij, D_ik) = pi_1 - pi². Similarly, for i ≠ h, Cov(D_ij, D_hj) = E(D_ij D_hj) - pi², and D_ij D_hj equals 1 only when Y_j < X_i and Y_j < X_h, whose probability we have denoted pi_2; so Cov(D_ij, D_hj) = pi_2 - pi². The last group of covariances is 0: Cov(D_ij, D_hk) with i ≠ h and j ≠ k involves the pair (X_i, Y_j) and the pair (X_h, Y_k), and since X_1, ..., X_m, Y_1, ..., Y_n are independent random variables, D_ij and D_hk are independent, so their covariance vanishes. We are left with the remaining terms; let us count how many there are. There are mn variance terms, mn(n - 1) covariances of the first kind, and mn(m - 1) of the second kind, so Var(U) = mn pi(1 - pi) + mn(n - 1)(pi_1 - pi²) + mn(m - 1)(pi_2 - pi²). Now let us see what these values are under H0. Under H0, pi = P(Y_j < X_i) = ∫∫_{y<x} dG(y) dF(x) = ∫ G(x) dF(x), and since G = F under H0 this is ∫ F(x) dF(x) = F²(x)/2 evaluated from -∞ to ∞, which is 1/2. Similarly we can evaluate pi_1 and pi_2 under H0. pi_1 = P(Y_j < X_i, Y_k < X_i) for j ≠ k equals ∫ P(Y_j < x, Y_k < x) dF(x); when x is fixed, Y_j and Y_k are independent, so the integrand factorizes and this becomes ∫ G²(x) dF(x). Under H0, G = F, so it is ∫ F²(x) dF(x) = F³/3 evaluated from -∞ to ∞, which is 1/3. Similarly, pi_2 = P(Y_j < X_i, Y_j < X_h) for i ≠ h equals ∫ P(X_i > y, X_h > y) dG(y); when y is fixed, X_i and X_h are independent, so this is ∫ (1 - F(y))² dG(y). With G = F under the null hypothesis this becomes ∫ (1 - F(y))² dF(y) = -(1 - F(y))³/3 evaluated from -∞ to ∞; the bracket is 0 at +∞ and -1/3 at -∞, so pi_2 = 1/3. So under the null hypothesis, when F = G, the value of pi is 1/2, the value of pi_1 is 1/3, and the value of pi_2 is also 1/3.
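These three constants can also be checked by simulation. The following Monte Carlo sketch is purely illustrative; it takes F to be Uniform(0, 1), but under H0 the values of pi, pi_1 and pi_2 do not depend on which continuous F is used.

```python
# Monte Carlo check (illustrative only) that under H0
#   pi   = P(Y1 < X1)           = 1/2,
#   pi_1 = P(Y1 < X1, Y2 < X1)  = 1/3,
#   pi_2 = P(Y1 < X1, Y1 < X2)  = 1/3.
import random

random.seed(0)
trials = 200_000
hits = [0, 0, 0]
for _ in range(trials):
    # all four variables i.i.d. from the same continuous F (Uniform here)
    x1, x2, y1, y2 = (random.random() for _ in range(4))
    hits[0] += y1 < x1
    hits[1] += (y1 < x1) and (y2 < x1)
    hits[2] += (y1 < x1) and (y1 < x2)

print([h / trials for h in hits])  # roughly [0.5, 0.333, 0.333]
```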
If we substitute these values into the variance formula: with pi = 1/2 the first term becomes mn/4, and both pi_1 - pi² and pi_2 - pi² become 1/3 - 1/4 = 1/12. So under H0, E(U) = mn/2 and Var(U) = mn/4 + mn(n - 1)/12 + mn(m - 1)/12, which simplifies to mn(m + n + 1)/12. This null distribution of U can be utilized for various purposes. The general use of the two-sample Mann-Whitney Wilcoxon U statistic is to test location, that is, whether the median of one distribution is larger than, smaller than, or simply not equal to the median of the other; since we have derived the null distribution, the statistic can be used for all of these. Now let us consider a variation of this, namely the Wilcoxon rank-sum statistic for two samples. First we combine all the observations: arrange X_1, ..., X_m, Y_1, ..., Y_n as one sample, call it Z_1, ..., Z_N, where N = m + n; that is, Z_i = X_i for i = 1 to m and Z_i = Y_{i-m} for i = m + 1 to N. If the null hypothesis is true, that is, if the two distributions are the same, then this is simply one random sample of size N from the common population F; otherwise there is some discrepancy, since we are mixing two different kinds of observations. Let W be the sum of the ranks of the X_i's in the combined sample, W = Σ R_i, i = 1 to m. The rank of X_i in the combined sample is (the number of Y_j's less than X_i) plus (the number of X_j's less than or equal to X_i). Summing the first part over i = 1 to m gives exactly ΣΣ D_ij = U, and summing the second part gives m(m + 1)/2, because counting for each i how many X_j's are at or below X_i and adding up is just the sum of the ranks of the X's among themselves, 1 + 2 + ... + m. So the Wilcoxon rank-sum statistic is W = U + m(m + 1)/2. It is simply a shift of U, and therefore it can also be used for testing the same hypotheses. In general, E(W) = mn pi + m(m + 1)/2, and Var(W) is the same as Var(U), because W is a location shift of U. The null expectation is E0(W) = mn/2 + m(m + 1)/2 = m(m + n + 1)/2. So the use of the Wilcoxon W is the same as the use of the Mann-Whitney U; the two can be used interchangeably, and in certain problems it is easier to calculate W than U.
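The shift relation W = U + m(m + 1)/2 can be verified numerically on any data set with distinct values. A small illustrative sketch, with names of my own choosing:

```python
# Illustrative check of W = U + m(m+1)/2 and the null moments of W.
import random

random.seed(1)
m, n = 5, 7
x = [random.random() for _ in range(m)]
y = [random.random() for _ in range(n)]

u = sum(1 for xi in x for yj in y if yj < xi)   # Mann-Whitney U
z = sorted(x + y)
w = sum(z.index(xi) + 1 for xi in x)            # rank sum of the X's
assert w == u + m * (m + 1) // 2                # the shift relation

print("E0(W) =", m * (m + n + 1) / 2)           # null mean of W
print("Var0  =", m * n * (m + n + 1) / 12)      # common null variance of U and W
```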
Now I consider the general simple linear rank statistic for two-sample problems. Let Z_1, ..., Z_N be N random variables and c_1, ..., c_N be N constants; we call these the regression constants. Let a_1, ..., a_N be another set of constants, which we call the scores; these have to be chosen suitably. Let R_i be the rank of Z_i, i = 1 to N. Then S = Σ c_i a(R_i), i = 1 to N, is called a simple linear rank statistic. In the Wilcoxon case we have chosen Z_i = X_i for i = 1 to m, Z_i = Y_{i-m} for i = m + 1 to m + n, with c_i = 1 for i = 1 to m, c_i = 0 for i = m + 1 to m + n, and a_i = i for i = 1 to N. Let us consider what happens under H0. Under H0, Z_1, ..., Z_N are independent and identically distributed random variables, because they all come from the same distribution when F = G. Consider R = (R_1, ..., R_N), the vector of ranks; it is some permutation of the numbers 1 to N. Let script R denote the set of all permutations of the numbers 1 to N. The first result is that under the null hypothesis each permutation is equally likely. Let us look at an elementary proof. Fix some value r in the set of permutations. Corresponding to Z_1, ..., Z_N we have the ranks R_1, ..., R_N. Define D_i to be the anti-rank; this is a new terminology that I am introducing here: D_i is the position at which the rank i appears in the rank vector, i = 1 to N. For example, take N = 3 and suppose the arrangement of ranks is (2, 3, 1); then the anti-ranks are D_1 = 3, D_2 = 1 and D_3 = 2. Write D = (D_1, ..., D_N) for the vector of anti-ranks. Then by construction Z_{D_1} < Z_{D_2} < ... < Z_{D_N}. Under H0, (Z_{D_1}, ..., Z_{D_N}) has the same distribution as (Z_1, ..., Z_N), because (D_1, ..., D_N) is simply some permutation of 1 to N and the Z's are i.i.d. So P(R = (1, 2, ..., N)) = P(Z_1 < Z_2 < ... < Z_N) = P(Z_{D_1} < Z_{D_2} < ... < Z_{D_N}) = P(R = r). That means every permutation has the same probability, and so each has probability 1/N!. This proves that the distribution of the rank vector is the discrete uniform distribution over all permutations. Now we consider individual ranks as well. The i-th rank R_i can take the values 1 to N, and we will show that P(R_i = k) = 1/N for each k = 1 to N and each i = 1 to N. How do we derive this? P(R_i = k) is the sum of P(R = r) over all permutations r with r_i = k. How many such permutations are there? (N - 1)!, because one rank is fixed at the i-th position and the remaining N - 1 positions can be permuted in (N - 1)! ways. So P(R_i = k) = (N - 1)!/N! = 1/N.
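The rank and anti-rank construction can be made concrete in a few lines of Python; this sketch, not part of the lecture, reproduces the N = 3 example above with D_1 = 3, D_2 = 1, D_3 = 2.

```python
# Ranks and anti-ranks (illustrative helper functions, my own names).
def ranks(z):
    order = sorted(range(len(z)), key=lambda i: z[i])
    r = [0] * len(z)
    for rank0, idx in enumerate(order):
        r[idx] = rank0 + 1       # ranks are 1-based
    return r

def anti_ranks(r):
    # D_i = position at which rank i appears in r
    return [r.index(i) + 1 for i in range(1, len(r) + 1)]

z = [5.0, 9.0, 2.0]
r = ranks(z)                     # [2, 3, 1], as in the example above
d = anti_ranks(r)                # [3, 1, 2], i.e. D_1 = 3, D_2 = 1, D_3 = 2
# Z_{D_1} < Z_{D_2} < Z_{D_3} by construction
assert all(z[d[i] - 1] < z[d[i + 1] - 1] for i in range(len(d) - 1))
```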
Similarly, we can consider P(R_i = k, R_j = l) for i ≠ j. Of course this is 0 if k = l: we are dealing with continuous distributions, so I will not allow two observations to be equal, because ties occur with probability 0. If we fix two of the ranks, the remaining N - 2 positions can be permuted in (N - 2)! ways, so P(R_i = k, R_j = l) = (N - 2)!/N! = 1/(N(N - 1)) for k ≠ l, with both k and l ranging over 1 to N. So the joint distribution of two ranks is also obtained, and it is a bivariate discrete uniform distribution. Next, let f be a function from script R to script R which is one-to-one and onto, and write R* = f(R). Then R* also has the discrete uniform distribution: P0(f(R) = r) = P0(R = f^{-1}(r)) = 1/N! for every r in script R. So we are able to describe the basic distribution of the ranks when we consider the combined sample; under the null hypothesis the observations come from the same distribution, and therefore these statements are valid. Let us now return to S = Σ c_i a(R_i), our expression for the simple linear rank statistic. First, E(a(R_i)) = Σ a_h P(R_i = h), h = 1 to N, since R_i takes the values 1 to N; each probability is 1/N, so E(a(R_i)) = (1/N) Σ a_h, h = 1 to N. Let us denote this quantity by ā. Then E(S) = Σ c_i ā = N ā c̄, where c̄ is the mean of the c_i, i = 1 to N. We can also consider the variance. Firstly, Var(a(R_i)) = Σ (a_h - ā)² P(R_i = h) = (1/N) Σ (a_h - ā)², h = 1 to N. For i ≠ j, the covariance is Cov(a(R_i), a(R_j)) = ΣΣ (a_h - ā)(a_k - ā) P(R_i = h, R_j = k) = (1/(N(N - 1))) Σ_{h≠k} (a_h - ā)(a_k - ā). The sum over h ≠ k can be written as the square of the sum minus the sum of the squares: Σ_{h≠k} (a_h - ā)(a_k - ā) = (Σ (a_h - ā))² - Σ (a_h - ā)², and the first term is 0. So Cov(a(R_i), a(R_j)) = -(1/(N(N - 1))) Σ (a_h - ā)², h = 1 to N. These two results we now use in the variance of S: Var(S) = Σ c_i² Var(a(R_i)) + Σ_{i≠j} c_i c_j Cov(a(R_i), a(R_j)). Substituting the values just calculated, Var(S) = (1/N) Σ c_i² Σ (a_h - ā)² - (1/(N(N - 1))) Σ_{i≠j} c_i c_j Σ (a_h - ā)². Taking the common factor Σ (a_h - ā)² outside and using Σ_{i≠j} c_i c_j = (Σ c_i)² - Σ c_i², this simplifies to Var(S) = (1/(N - 1)) Σ (c_i - c̄)² Σ (a_h - ā)², both sums running from 1 to N.
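Since under H0 all N! rank permutations are equally likely, the formulas for E(S) and Var(S) can be verified by brute-force enumeration for small N. An illustrative sketch, with the two-sample constants and Wilcoxon scores chosen purely as an example:

```python
# Brute-force verification (small N only) of
#   E(S)   = N * c_bar * a_bar   and
#   Var(S) = (1/(N-1)) * sum (c_i - c_bar)^2 * sum (a_i - a_bar)^2
# by averaging S over all N! equally likely rank permutations.
from itertools import permutations
from statistics import mean

c = [1, 1, 0, 0]        # two-sample regression constants (m = 2, n = 2)
a = [1, 2, 3, 4]        # Wilcoxon scores a(i) = i
N = len(c)

values = [sum(ci * a[r - 1] for ci, r in zip(c, perm))
          for perm in permutations(range(1, N + 1))]
e_s = mean(values)
var_s = mean(v * v for v in values) - e_s ** 2

c_bar, a_bar = mean(c), mean(a)
ss_c = sum((ci - c_bar) ** 2 for ci in c)
ss_a = sum((ai - a_bar) ** 2 for ai in a)
assert abs(e_s - N * c_bar * a_bar) < 1e-9
assert abs(var_s - ss_c * ss_a / (N - 1)) < 1e-9
```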
So we have the null mean and null variance of the linear rank statistic for general regression constants and a general score function. As an application, let us consider two-sample problems. Take c_i = 1 for i = 1 to m and c_i = 0 for i = m + 1 to N. Then S = Σ a(R_i), i = 1 to m, and c̄ = m/N. So Σ (c_i - c̄)², i = 1 to N, equals m(1 - c̄)² + n c̄² = mn²/N² + nm²/N² = (mn/N²)(n + m) = mn/N. Under this choice, E(S) = N c̄ ā = m ā, and Var(S) = (1/(N - 1)) (mn/N) Σ (a_i - ā)² = (mn/(N(N - 1))) Σ (a_i - ā)². For the Wilcoxon rank-sum statistic the scores are a_i = i, so let us put that value in. Then ā = (N + 1)/2, the mean of the discrete uniform distribution on 1 to N, and Σ (a_i - ā)² = N(N² - 1)/12, which is N times the variance of that distribution. So E(W) = m(N + 1)/2 and Var(W) = (mn/(N(N - 1))) N(N² - 1)/12 = mn(N + 1)/12. Compare with the values derived earlier: E0(W) was m(m + n + 1)/2, and since m + n = N this is the same value; and Var(W) was the same as Var(U), namely mn(m + n + 1)/12, which again equals mn(N + 1)/12. So you can see that this general structure helps us to conceive of various other new test statistics that can be utilized in testing problems. Next we consider the concept of projection. We state a theorem here which we call the projection theorem. Suppose X_1, ..., X_n is a random sample from an arbitrary distribution H(x), and let V = V(X_1, ..., X_n) be a random variable such that E(V) = 0. If W = Σ p_i(X_i), i = 1 to n, then E(V - W)² is minimized by choosing the functions p_i(x) as p_i*(x) = E(V | X_i = x); call this relation (1). The random variable V_p defined as V_p = Σ p_i*(X_i) is called the projection of V, and it satisfies E(V - V_p)² = Var(V) - Var(V_p); call this relation (2). In words: among all statistics of the form W = Σ p_i(X_i), the mean squared distance from V is minimized by taking each p_i to be the conditional expectation of V given X_i, and the resulting sum is the projection of V. Let us look at the proof. Adding and subtracting V_p, E(V - W)² = E(V - V_p)² + E(V_p - W)² + 2 E[(V - V_p)(V_p - W)].
Consider the cross term: E[(V - V_p)(V_p - W)] = Σ E[(p_i*(X_i) - p_i(X_i))(V - V_p)], i = 1 to n. Conditioning on X_i, each term is E[ E( (p_i*(X_i) - p_i(X_i))(V - V_p) | X_i ) ] = E[ (p_i*(X_i) - p_i(X_i)) E(V - V_p | X_i) ], since the first factor can be taken outside the conditional expectation. Now E(V - V_p | X_i) = E(V | X_i) - p_i*(X_i) - Σ_{j≠i} E(p_j*(X_j) | X_i). By relation (1), E(V | X_i) - p_i*(X_i) = 0. And for j ≠ i, E(p_j*(X_j) | X_i) = E(p_j*(X_j)) because X_i and X_j are independent, and E(p_j*(X_j)) = E(E(V | X_j)) = E(V) = 0, since we assumed V to be a random variable with expectation 0. So E(V - V_p | X_i) = 0, each term vanishes, and the entire cross term is 0. What we get, then, is E(V - W)² = E(V - V_p)² + E(V_p - W)², a sum of two non-negative terms of which only the second depends on W; it is minimized by choosing W = V_p, which proves the first assertion. If instead we choose W = 0, the left side becomes E(V²) = Var(V), since E(V) = 0, and the right side becomes E(V - V_p)² + E(V_p²) = E(V - V_p)² + Var(V_p), since E(V_p) = 0 as well; rearranging gives relation (2). This completes the proof of the projection theorem. As a remark, let me mention that the proof also works if X_1, ..., X_n are independent but not necessarily identically distributed; so in applications where the X_i come independently but do not have the same distribution, the concept of projection can still be used. From this we have the following theorem. Suppose W_n has an asymptotic N(0, sigma²) distribution and E(U_n - W_n)² → 0 as n → ∞. Then U_n also has an asymptotic N(0, sigma²) distribution. For the proof, define R_n = U_n - W_n. By Chebyshev's inequality, P(|R_n| ≥ epsilon) ≤ E(R_n²)/epsilon² = E(U_n - W_n)²/epsilon², which goes to 0 as n → ∞. So R_n → 0 in probability, and therefore U_n = R_n + W_n converges in distribution to N(0, sigma²). Using these properties I will derive the asymptotic distributions of the Mann-Whitney U statistic and the Wilcoxon rank-sum statistic in the next lecture.
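As a purely numerical preview of that result (this is my own illustration, not the derivation, which comes in the next lecture), one can simulate the standardized Mann-Whitney statistic under H0 and observe that it behaves approximately like a standard normal:

```python
# Simulate the standardized Mann-Whitney statistic under H0 and compare
# it informally with a standard normal distribution.
import math
import random

random.seed(2)
m, n = 30, 40
mu = m * n / 2                                   # E0(U)
sigma = math.sqrt(m * n * (m + n + 1) / 12)      # sqrt of Var0(U)

standardized = []
for _ in range(5000):
    x = [random.gauss(0, 1) for _ in range(m)]
    y = [random.gauss(0, 1) for _ in range(n)]
    u = sum(1 for xi in x for yj in y if yj < xi)
    standardized.append((u - mu) / sigma)

# mean near 0, and about 95% of values inside (-1.96, 1.96)
inside = sum(abs(t) < 1.96 for t in standardized) / len(standardized)
print(sum(standardized) / len(standardized), inside)
```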