Friends, in this course so far we have introduced order statistics and their distributions. We considered the probability integral transform and the distribution of the probability integral transforms of the order statistics: the distribution of one of them, their joint distribution and their moment structure. Then we introduced the empirical distribution function and, using it, we considered transformations of the random sample observations and their order statistics, and we looked at their distributions, their joint distributions and their moment structures. We saw how these can be used in certain two-sample testing problems. We discussed the goodness-of-fit tests of Kolmogorov and Smirnov and also the original one due to Karl Pearson. Now we will concentrate on location problems, starting with the single-sample location problem. We have seen that one crude or naive test is the sign test, which counts how many of the observations lie above the hypothesized median value and how many lie below it. We have seen that it performs all right, but it does not depend on the measured values; it depends only on how many positive and how many negative values there are. There are other tests which are based on the ranks of the individual observations rather than just the signs. One of the first is the Wilcoxon signed-rank test. So, let me introduce the problem. We are considering the single-sample location problem with a symmetric and continuous distribution. Suppose $X_1, X_2, \ldots, X_n$ is a random sample from a distribution with cumulative distribution function $F(x)$. Assume that $F$ is continuous and that $F$ is symmetric about a point $\theta$. If it is symmetric about $\theta$ then, of course, $\theta$ is the median.
In the sign test we did not assume symmetry; we just tested whether the median is a given value $m_0$. Of course, we found the distribution of that statistic for the case of a symmetric distribution also, but in general the distribution can be anything. Here we want to test hypotheses like $H_0: \theta = 0$. Basically, we can test $\theta = \theta_0$, and without loss of generality we can take $\theta_0 = 0$, as I explained in the previous problem. Against this we can consider alternatives of the type $H_1: \theta > 0$, $H_2: \theta < 0$ or $H_3: \theta \neq 0$. We will use the signed-rank statistic, called the Wilcoxon signed-rank statistic, which was given by Wilcoxon in 1945. Let us take the magnitudes of the observations, $|X_1|, |X_2|, \ldots, |X_n|$, so the raw values are transformed to their magnitudes, and consider their ordering, say $|X|_{(1)} < |X|_{(2)} < \cdots < |X|_{(n)}$. Note that this is different from the usual order statistics: first we take magnitudes and then we order. In general $|X|_{(i)}$ will not be the same as $|X_{(i)}|$. If all the observations are positive the two agree; if all the observations are negative the ordering is simply reversed. Let $R_i^+$ be the rank of $|X_i|$ among $|X_1|, |X_2|, \ldots, |X_n|$. Then the vector $R^+ = (R_1^+, \ldots, R_n^+)$ is simply a permutation of $(1, 2, \ldots, n)$. Since we have taken the modulus here, these ranks do not depend on the plus or minus signs; the signs are brought in separately, which is why the statistic is called a signed-rank statistic. Now, based on this we define $u(X_i) = 1$ if $X_i > 0$ and $u(X_i) = 0$ if $X_i < 0$.
Of course, we ignore the case $X_i = 0$ because we are dealing with continuous random variables, so the probability that $X_i = 0$ is 0. Now we define $T^+ = \sum_{i=1}^{n} u(X_i) R_i^+$. This is nothing but the sum of the ranks of $|X_i|$ over those $i$ for which $X_i$ is positive, because $u(X_i)$ multiplies $R_i^+$ and so a term with negative $X_i$ is not counted. This is called the Wilcoxon signed-rank statistic. So we consider only the observations which are positive, and for those we look at the ranks of their magnitudes among the ordered $|X_i|$'s. Now you can easily see what happens if $\theta > 0$ (recall that without loss of generality we have taken $\theta_0 = 0$): more of the values will be positive, and therefore $T^+$ will tend to be larger. So we consider the distribution of $T^+$ and its percentage points. Although the random variables are continuous, $T^+$ is discrete because it is simply a sum of ranks. As with the sign test statistic, the Wilcoxon signed-rank statistic has a discrete distribution, and therefore a particular significance level may not be attained exactly. In the same way as before, for $0 < \beta < 1$ we define $c_\beta$ to be the smallest $t$ such that $P_0(T^+ \geq t) \leq \beta$, where $P_0$ denotes probability under the null hypothesis that the median is 0, and $c_{1-\beta}$ to be the largest $t$ such that $P_0(T^+ > t) \geq 1 - \beta$, that is, $P_0(T^+ \leq t) \leq \beta$.
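The statistic $T^+$ defined above can be computed directly from a sample. The following is a minimal Python sketch, assuming no zeros and no tied magnitudes (as for a continuous distribution); the function name `signed_rank_statistic` and the data values are made up for illustration and are not part of the lecture.

```python
def signed_rank_statistic(sample):
    """Return T+ = sum of the ranks of |x_i| over those i with x_i > 0.

    Assumes no zeros and no tied magnitudes, as for a continuous distribution.
    """
    if any(x == 0 for x in sample):
        raise ValueError("zero observations are not handled in this sketch")
    magnitudes = [abs(x) for x in sample]
    if len(set(magnitudes)) != len(magnitudes):
        raise ValueError("tied magnitudes are not handled in this sketch")
    # order[k] is the index of the observation with the (k+1)-th smallest magnitude
    order = sorted(range(len(sample)), key=lambda i: magnitudes[i])
    ranks = [0] * len(sample)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank              # this is R_i^+
    # u(x_i) = 1 for positive x_i, so only those ranks are added
    return sum(r for x, r in zip(sample, ranks) if x > 0)


data = [1.2, -0.5, 3.1, -2.4, 0.8]   # made-up illustrative values
t_plus = signed_rank_statistic(data)
```

Here the magnitudes 0.5, 0.8, 1.2, 2.4, 3.1 receive ranks 1 to 5, and the positive observations carry ranks 3, 5 and 2, so `t_plus` is 10.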
So, if the distribution were continuous, $c_\beta$ would simply be the upper $100\beta$ percent point and $c_{1-\beta}$ would be the upper $100(1-\beta)$ percent point, but since the distribution of $T^+$ is discrete we need to define them in terms of a smallest and a largest value. A level $\alpha$ test for $H_0$ against $H_1$ is then to reject $H_0$ if $T^+ \geq c_\alpha$; against $H_2$ it is to reject $H_0$ if $T^+ \leq c_{1-\alpha}$; against $H_3$ it is to reject $H_0$ if either $T^+ \geq c_{\alpha/2}$ or $T^+ \leq c_{1-\alpha/2}$. Now the question is how to determine these values $c_\alpha$. Nowadays, of course, it is easy to get them from a computer program, but let us look at a general result of this nature. Under the null hypothesis $(R_1^+, R_2^+, \ldots, R_n^+)$ is a random permutation of $(1, \ldots, n)$. How many permutations are there? There are $n!$ permutations, and each permutation has probability $1/n!$ under the null hypothesis. Let us write this as a result. We have the following theorem. Let $u = (u(X_1), u(X_2), \ldots, u(X_n))$ be the vector of sign indicators of the $X_i$'s, which is a collection of 1s and 0s, and let $R^+$ be the vector of ranks of the absolute values. Under $H_0: \theta = 0$, $u$ and $R^+$ are independently distributed, $P_0(u(X_i) = 1) = P_0(u(X_i) = 0) = 1/2$, and $R^+$ has a discrete uniform distribution over the set $S_n$ of permutations of $(1, \ldots, n)$.
That is, we are saying $P_0(R_1^+ = r_1, \ldots, R_n^+ = r_n) = 1/n!$ for $r = (r_1, r_2, \ldots, r_n) \in S_n$, where $S_n$ is the set of all permutations of the numbers 1 to $n$. Let us look at a rough proof of this. $X_1, X_2, \ldots, X_n$ are independent and identically distributed random variables. This implies that $u(X_1), u(X_2), \ldots, u(X_n)$ are independent and identically distributed. It also means that $|X_1|, |X_2|, \ldots, |X_n|$ are i.i.d. Now, the ranks in $R^+$ are functions of $|X_1|, |X_2|, \ldots, |X_n|$, because $R_i^+$ is defined as the rank of $|X_i|$ among $|X_1|, \ldots, |X_n|$; so $R^+$ is entirely a function of the absolute values. On the other hand, $u$ is a function of the signs of $X_1, \ldots, X_n$. Hence, if we show that $u(X_i)$ is independent of $|X_i|$ for every $i$, then $u$ and $R^+$ will be independent. So there are two statements to be proved: $P_0(u(X_i) = 0, |X_i| \leq x) = P_0(u(X_i) = 0) \, P_0(|X_i| \leq x)$ and $P_0(u(X_i) = 1, |X_i| \leq x) = P_0(u(X_i) = 1) \, P_0(|X_i| \leq x)$. Note that if $x$ is negative then both sides are 0 in each statement, so both statements hold trivially for $x < 0$. Now, let us consider $x > 0$.
For $x > 0$, consider first $P_0(u(X_i) = 0, |X_i| \leq x)$. The event $u(X_i) = 0$ is $X_i < 0$, and the second part can be written as $-x \leq X_i \leq x$, so this is $P_0(X_i < 0, -x \leq X_i \leq x)$. Combining the two, it becomes simply $P_0(-x \leq X_i < 0)$. We have assumed that under $H_0$ the distribution of $X_i$ is symmetric about 0, so this can be written as $\frac{1}{2} P_0(-x \leq X_i \leq x)$. Whether we write less than or less than or equal to makes no difference, because the probability of equality is 0. This step is due to the symmetry of $F$. Therefore this is nothing but $P_0(u(X_i) = 0) \, P_0(|X_i| \leq x)$. So the first statement is proved: for $x < 0$ it is trivially true, and for $x > 0$ we now have the proof. In a similar way, for $x > 0$, $P_0(u(X_i) = 1, |X_i| \leq x) = P_0(X_i > 0, -x \leq X_i \leq x)$, and combining the two conditions this reduces to $P_0(0 < X_i \leq x)$. As before, due to symmetry this can be written as $\frac{1}{2} P_0(-x \leq X_i \leq x)$, which is nothing but $P_0(u(X_i) = 1) \, P_0(|X_i| \leq x)$. So we have proved the second statement for $x < 0$ as well as for $x > 0$. So if you look at $u(X_1)$, for example, it is independent of $|X_1|$, and naturally it is independent of $|X_2|, \ldots, |X_n|$ because the observations themselves are independent.
So, in particular, $u(X_1)$ is independent of $(|X_1|, \ldots, |X_n|)$ and hence of the vector $(R_1^+, R_2^+, \ldots, R_n^+)$. In a similar way $u(X_2)$ is independent of $(R_1^+, R_2^+, \ldots, R_n^+)$. More precisely, the pairs $(u(X_i), |X_i|)$ are independent across $i$, and within each pair the two components are independent, so the whole vector $u$ is independent of $(|X_1|, \ldots, |X_n|)$ and therefore of $R^+$, which is a function of the absolute values. This proves that $u$ and $R^+$ are independently distributed. Now, since $X_1, X_2, \ldots, X_n$ are i.i.d., $|X_1|, |X_2|, \ldots, |X_n|$ are also independent and identically distributed, and therefore any ordering among them is equally likely. So $P_0(R_1^+ = r_1, \ldots, R_n^+ = r_n) = 1/n!$ for any permutation $(r_1, r_2, \ldots, r_n) \in S_n$, where $S_n$ denotes the set of all permutations of the numbers 1 to $n$. Now let us look at the distribution of $T^+$. We have obtained separately the distributions of the terms involved in $T^+$: the distribution of $u$, the distribution of $R^+$, and their independence. So now we try to use this to derive the distribution of $T^+$. Let $K$ be the number of $X_i$'s which are positive. Of course, we have seen that $K$ is binomial $(n, 1/2)$ under the null hypothesis. Also, let $S_1 < S_2 < \cdots < S_K$ be the ranks, among $|X_1|, |X_2|, \ldots, |X_n|$, of the positive $X_i$'s. This is important: if we ranked the positive observations only among themselves, the ranks would simply be $1, 2, \ldots, K$, but here we are looking at the rank of each positive $X_i$ among all of $|X_1|, |X_2|, \ldots, |X_n|$.
So these ranks are defined only for the positive $X_i$'s. Because some of the negative $X_i$'s get placed in between once we take absolute values, the ranks of the positive ones are no longer simply $1, \ldots, K$. Let us consider, under the null hypothesis, the probability that $S_1 = s_1, \ldots, S_K = s_k$ and $K = k$, where $K$ is the number of positive $X_i$'s. This is $P_0(S_1 = s_1, \ldots, S_k = s_k \mid K = k) \, P_0(K = k) = \frac{1}{\binom{n}{k}} \cdot \binom{n}{k} \left(\frac{1}{2}\right)^n$. The binomial coefficients cancel out and we get simply $1/2^n$. The reason is that each of the $X_i$'s can be positive or negative with probability half, independently of the ranks, so each of the $2^n$ ways of attaching signs to the ranks is equally likely. Now, what are the values of $T^+$? It takes the values $0, 1, \ldots, n(n+1)/2$. If all the observations are positive it is $n(n+1)/2$; if all are negative it is 0. Let $u_n(t)$ be the number of choices of $(S_1, S_2, \ldots, S_K)$, that is, the number of subsets of $\{1, \ldots, n\}$, which give $S_1 + S_2 + \cdots + S_K = t$. Suppose $n = 1$, that is, there is only one observation. Then $u_1(0) = 1$ and $u_1(1) = 1$, because $X_1$ can be either negative or positive. Similarly, for $n = 2$ we have $u_2(0) = 1$, $u_2(1) = 1$, and likewise $u_2(2) = u_2(3) = 1$. Let us write down a recurrence relation: $u_n(t) = u_{n-1}(t - n) + u_{n-1}(t)$.
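For small $n$ the counts $u_n(t)$ defined above can be tabulated by listing every subset of ranks directly. The following Python sketch does that by brute force; the function name `count_by_enumeration` is introduced here only for illustration, and this enumeration is a check on the definition rather than the method used in the lecture.

```python
from collections import Counter
from itertools import combinations


def count_by_enumeration(n):
    """Return a Counter mapping t -> u_n(t), the number of subsets of
    {1, ..., n} whose elements add up to t (the empty subset gives t = 0)."""
    counts = Counter()
    ranks = range(1, n + 1)
    for k in range(n + 1):                     # k = number of positive observations
        for subset in combinations(ranks, k):  # ranks carried by the positive ones
            counts[sum(subset)] += 1
    return counts


u3 = count_by_enumeration(3)
# The sums run over 0, 1, ..., n(n+1)/2 = 6 and the counts total 2**3 = 8,
# one for each way of attaching signs to the three ranks.
```

For $n = 3$ the only value reached in two ways is $t = 3$ (from the subsets $\{3\}$ and $\{1, 2\}$), so $u_3(3) = 2$ and every other count is 1.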
This is the recurrence relation we get, and the reason is as follows. Look at the observation that carries rank $n$, that is, the one with the largest absolute value, and let $T_{n-1}^+$ denote the signed-rank sum built from the remaining ranks $1, \ldots, n-1$. If that observation is negative, no value is added and the previous ranks remain the same, so $T_n^+ = t$ requires $T_{n-1}^+ = t$. If it is positive, $n$ is added, so $T_n^+ = t$ requires $T_{n-1}^+ = t - n$. Here $u_{n-1}(s)$ is taken to be 0 whenever $s$ is outside the range $0, 1, \ldots, (n-1)n/2$. The probability distribution of $T^+$ is then $P_0(T^+ = t) = u_n(t)/2^n$ if $t$ is in the set $\{0, 1, \ldots, n(n+1)/2\}$, and it is 0 otherwise. The recurrence relation gives a method of calculating the values $u_n(t)$. Writing $X_n$, with a slight abuse of notation, for the observation that carries rank $n$, we have $P_0(T_n^+ = t) = P_0(X_n < 0, T_{n-1}^+ = t) + P_0(X_n > 0, T_{n-1}^+ = t - n)$, and both of these are known: the sum equals $\frac{1}{2}\left[\frac{u_{n-1}(t)}{2^{n-1}} + \frac{u_{n-1}(t-n)}{2^{n-1}}\right]$. So this gives a method of evaluating the probability distribution of $T^+$ at the $n$th stage from the $(n-1)$th stage. In a similar way we may also consider $T^- = \sum_{i=1}^{n} (1 - u(X_i)) R_i^+$, which is the sum of the ranks of the negative observations, because when $u(X_i)$ is 0, $1 - u(X_i)$ becomes 1. Since all the ranks add up to $n(n+1)/2$, this is $T^- = n(n+1)/2 - T^+$. So $T^-$ is directly related to $T^+$; basically we are saying $T^+ + T^- = n(n+1)/2$.
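The recurrence and the resulting null distribution can be sketched in Python as follows. This is an illustration under stated assumptions, not a definitive implementation: the function names are hypothetical, the base case $u_0(0) = 1$ (empty sample) is our choice for starting the recursion, and the two critical-value helpers simply apply the definitions of $c_\beta$ and $c_{1-\beta}$ given earlier.

```python
from fractions import Fraction


def counts_by_recurrence(n):
    """Return [u_n(0), ..., u_n(n(n+1)/2)] using
    u_m(t) = u_{m-1}(t) + u_{m-1}(t - m), starting from u_0(0) = 1 (assumed base case)."""
    u = [1]                                    # m = 0: only the empty sum
    for m in range(1, n + 1):
        prev = u
        max_t = m * (m + 1) // 2
        u = [0] * (max_t + 1)
        for t in range(max_t + 1):
            if t < len(prev):
                u[t] += prev[t]                # rank m goes to a negative observation
            if 0 <= t - m < len(prev):
                u[t] += prev[t - m]            # rank m goes to a positive observation
    return u


def null_pmf(n):
    """Exact null probabilities P0(T+ = t) = u_n(t) / 2**n, as Fractions."""
    return [Fraction(c, 2 ** n) for c in counts_by_recurrence(n)]


def upper_critical_value(n, beta):
    """Smallest t with P0(T+ >= t) <= beta; returns n(n+1)/2 + 1 if no
    attainable value of T+ qualifies."""
    pmf = null_pmf(n)
    tail = Fraction(0)
    c = len(pmf)                               # one past the largest value
    for t in range(len(pmf) - 1, -1, -1):      # accumulate the upper tail
        tail += pmf[t]
        if tail > beta:
            break
        c = t
    return c


def lower_critical_value(n, beta):
    """Largest t with P0(T+ <= t) <= beta; returns -1 if none qualifies."""
    pmf = null_pmf(n)
    tail = Fraction(0)
    c = -1
    for t in range(len(pmf)):                  # accumulate the lower tail
        tail += pmf[t]
        if tail > beta:
            break
        c = t
    return c
```

For example, with $n = 5$ and $\beta = 0.05$ the upper tail $P_0(T^+ \geq 15) = 1/32$ is below 0.05 while $P_0(T^+ \geq 14) = 2/32$ is not, so `upper_critical_value(5, Fraction(1, 20))` gives 15; the level actually attained is $1/32$, which illustrates why a given significance level may not be reached exactly.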
Now consider $T = T^+ - T^-$, which is of course equal to $2T^+ - n(n+1)/2$. For the two-sided testing problem, where the alternative is $\theta \neq 0$, it is convenient to work with $T$ rather than $T^+$; since $T$ is an increasing linear function of $T^+$, the two lead to the same tests. Taking terms to the other side, we get $T^+ = \frac{1}{2} T + \frac{n(n+1)}{4}$. Now we show that under the null hypothesis the distribution of $T$ is symmetric about 0. We can write $T = \sum_{i=1}^{n} (2u(X_i) - 1) R_i^+ = \sum_{j=1}^{n} (2u(X_{i_j}) - 1) \, j$, where $i_j$ is the index of the observation whose rank is $j$. Each $R_i^+$ takes some value from 1 to $n$, and $(i_1, i_2, \ldots, i_n)$ is a permutation of $(1, \ldots, n)$ determined by the way in which the ranks are distributed. We give the $j$th term a new name: let $W_j = (2u(X_{i_j}) - 1) \, j$, so that $T = \sum_{j=1}^{n} W_j$. What values does $W_j$ take? It takes either $+j$ or $-j$. What is the probability that $W_j = j$? That is simply $P_0(u(X_{i_j}) = 1) = P_0(X_{i_j} > 0)$, and under the null hypothesis this is $1/2$, because the signs are independent of the ranks. Similarly, $P_0(W_j = -j) = P_0(X_{i_j} < 0)$, which is also $1/2$. So what we have proved is that each $W_j$ takes just the two values $+j$ and $-j$, each with probability $1/2$. Also, $W_1, W_2, \ldots, W_n$ are independent. Of course, we should not say identically distributed, because although each takes two values with equal probability, those values change with $j$.
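The relations among $T^+$, $T^-$, $T$ and the $W_j$'s can be checked on a small sample. The sketch below is illustrative only: the function name `signed_rank_summary` and the data are made up, and it assumes no zeros and no tied magnitudes.

```python
def signed_rank_summary(sample):
    """Return (T+, T-, T, W) for a sample without zeros or tied magnitudes,
    where W = [W_1, ..., W_n] and W_j = +j or -j according to the sign of
    the observation whose magnitude has rank j."""
    n = len(sample)
    # order[j-1] is i_j, the index of the observation with rank j
    order = sorted(range(n), key=lambda i: abs(sample[i]))
    w = [j if sample[i] > 0 else -j for j, i in enumerate(order, start=1)]
    t_plus = sum(v for v in w if v > 0)      # ranks of positive observations
    t_minus = -sum(v for v in w if v < 0)    # ranks of negative observations
    return t_plus, t_minus, t_plus - t_minus, w


sample = [1.2, -0.5, 3.1, -2.4, 0.8]         # made-up illustrative values
n = len(sample)
t_plus, t_minus, t, w = signed_rank_summary(sample)

# The identities from the lecture, checked on this sample.
assert t_plus + t_minus == n * (n + 1) // 2
assert t == 2 * t_plus - n * (n + 1) // 2
assert t == sum(w)
```

For this sample the signed ranks are $W = (-1, 2, 3, -4, 5)$, giving $T^+ = 10$, $T^- = 5$ and $T = 5$.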
Now look at the moment generating function of $W_j$. Writing $s$ for the argument, to keep it apart from the statistic $T$, we have $M_{W_j}(s) = E(e^{s W_j}) = \frac{1}{2}(e^{sj} + e^{-sj})$, because $W_j$ takes two values. Since $T = \sum_{j} W_j$ and the $W_j$'s are independent, the m.g.f. of $T$ is simply the product of the m.g.f.s of the $W_j$'s: $M_T(s) = \prod_{j=1}^{n} \frac{1}{2}(e^{sj} + e^{-sj})$. Now, $M_T(-s)$ is the same as $M_T(s)$, that is, $E(e^{-sT}) = E(e^{sT})$. This is the same as saying $M_{-T}(s) = M_T(s)$, so $-T$ and $T$ have the same distribution. If a random variable and its negative have the same distribution, it means that the distribution is symmetric about 0. So this is interesting: the distribution of $T$ is symmetric about 0. And we have expressed $T^+$ in terms of $T$, so if $T$ is symmetric about 0, $T^+$ is symmetric about $n(n+1)/4$. These things give us more features of the test statistic that we are using. Since $E(T) = 0$, we get $E(T^+) = \frac{1}{2} E(T) + \frac{n(n+1)}{4} = \frac{n(n+1)}{4}$, and $\mathrm{Var}(T) = \frac{n(n+1)(2n+1)}{6}$. This you can calculate from the m.g.f. of $T$, which is a product of $n$ terms. If we take one derivative, we get a sum over $j$ in which the $j$th factor is differentiated and the other factors are left as they are. The derivative of the $j$th factor is $\frac{j}{2}(e^{sj} - e^{-sj})$, with a minus sign, and at $s = 0$ the two exponentials cancel out, so each such term is 0.
When we go to the second derivative, the factor that is differentiated twice becomes positive and gives $j^2$ at the origin, while every term that still contains a first derivative of some factor becomes 0 there. This happens for each $j$, so the second moment is $\sum_{j=1}^{n} j^2$, which is simply $\frac{n(n+1)(2n+1)}{6}$. For the variance of $T^+$, since $T^+$ is half of $T$ plus a constant, $\mathrm{Var}(T^+) = \frac{1}{4} \mathrm{Var}(T) = \frac{n(n+1)(2n+1)}{24}$. So this is interesting: we are able to find the distribution of $T^+$ and the distribution of $T$, and we are able to derive their first and second moments under the null hypothesis. Now, since we are expressing $T$ as a sum, we can consider the central limit theorem. Let us apply Lyapunov's central limit theorem, which is applicable to independent but possibly non-identically distributed random variables. I am writing down the statement here. Let $W_1, W_2, \ldots, W_n$ be independent with $E(W_i) = \mu_i$ and $\mathrm{Var}(W_i) = \sigma_i^2$, and let the third absolute central moment of $W_i$ be $\rho_i^3 = E|W_i - \mu_i|^3$. Define $W = \sum W_i$, $\mu = \sum \mu_i$, $\sigma^2 = \sum \sigma_i^2$ and $\rho^3 = \sum \rho_i^3$. If $\rho/\sigma \to 0$ as $n \to \infty$, then the distribution of $(W - \mu)/\sigma$ is asymptotically standard normal. This is convergence in distribution, or convergence in law. So this is Lyapunov's central limit theorem. The original central limit theorem is for independent and identically distributed random variables, and it is also called the Lindeberg-Lévy central limit theorem.
That one is applicable when the random variables are independent and identically distributed, and we only assume that the variance exists, that is, the second moment exists. When the random variables are not identically distributed, Lyapunov's central limit theorem gives a sufficient condition for the asymptotic distribution to be normal. Here we have to assume something about the third moment: the third absolute central moments must exist, and the condition is imposed on them. Now, our $T$ is exactly of this form. $W_1, W_2, \ldots, W_n$ are independent, and certainly they are not identically distributed. Their distributions are symmetric, the means are 0, and the variance of $W_j$ is $j^2/2 + j^2/2 = j^2$. So let us use this. $T = \sum_{j=1}^{n} W_j$, with $E(W_j) = \mu_j = j/2 - j/2 = 0$. Next, $\sigma_j^2 = E(W_j^2) - \mu_j^2 = j^2/2 + j^2/2 = j^2$. For the third absolute central moment, since the mean is 0, it is simply $\rho_j^3 = E|W_j|^3 = j^3/2 + j^3/2 = j^3$, because we take the absolute value. Now we write all the terms: $\mu = 0$, $\sigma^2 = \sum_{j=1}^{n} j^2 = \frac{n(n+1)(2n+1)}{6}$ and $\rho^3 = \sum_{j=1}^{n} j^3 = \frac{n^2(n+1)^2}{4}$. So $\frac{\rho}{\sigma} = \left[\frac{n^2(n+1)^2}{4}\right]^{1/3} \Big/ \left[\frac{n(n+1)(2n+1)}{6}\right]^{1/2}$. As $n$ becomes large, up to constants, the numerator is of the order $n^{4/3}$ and the denominator is of the order $n^{3/2}$.
So, apart from a constant, the ratio $\rho/\sigma$ behaves like $n^{4/3}/n^{3/2}$. The exponent is $4/3 - 3/2 = -1/6$, so the ratio is a constant times $n^{-1/6}$, and this certainly goes to 0 as $n$ tends to infinity. So Lyapunov's CLT holds, and since $\mu = 0$ we get that $T/\sigma$ converges in distribution to $Z \sim N(0, 1)$ as $n$ tends to infinity, where $\sigma = \sqrt{n(n+1)(2n+1)/6}$. So the asymptotic distribution of $T$ is normal, and since there is a direct relationship between $T^+$ and $T$, we get the asymptotic distribution of $T^+$ also: $\frac{T^+ - n(n+1)/4}{\sqrt{n(n+1)(2n+1)/24}}$ is asymptotically $N(0, 1)$. So this gives a method: for a large sample size we can straight away apply a normal test for testing that the median is 0. Let us take a particular value, say $n = 20$. Then $n(n+1)/4 = 20 \times 21/4 = 105$ and $n(n+1)(2n+1)/24 = 20 \times 21 \times 41/24 = 41 \times 35/2 = 717.5$. So in this case $(T^+ - 105)/\sqrt{717.5}$ is approximately $N(0, 1)$. The rejection region $T^+ \geq c_\alpha$ is then approximately equivalent to $Z \geq z_\alpha$, where $z_\alpha$ is the upper $100\alpha$ percent point of the standard normal distribution, and this has probability $\alpha$. So, for the testing problem, suppose some data set is given with $n = 20$: we calculate $T^+$ for it, standardize it as above, and compare with $z_\alpha$.
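The large-sample test just described can be sketched in Python as below. The function name `normal_approx_upper_p` and the observed value 150 are made up for illustration, and no continuity correction is applied; that is a simplifying assumption of this sketch, not something stated in the lecture.

```python
import math


def normal_approx_upper_p(t_plus, n):
    """Approximate P0(T+ >= t_plus) by the standard normal upper tail.

    Uses E(T+) = n(n+1)/4 and Var(T+) = n(n+1)(2n+1)/24 under the null.
    No continuity correction is applied (an assumption of this sketch).
    """
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24
    z = (t_plus - mean) / math.sqrt(var)
    p = 0.5 * math.erfc(z / math.sqrt(2))    # 1 - Phi(z)
    return z, p


# n = 20 gives mean 105 and variance 717.5, as computed in the lecture.
z, p = normal_approx_upper_p(150, 20)        # 150 is a made-up observed T+
```

With these numbers $z$ is about 1.68, so the approximate one-sided p-value is a little under 0.05 and $H_0$ would be rejected against $H_1$ at the 5 percent level.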
Similarly, for the two-sided testing problem we can directly use $T$ itself: we look at $T/\sigma$ and reject $H_0$ if its absolute value is at least $z_{\alpha/2}$. So, let me explain the concept of the Wilcoxon signed-rank test once again. For the original observations there is one assumption, of course: the distribution is symmetric about some point, and that point is then the median. We are testing whether the median is equal to a specific value, and without loss of generality we take that specific value to be 0. So the testing problem is whether the median is 0, against the alternatives that it is greater than 0, less than 0, or not equal to 0. For this the Wilcoxon signed-rank statistic considers the magnitudes of the $X_i$'s. Based on these we create the ranks of the absolute values, and then we look at those observations which are positive. For the positive ones we take the ranks of $|X_i|$ among all the absolute values and add them; this sum of ranks is called the Wilcoxon signed-rank statistic. We have shown that its distribution can be calculated using the recursion relation that I gave in terms of the function $u_n(t)$. So one can calculate it, and of course some tables are available, but even if we are not using the tables, if the sample size is somewhat large we can use the normal approximation. $T^+$ is written as a sum and $T$ is written as a sum, and therefore their distributions can be approximated by a normal distribution if the sample size is sufficiently large.
Based on this the problem becomes quite simple: whenever we have large data sets we can straight away use the normal test based on $T$ or $T^+$, so it is very convenient to apply. We will extend this concept further. We will consider something called Walsh averages, and we will express the signed-rank statistic in terms of them. We will also define the general linear rank statistic. Here, you see, we are considering $\sum u(X_i) R_i^+$, so we are adding the ranks linearly, each multiplied by $u(X_i)$, which can take the values 1 and 0. We will consider a general function of this nature and look at how it can be used for constructing some other tests. That we will cover in the next lecture.