In the previous lecture I defined two types of statistics; let me recall them. We consider a random sample $X_1, X_2, \ldots, X_m$ from a CDF $F(x)$ and a random sample $Y_1, Y_2, \ldots, Y_n$ from a CDF $G(y)$, and we assume the two samples are drawn independently. Based on the empirical distribution function $F_m$ of the first sample we defined $U_i = F_m(Y_i)$ and, for the order statistics, $U_{(i)} = F_m(Y_{(i)})$. So $m U_i$ is the number of $X_j$'s less than or equal to $Y_i$, and $m U_{(i)}$ is the number of $X_j$'s less than or equal to $Y_{(i)}$; both $U_i$ and $U_{(i)}$ take values $0, \frac{1}{m}, \frac{2}{m}, \ldots, \frac{m-1}{m}, 1$. We obtained the forms of the distributions of $U_i$ and $U_{(i)}$, considered the special case $F = G$, derived the joint distributions of $(U_i, U_j)$ and $(U_{(i)}, U_{(j)})$, and looked at the moment structure of these quantities. Let me now proceed from there.

First, having the joint distribution of $U_{(p)}$ and $U_{(q)}$, let us look at the distribution of the difference. We will show that $U_{(q)} - U_{(p)}$ has the same distribution as $U_{(q-p)}$. Since the values taken by these statistics are of the form $i/m$, the difference is of the form $k/m$: one of them is $i/m$ and the other is $(i+k)/m$, with $i$ varying from $0$ to $m-k$. Recall the joint distribution derived in the previous lecture: for $p < q$ and $0 \le j \le l \le m$,

$$P\left(U_{(p)} = \frac{j}{m},\; U_{(q)} = \frac{l}{m}\right) = \frac{\binom{j+p-1}{j}\binom{m+n-l-q}{m-l}\binom{l-j+q-p-1}{l-j}}{\binom{m+n}{n}}.$$

Substituting $j = i$ and $l = i+k$ and summing over $i$,

$$P\left(U_{(q)} - U_{(p)} = \frac{k}{m}\right) = \sum_{i=0}^{m-k} \frac{\binom{i+p-1}{i}\binom{m+n-i-k-q}{m-i-k}\binom{k+q-p-1}{k}}{\binom{m+n}{n}}.$$

The last binomial coefficient is free of $i$, so we can take $\binom{q+k-p-1}{k}\big/\binom{m+n}{n}$ outside the sum. To the remaining sum we apply the identity

$$\sum_{j=0}^{k} \binom{a+k-j-1}{k-j}\binom{b+j-1}{j} = \binom{a+b+k-1}{k},$$

which gives

$$P\left(U_{(q)} - U_{(p)} = \frac{k}{m}\right) = \frac{\binom{q+k-p-1}{k}\binom{m+n-k-q+p}{m-k}}{\binom{m+n}{n}}.$$

But this is exactly the probability $P(U_{(q-p)} = k/m)$: in the distribution derived earlier,

$$P\left(U_{(i)} = \frac{j}{m}\right) = \frac{\binom{m+n-i-j}{m-j}\binom{i+j-1}{j}}{\binom{m+n}{n}},$$

putting $i = q-p$ and $j = k$ gives exactly this quantity. This proves that the distributions of $U_{(q)} - U_{(p)}$ and $U_{(q-p)}$ are the same; the difference distribution depends on $p$ and $q$ only through $q - p$.
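As a quick numerical check of this result, here is a minimal Python sketch (the helper names `joint_pmf` and `marginal_pmf` are mine) that sums the joint pmf along the diagonal $l = j + k$ and compares the total with the marginal formula evaluated at $i = q - p$:

```python
from math import comb

def joint_pmf(j, l, p, q, m, n):
    # P(U_(p) = j/m, U_(q) = l/m) under F = G, for p < q and 0 <= j <= l <= m
    return (comb(j + p - 1, j)
            * comb(m + n - l - q, m - l)
            * comb(l - j + q - p - 1, l - j)) / comb(m + n, n)

def marginal_pmf(i, j, m, n):
    # P(U_(i) = j/m) under F = G
    return comb(m + n - i - j, m - j) * comb(i + j - 1, j) / comb(m + n, n)

m, n, p, q = 8, 6, 2, 5          # illustrative sizes and ranks
for k in range(m + 1):
    lhs = sum(joint_pmf(i, i + k, p, q, m, n) for i in range(m - k + 1))
    rhs = marginal_pmf(q - p, k, m, n)
    assert abs(lhs - rhs) < 1e-12
print("P(U_(q) - U_(p) = k/m) matches P(U_(q-p) = k/m) for every k")
```

The assertion holds for every $k$, confirming numerically that the difference distribution depends only on $q - p$.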
Now let us consider the moments of $U_{(i)}$ and $U_{(j)}$. We derive them for the case $F = G$; when $F \neq G$ only general expressions can be written down, but here we can obtain exact values. So

$$E(U_{(i)}) = E\big(F_m(Y_{(i)})\big) = E\Big[E\big(F_m(Y_{(i)}) \mid Y_{(i)}\big)\Big].$$

Both the first and the second sample are involved here, so the expectation is with respect to both $F$ and $G$; we compute it iteratively, first conditioning on $Y_{(i)}$, which reduces the inner expectation to one over the $x$-sample only. With $Y_{(i)}$ fixed, $F_m$ is the empirical distribution function, which we know is unbiased for the population CDF, so the inner expectation is $F(Y_{(i)})$. Under $F = G$, $F(Y_{(i)})$ is the $i$-th order statistic from a Uniform$(0,1)$ sample of size $n$; call it $U_i^{*}$. We have already seen that uniform order statistics follow a beta distribution, $U_i^{*} \sim \mathrm{Beta}(i,\, n-i+1)$, so the mean is $i/(i + n - i + 1)$, that is,

$$E(U_{(i)}) = \frac{i}{n+1}.$$

Similarly, for the variance we write

$$\mathrm{Var}(U_{(i)}) = \mathrm{Var}\Big[E\big(F_m(Y_{(i)}) \mid Y_{(i)}\big)\Big] + E\Big[\mathrm{Var}\big(F_m(Y_{(i)}) \mid Y_{(i)}\big)\Big].$$

Once again the inner expectation is $F(Y_{(i)})$. For the second term, recall from the previous class that $m F_m(x)$ has a binomial distribution, from which $\mathrm{Var}(F_m(x)) = F(x)(1-F(x))/m$; conditionally this gives $F(Y_{(i)})\big(1 - F(Y_{(i)})\big)/m$. The first term is the variance of the $i$-th order statistic from the uniform distribution, and the beta variance formula gives $i(n-i+1)/[(n+1)^2(n+2)]$. In the second term we keep $1/m$ outside; $E[F(Y_{(i)})] = i/(n+1)$, and the second moment is $E[F(Y_{(i)})^2] = i(i+1)/[(n+1)(n+2)]$. These terms simplify to

$$\mathrm{Var}(U_{(i)}) = \frac{i(n-i+1)}{(n+1)^2(n+2)} \cdot \frac{m+n+1}{m}.$$

So we are able to derive the mean and the variance of $U_{(i)}$, under the condition that the two populations are the same.
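Before moving to the covariance, here is a small Monte Carlo sketch (the sample sizes and the index $i$ are arbitrary choices of mine) that checks the mean and variance formulas by simulating both samples from Uniform$(0,1)$, which is the null case $F = G$:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, i = 10, 7, 3          # illustrative sizes; i-th order statistic of the y-sample
reps = 200_000

x = rng.random((reps, m))   # F = G = Uniform(0,1), the null case
y = rng.random((reps, n))
y_i = np.sort(y, axis=1)[:, i - 1]            # Y_(i)
u_i = (x <= y_i[:, None]).sum(axis=1) / m     # U_(i) = F_m(Y_(i))

mean_theory = i / (n + 1)
var_theory = i * (n - i + 1) * (m + n + 1) / (m * (n + 1) ** 2 * (n + 2))
print(u_i.mean(), mean_theory)   # both close to 3/8 = 0.375
print(u_i.var(), var_theory)
```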
In a similar way we can handle the covariance term. Consider $\mathrm{Cov}(U_{(i)}, U_{(j)})$ for $i \neq j$; without loss of generality take $i < j$. This is $\mathrm{Cov}\big(F_m(Y_{(i)}),\, F_m(Y_{(j)})\big)$, which we again decompose as the covariance of the conditional expectations plus the expectation of the conditional covariance, conditioning on $(Y_{(i)}, Y_{(j)})$. The first conditional expectation becomes $F(Y_{(i)})$ and the second becomes $F(Y_{(j)})$. The conditional covariance term is also of a type we have seen: for two points $s \le t$, $\mathrm{Cov}\big(F_m(s), F_m(t)\big) = F(s)(1-F(t))/m$, so we need $E\big[F(Y_{(i)})\big(1 - F(Y_{(j)})\big)\big]/m$. Now $F(Y_{(i)})$ and $F(Y_{(j)})$ are the $i$-th and $j$-th order statistics from the uniform distribution, so the covariance formula derived earlier gives, for the first term, $i(n-j+1)/[(n+1)^2(n+2)]$, while the second term is

$$\frac{1}{m}\left[\frac{i}{n+1} - \frac{i(j+1)}{(n+1)(n+2)}\right].$$

After simplification this turns out to be

$$\mathrm{Cov}(U_{(i)}, U_{(j)}) = \frac{i(n-j+1)}{(n+1)^2(n+2)} \cdot \frac{m+n+1}{m}.$$

Let us also look at the coefficient of correlation between $U_{(i)}$ and $U_{(j)}$, which is the covariance divided by the square root of the product of the variances. You can easily see that the common factors cancel out and we are left with

$$\mathrm{Corr}(U_{(i)}, U_{(j)}) = \sqrt{\frac{i(n-j+1)}{j(n-i+1)}}.$$

So we are able to completely determine the moment structure of the distributions of $U_i$ and also the joint distributions of $(U_i, U_j)$, both with and without the ordering brackets. Now we will see applications of these results in certain two-sample testing problems.
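Before moving to the tests, the covariance and correlation formulas just derived can be encoded directly for later use; this is a minimal sketch with helper names of my own choosing. Note that the correlation is free of $m$: the factor $(m+n+1)/m$ cancels between numerator and denominator.

```python
from math import sqrt

def cov_u(i, j, m, n):
    # Cov(U_(i), U_(j)) under F = G; formula assumes i <= j, so order the ranks
    i, j = min(i, j), max(i, j)
    return i * (n - j + 1) * (m + n + 1) / (m * (n + 1) ** 2 * (n + 2))

def corr_u(i, j, m, n):
    # Corr(U_(i), U_(j)) under F = G: sqrt of i(n-j+1) / (j(n-i+1))
    i, j = min(i, j), max(i, j)
    return sqrt(i * (n - j + 1) / (j * (n - i + 1)))

print(cov_u(2, 5, m=10, n=7))
print(corr_u(2, 5, m=10, n=7))   # depends only on the ranks and n, not on m
```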
Let us go back to our original assumption: we have independent random samples $X_1, \ldots, X_m$ from $F(x)$ and $Y_1, \ldots, Y_n$ from $G(y)$. Let $\xi$ denote the median of $F$ and $\eta$ the median of $G$; so for the two distributions I am considering the medians. In classical parametric inference problems we take the means to be $\mu_1, \mu_2$ and the variances $\sigma_1^2, \sigma_2^2$, and the problem of interest is to test whether $\mu_1 = \mu_2$, or $\sigma_1^2 = \sigma_2^2$, and so on. Similarly, in the nonparametric situation we are interested in testing whether $\xi = \eta$, or $\xi < \eta$, or $\xi > \eta$, etc. So we consider the testing problem $H_0\colon \xi = \eta$ against $H_1\colon \xi < \eta$; it could also be $H_2\colon \xi > \eta$ or $H_3\colon \xi \neq \eta$. All these types of testing problems can be considered.

One of the first such tests is Mathisen's median test. In this test we define $T_1$ to be the number of $x$'s which are less than or equal to the median of the $y$'s. What this median is depends on the number of observations $n$ in the second sample, which could be odd or even. If $n$ is odd there is a unique median, $Y_{((n+1)/2)}$, and the number of $x$'s at or below it is exactly $m$ times the corresponding $U$ value:

$$T_1 = m\,U_{((n+1)/2)} = m\,F_m\big(Y_{((n+1)/2)}\big) \quad \text{if } n \text{ is odd},$$

while the median is $\big(Y_{(n/2)} + Y_{(n/2+1)}\big)/2$ if $n$ is even. In the odd case the distribution of this statistic has already been worked out, so we know its mean and variance under the null distribution. When $n$ is odd, let us find the null mean and variance of $T_1$, that is, when $H_0$ is true and $F = G$:

$$E(T_1 \mid H_0) = m \cdot \frac{(n+1)/2}{n+1} = \frac{m}{2},$$

$$\mathrm{Var}(T_1 \mid H_0) = m^2 \cdot \frac{\frac{n+1}{2}\left(n - \frac{n+1}{2} + 1\right)}{(n+1)^2(n+2)} \cdot \frac{m+n+1}{m} = \frac{m(m+n+1)}{4(n+2)}.$$

So this is the mean and this is the variance under the null hypothesis. Certainly we can consider a normalization, and since the exact distribution of $T_1$ is also known, we can check whether it is too large or too small. We can also apply the Neyman-Pearson approach: set the probability of type I error equal to $\alpha$ and find the critical point. So we reject $H_0$ in favor of $H_1$ if $T_1$ is too large, in favor of $H_2$ if $T_1$ is too small, and in favor of $H_3$ if $T_1$ is either too large or too small.

Next we can consider $T_2$, the number of $x$'s greater than $Y_{(n)}$, that is, $m$ minus the number of $x$'s less than or equal to $Y_{(n)}$:

$$T_2 = m - m F_m\big(Y_{(n)}\big) = m\big(1 - U_{(n)}\big),$$

with

$$E(T_2 \mid H_0) = \frac{m}{n+1}, \qquad \mathrm{Var}(T_2 \mid H_0) = m^2 \cdot \frac{n(n-n+1)}{(n+1)^2(n+2)} \cdot \frac{m+n+1}{m} = \frac{mn(m+n+1)}{(n+1)^2(n+2)}.$$

This $T_2$ is called a Rosenbaum statistic (I will define another one, $T_3$, shortly). Testing can be based on it as well, with the directions reversed relative to $T_1$: we reject $H_0$ in favor of $H_1$ if $T_2$ is too small, in favor of $H_2$ if $T_2$ is too large, and in favor of $H_3$ if $T_2$ is either too large or too small.
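A minimal sketch of how these two tests might be carried out in practice, using the null moments just derived; the function names, the sample sizes, and the use of a normal-approximation z-score are my own illustrative choices, not prescribed by the lecture:

```python
import numpy as np

def mathisen_T1(x, y):
    # T1 = number of x's <= median of the y-sample (n odd gives a unique median)
    m, n = len(x), len(y)
    t1 = np.sum(x <= np.median(y))
    mean0 = m / 2
    var0 = m * (m + n + 1) / (4 * (n + 2))
    return t1, (t1 - mean0) / np.sqrt(var0)

def rosenbaum_T2(x, y):
    # T2 = number of x's greater than max(y), i.e. m(1 - U_(n))
    m, n = len(x), len(y)
    t2 = np.sum(x > y.max())
    mean0 = m / (n + 1)
    var0 = m * n * (m + n + 1) / ((n + 1) ** 2 * (n + 2))
    return t2, (t2 - mean0) / np.sqrt(var0)

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=25)
y = rng.normal(0.5, 1.0, size=21)   # y shifted right: expect T1 large, T2 small
print(mathisen_T1(x, y), rosenbaum_T2(x, y))
```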
The drawback with the two statistics defined so far is that Mathisen's median test is based on the median only, and Rosenbaum's $T_2$ is based on the largest observation only. One can think of using all of the $y$'s; that leads to the Mann-Whitney $U$ statistic, which is based on the sum of all such counts:

$$U = \sum_{j=1}^{n} \#\{x\text{'s} \le Y_j\} = m \sum_{j=1}^{n} F_m(Y_j) = m \sum_{j=1}^{n} U_j.$$

Note that it does not matter whether we sum over the ordered or the unordered $y$'s, since we are summing over all $j$; both give the same statistic. Under the null hypothesis the mean is simple: each $E(U_j) = 1/2$, so

$$E(U \mid H_0) = \frac{mn}{2},$$

and for the variance

$$\mathrm{Var}(U \mid H_0) = m^2\Big[\sum_{j} \mathrm{Var}(U_j) + \sum_{j \ne k} \mathrm{Cov}(U_j, U_k)\Big].$$

Recall the expressions derived in the previous lecture: $\mathrm{Var}(U_j) = \frac{m+2}{12m}$ and $\mathrm{Cov}(U_j, U_k) = \frac{1}{12m}$ for $j \ne k$. Substituting and simplifying,

$$\mathrm{Var}(U \mid H_0) = m^2\left[n \cdot \frac{m+2}{12m} + n(n-1) \cdot \frac{1}{12m}\right] = \frac{mn(m+n+1)}{12}.$$

If a larger number of $x$'s are less than or equal to the $y_j$'s, then the median of the $x$'s will tend to be smaller than the median of the $y$'s; so for large values of $U$ we reject $H_0$ in favor of $H_1$, and similarly for the other hypotheses.

In a similar way we have another Rosenbaum statistic; I defined one as $T_2$, and now I define

$$T_3 = \#\{x\text{'s} \le Y_{(1)}\} + \#\{x\text{'s} > Y_{(n)}\} = m U_{(1)} + m\big(1 - U_{(n)}\big) = m\big(1 + U_{(1)} - U_{(n)}\big).$$

This one, you can see, is more useful for the range, that is, for checking variability or a scale parameter: if more $x$'s fall outside $\big(Y_{(1)}, Y_{(n)}\big)$, it certainly suggests that the variability of the $x$'s exceeds that of the $y$'s. Under the null,

$$E(T_3 \mid H_0) = m\left(1 + \frac{1}{n+1} - \frac{n}{n+1}\right) = \frac{2m}{n+1},$$

and the variance is

$$\mathrm{Var}(T_3 \mid H_0) = m^2\big[\mathrm{Var}(U_{(1)}) + \mathrm{Var}(U_{(n)}) - 2\,\mathrm{Cov}(U_{(1)}, U_{(n)})\big].$$

Substituting the expressions derived above: $i(n-i+1) = n$ for both $i = 1$ and $i = n$, and $\mathrm{Cov}(U_{(1)}, U_{(n)}) = \frac{1 \cdot (n-n+1)(m+n+1)}{m(n+1)^2(n+2)}$. The factor $\frac{m+n+1}{m(n+1)^2(n+2)}$ is common, the bracket simplifies to $n + n - 2 = 2(n-1)$, and so

$$\mathrm{Var}(T_3 \mid H_0) = \frac{2m(m+n+1)(n-1)}{(n+1)^2(n+2)}.$$

Once again we are able to obtain the null mean and variance, and for testing purposes this statistic is, as I mentioned, more useful for scale: a large value of $T_3$ means the range of the first sample exceeds that of the second, and a small value of $T_3$ means the range of the second distribution exceeds that of the first. So this can also be used for testing the range.
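The same pattern gives a sketch for the Mann-Whitney $U$ statistic and for $T_3$; again the function names and the normal-approximation z-scores are my own illustrative choices:

```python
import numpy as np

def mann_whitney_U(x, y):
    # U = sum over j of #{x's <= y_j}; null mean mn/2, null variance mn(m+n+1)/12
    m, n = len(x), len(y)
    U = np.sum(x[:, None] <= y[None, :])
    mean0 = m * n / 2
    var0 = m * n * (m + n + 1) / 12
    return U, (U - mean0) / np.sqrt(var0)

def rosenbaum_T3(x, y):
    # T3 = #{x <= y_(1)} + #{x > y_(n)}; sensitive to a difference in spread
    m, n = len(x), len(y)
    t3 = np.sum(x <= y.min()) + np.sum(x > y.max())
    mean0 = 2 * m / (n + 1)
    var0 = 2 * m * (m + n + 1) * (n - 1) / ((n + 1) ** 2 * (n + 2))
    return t3, (t3 - mean0) / np.sqrt(var0)

rng = np.random.default_rng(2)
x = rng.normal(0, 2.0, size=30)     # wider spread in x: expect T3 large
y = rng.normal(0, 1.0, size=25)
print(mann_whitney_U(x, y), rosenbaum_T3(x, y))
```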
In general we can define linear rank statistics. We have the two samples of sizes $m$ and $n$; let the combined sample size be $N = m + n$, and let $H_N(x)$ be the sample distribution function, or empirical distribution function, based on the combined sample of $x$'s and $y$'s. That means I consider all the observations together and look at the order statistics of the merged sample, not of each sample separately: it is not that I write $x_1, \ldots, x_m$ first and then $y_1, \ldots, y_n$; I merge the two, consider the full ordering, and from that define the empirical distribution. Then $N H_N(Y_{(j)})$ is the number of $x$'s and $y$'s less than or equal to $Y_{(j)}$, and since exactly $j$ of the $y$'s are at or below $Y_{(j)}$,

$$N H_N\big(Y_{(j)}\big) - j = \#\{x\text{'s} \le Y_{(j)}\} = m F_m\big(Y_{(j)}\big) = m U_{(j)},$$

which is the quantity defined earlier. So I have established a relationship between the empirical distribution function of the first sample and the empirical distribution function of the combined sample, in terms of the values $Y_{(j)}$. In general we say that any linear rank statistic is a function of $H_N$; in other words, it is also a function of the $F_m(Y_{(j)})$'s.
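This identity is easy to verify numerically; here is a minimal sketch (assuming continuous data, so that there are no ties):

```python
import numpy as np

rng = np.random.default_rng(3)
x, y = rng.random(9), rng.random(6)
m, n = len(x), len(y)
combined = np.concatenate([x, y])

y_sorted = np.sort(y)
for j, yj in enumerate(y_sorted, start=1):
    lhs = np.sum(combined <= yj) - j      # N * H_N(Y_(j)) - j
    rhs = np.sum(x <= yj)                 # m * F_m(Y_(j)) = m * U_(j)
    assert lhs == rhs
print("N*H_N(Y_(j)) - j equals m*F_m(Y_(j)) for every j")
```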
Next we define prediction intervals. As before, $X_1, \ldots, X_m$ is a random sample from $F(x)$, $Y_1, \ldots, Y_n$ is a random sample from $G(y)$, and we assume the samples are independently taken. Let $g$ be a function of $Y_1, \ldots, Y_n$, and let $L$ and $U$ be functions of $X_1, \ldots, X_m$. If

$$P\big(L \le g(Y_1, \ldots, Y_n) \le U\big) = 1 - \gamma,$$

then we say $(L, U)$ is a $100(1-\gamma)\%$ prediction interval for $g$. Now let us consider the interpretation. We have already seen confidence intervals: there the parameter is fixed, and we find two statistics that include that parameter value with probability $1 - \alpha$; that is called the $100(1-\alpha)\%$ confidence interval. Here there is a difference in the terminology: there may be some relationship between the two distributions, and we use that relationship to predict the value of a function of the second sample based on the values of the first sample.

Let us take the special case $F = G$, with $L = X_{(r_1)}$ and $U = X_{(r_2)}$. Consider a prediction interval for at least $k$ of $Y_1, \ldots, Y_n$: we want to find $r_1$ and $r_2$ such that

$$P\big(\text{at least } k \text{ of } Y_1, \ldots, Y_n \text{ lie between } X_{(r_1)} \text{ and } X_{(r_2)}\big) = 1 - \gamma.$$

This is something like a location problem, but of a more general kind. In parametric inference with two samples we would consider, say, a confidence interval for $\theta_1 - \theta_2$, or for a linear combination of $\theta_1$ and $\theta_2$, or for $\sigma_1/\sigma_2$, that is, some parametric function. Not having that here, we work with order statistics and on that basis address an analogous location problem.

So let $n F_n(x)$ denote the number of $y$'s less than or equal to $x$; here $F_n$ is the empirical distribution function based on the second sample rather than the first. Then $n F_n(X_{(r_1)})$ is the number of $y$'s at or below $X_{(r_1)}$, $n F_n(X_{(r_2)})$ is the number at or below $X_{(r_2)}$, and $n F_n(X_{(r_2)}) - n F_n(X_{(r_1)})$ is the number of $y$'s between $X_{(r_1)}$ and $X_{(r_2)}$. So we want $r_1$ and $r_2$ with

$$P\big(n F_n(X_{(r_2)}) - n F_n(X_{(r_1)}) \ge k\big) = 1 - \gamma.$$

With the roles of the two samples interchanged, $F_n(X_{(r)})$ plays the role of $U_{(r)}$, and we have seen that the difference $U_{(q)} - U_{(p)}$ has the same distribution as $U_{(q-p)}$. Therefore, writing $d = r_2 - r_1$ and using the known distribution,

$$\sum_{i=k}^{n} P\Big(U_{(d)} = \frac{i}{n}\Big) = \sum_{i=k}^{n} \frac{\binom{m+n-d-i}{n-i}\binom{d+i-1}{i}}{\binom{m+n}{n}} = 1 - \gamma.$$

From the tables of factorials or of the hypergeometric-type distribution this can be calculated; note that $\binom{m+n}{m} = \binom{m+n}{n}$, so both forms of the denominator are the same and these are proper hypergeometric-type terms.

In a similar way I can consider a prediction interval for the $i$-th order statistic of the second sample. That means we want $r_1$ and $r_2$ such that $P\big(X_{(r_1)} \le Y_{(i)} \le X_{(r_2)}\big) = 1 - \gamma$. In terms of the empirical distribution function of the first sample this is $P\big(r_1/m \le U_{(i)} \le r_2/m\big) = 1 - \gamma$, and since the distribution of $U_{(i)}$ is known, this reduces to

$$\sum_{j=r_1}^{r_2} \frac{\binom{m+n-i-j}{m-j}\binom{i+j-1}{j}}{\binom{m+n}{m}} = 1 - \gamma.$$

Similarly, we can consider a prediction interval for at least $j - i + 1$ of the $y$'s: we want $r_1$ and $r_2$ such that the interval $\big(X_{(r_1)}, X_{(r_2)}\big)$ contains at least $j - i + 1$ of the $y$'s, that is,

$$P\big(X_{(r_1)} \le Y_{(i)} \le Y_{(j)} \le X_{(r_2)}\big) = 1 - \gamma.$$

Applying $F_m$, the empirical distribution function of the $x$-sample, this is $P\big(r_1/m \le U_{(i)} \le U_{(j)} \le r_2/m\big) = 1 - \gamma$, which we can write as the double sum $\sum_{t=r_1}^{r_2}\sum_{k=r_1}^{t} P\big(U_{(i)} = k/m,\; U_{(j)} = t/m\big) = 1 - \gamma$. The joint distribution of $U_{(i)}$ and $U_{(j)}$ has been obtained, so this becomes

$$\sum_{t=r_1}^{r_2}\sum_{k=r_1}^{t} \frac{\binom{k+i-1}{k}\binom{m+n-t-j}{m-t}\binom{t-k+j-i-1}{t-k}}{\binom{m+n}{m}} = 1 - \gamma.$$

This is a bivariate hypergeometric-type term, and once again it can be evaluated from the tables of factorials or the tables of the bivariate hypergeometric distribution.
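As an illustration of the first of these formulas, here is a minimal sketch (the helper name and the particular values of $r_1, r_2, k, m, n$ are my own choices) that evaluates the coverage probability for a few choices of order statistics; in practice one would scan $r_1, r_2$ until the sum reaches the desired $1 - \gamma$:

```python
from math import comb

def coverage_at_least_k(r1, r2, k, m, n):
    # P(at least k of the y's fall between X_(r1) and X_(r2)) under F = G:
    # sum over i = k..n of C(m+n-d-i, n-i) * C(d+i-1, i) / C(m+n, n), d = r2 - r1
    d = r2 - r1
    return sum(comb(m + n - d - i, n - i) * comb(d + i - 1, i)
               for i in range(k, n + 1)) / comb(m + n, n)

# e.g. how reliably does (X_(r1), X_(r2)) trap at least k = 5 of n = 10
# future y's, given m = 20 x's, for a few choices of r1 and r2?
for r1, r2 in [(1, 20), (2, 19), (5, 16)]:
    print(r1, r2, round(coverage_at_least_k(r1, r2, 5, 20, 10), 4))
```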
So I have given here some elementary applications of statistics based on the empirical distribution function. Let me summarize. I presented tests for the comparison of medians, and I also mentioned that one of them can be used for scale, that is, for the range. We have also defined what is known as the prediction interval. When we have two samples, its use is this: based on two order statistics from the first sample, we can predict something about a value from the second sample; that is, we can state the probability with which that value will lie in a given interval. I discussed several cases: the prediction interval for $Y_{(i)}$, the prediction interval for at least $j - i + 1$ of the $y$'s, and the prediction interval for at least $k$ of $Y_1, \ldots, Y_n$. These different formulae are useful for various nonparametric location and scale problems, and we will talk more about them in the next lecture.

Now there is another problem, very commonly met by people working in different areas of science and engineering: given data, how do we decide which distributional model will be useful? One of the most popular tests for this is the chi-square test for goodness of fit, originally given by Karl Pearson; I will talk about that later. Another powerful test was given by Kolmogorov and Smirnov. In the next lecture I will discuss both of these tests for fitting a distribution; they are called goodness-of-fit tests. As you will see, the structure of these tests is quite simple: they do not depend on the original distributional model, only on the assumptions we make. So in the next lecture we will be discussing these.