So, we were discussing some tests for the two-sample scale problem. Let me now consider Sukhatme's test. For this test let me define d i j equal to 1 if y j is less than x i less than 0, or 0 less than x i less than y j, and equal to 0 otherwise. You can see that there is a small modification here: we stay on one side of 0, either y j less than x i less than 0, or y j greater than x i greater than 0. In both cases y j is farther away from 0 than x i is. That is why this test differs from the earlier ones: it is not based simply on the ordering of x i and y j, but also on their positions relative to 0. We then define the statistic T as the double summation of d i j, i from 1 to m and j from 1 to n. Now, what will a small T mean? A small T indicates more 0s among the d i j's, which means the x's are more variable than the y's, so theta will be greater than 1; a large T indicates that the y's are more variable than the x's, that is, theta is less than 1. So you can see that this is a very natural definition taken by Sukhatme. Now let us carry out the calculations. Consider pi, the probability that d i j is equal to 1; in terms of pi we will derive the mean, variance, etc. of this statistic. So pi equals the probability of (y j less than x i less than 0 or 0 less than x i less than y j), which equals the probability of y j less than x i less than 0 plus the probability of 0 less than x i less than y j, because these are two disjoint events, so the probability of the union is the sum of the probabilities. Now we use a conditioning argument on x. For example, for the probability of y j less than x i less than 0, we condition on x i equal to x: the inner probability is that y j is less than x, and we integrate against the distribution of x, but only up to 0.
So, it will be an integral from minus infinity to 0, and the inner probability that y j is less than x is simply G(x), so the first term can be written as the integral of G(x) d F(x) from minus infinity to 0. Similarly, the probability of 0 less than x i less than y j equals the integral from 0 to infinity of the probability that y j is greater than x, d F(x), and that inner probability is 1 minus the cdf of y, that is, 1 minus G(x). Now, what do we do under H naught? Under H naught, G equals F, so pi becomes the integral from minus infinity to 0 of F(x) d F(x) plus the integral from 0 to infinity of (1 minus F(x)) d F(x); I have written the first term plus the second term here. Let us substitute u equal to F(x). At minus infinity u is 0 and at 0 it becomes half, so the first integral becomes the integral of u du from 0 to half, and the second becomes the integral of (1 minus u) du from half to 1. Both are equal to 1 by 8, so pi equals 1 by 8 plus 1 by 8, that is, 1 by 4. So under the null hypothesis the probability that d i j equals 1 is 1 by 4. If I consider the expectation of T under the null hypothesis, since T is the double summation of the d i j's, it equals m n pi, which under the null hypothesis is simply m n by 4. So you can see that the symmetry of the null distribution will be around this value. Let us look similarly at the variance term. The variance of T can be written as the sum of covariance(d i j, d i prime j prime), where the sum is over all i, j, i prime, j prime. Each covariance equals the expectation of d i j times d i prime j prime, minus pi squared. The product d i j times d i prime j prime is 1 only when both indicators are 1, and 0 in all other cases, so its expectation is simply the probability that d i j equals 1 and d i prime j prime equals 1. So we can express the variance as the quadruple summation of [the probability that d i j equals 1 and d i prime j prime equals 1, minus pi squared].
Another thing to observe: since d i j is based on x i and y j, d i prime j prime is based on x i prime and y j prime; because these are random samples, when all four indices differ the two indicators are totally independent, so that joint probability can be written as a product. Let me write the sum out in full; there will be several cases, since i can equal i prime, j can equal j prime, and so on. So let us consider all the cases. One case is i equal to i prime and j equal to j prime; then the term becomes a double summation over i, j of [the probability that d i j equals 1, minus pi squared], that is, pi minus pi squared. The next case is i equal to i prime with j not equal to j prime; this gives a triple summation over i, j, j prime of [the probability that d i j equals 1 and d i j prime equals 1, minus pi squared]. This probability we have to calculate separately; let me give it the notation pi 1 (the single-indicator probability is again pi). Then there is the case i not equal to i prime with j equal to j prime: a triple summation over i, i prime, j of [the probability that d i j equals 1 and d i prime j equals 1, minus pi squared]; let us name this probability pi 2. Finally there is the case when all indices are different, i not equal to i prime and j not equal to j prime: [the probability that d i j equals 1 and d i prime j prime equals 1, minus pi squared]. In this case the joint probability factors. Why? Because x i, x i prime, y j, y j prime are all independent, so d i j is independent of d i prime j prime. The joint probability is then pi times pi, which is pi squared, and pi squared minus pi squared equals 0.
So, that last term vanishes and we are left with the first three terms. In terms of the notations pi, pi 1, pi 2 we can write the variance of T as m n (pi minus pi squared) plus m n (n minus 1)(pi 1 minus pi squared) plus m n (m minus 1)(pi 2 minus pi squared). Let us check the counting of these terms. In the first case we sum over all i, j, so there are m n terms. In the second, i equals i prime but j is not equal to j prime: there are n (n minus 1) choices of (j, j prime) and m choices of i, so m n (n minus 1) terms of pi 1 minus pi squared. In the third, j equals j prime, giving n choices, and i is not equal to i prime, giving m (m minus 1) choices, so m n (m minus 1) terms of pi 2 minus pi squared. The last case has m (m minus 1) n (n minus 1) terms, but each of them is pi squared minus pi squared, which is 0. So we are left with this much only. Now let us find expressions for these quantities in general and under the null hypothesis. Take pi 1, for example: the probability that d i j equals 1 and d i j prime equals 1, where j is not equal to j prime. This is the probability of the intersection of the event (y j less than x i less than 0 or 0 less than x i less than y j), which is d i j equal to 1, with the event (y j prime less than x i less than 0 or 0 less than x i less than y j prime). You can notice that x i is common to both events, so we can condition on it: pi 1 equals the integral of the probability of [(y j less than x less than 0 or 0 less than x less than y j) intersected with (y j prime less than x less than 0 or 0 less than x less than y j prime)] d F(x). Since y j and y j prime are independent, given x the two bracketed events are independent, so the probability of the intersection can be written as the product of the two probabilities.
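The index counting above can be verified directly by enumeration. This is a small sketch of my own (the dictionary keys are just labels I chose): for m = 3 and n = 4 it classifies every quadruple (i, j, i', j') into the four cases and checks that the class sizes are m n, m n(n-1), m n(m-1), and m(m-1)n(n-1).

```python
# Classify all (i, j, i', j') quadruples by which indices coincide,
# to verify the term counts used in the Var(T) decomposition.
from itertools import product

m, n = 3, 4
counts = {"i=i',j=j'": 0, "i=i',j!=j'": 0, "i!=i',j=j'": 0, "all differ": 0}
for i, j, i2, j2 in product(range(m), range(n), range(m), range(n)):
    if i == i2 and j == j2:
        counts["i=i',j=j'"] += 1
    elif i == i2:
        counts["i=i',j!=j'"] += 1
    elif j == j2:
        counts["i!=i',j=j'"] += 1
    else:
        counts["all differ"] += 1

# Expected sizes: mn = 12, mn(n-1) = 36, mn(m-1) = 24, m(m-1)n(n-1) = 72,
# and the four classes together exhaust all (mn)^2 = 144 quadruples.
print(counts)
```

The four counts sum to (mn) squared, confirming that no covariance term has been dropped or double-counted.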
In fact, each factor can be written as the sum: probability of y j less than x less than 0 plus probability of 0 less than x less than y j, times the corresponding factor for y j prime. Since y j and y j prime have the same distribution, the two factors are equal, so the integrand is the square of this sum. Expanding the square, pi 1 equals the integral of [probability of y j less than x less than 0] squared d F(x), plus the integral of [probability of 0 less than x less than y j] squared d F(x), plus twice the integral of [probability of y j less than x less than 0] times [probability of 0 less than x less than y j] d F(x). Now look at the cross term. The first factor requires x less than 0, while the second requires x greater than 0, so for any fixed x at most one of the two probabilities is nonzero: if I write the integrals, the first effectively runs over minus infinity to 0 and the second over 0 to infinity, so both factors cannot be positive simultaneously. Hence the cross term is simply 0. So pi 1 equals the integral from minus infinity to 0 of [G(x)] squared d F(x), plus the integral from 0 to infinity of [1 minus G(x)] squared d F(x). This is the general expression for pi 1. Under the special case when F and G are the same, this becomes the integral from minus infinity to 0 of [F(x)] squared d F(x) plus the integral from 0 to infinity of [1 minus F(x)] squared d F(x). Putting u equal to F(x), this can be written as the integral from 0 to half of u squared du, plus the integral from half to 1 of (1 minus u) squared du. Each antiderivative is u cubed by 3, so each integral equals 1 by 24, and 1 by 24 plus 1 by 24 equals 1 by 12. In a similar way we can look at the expression for pi 2. What happens in pi 2?
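The two null-hypothesis values just derived, pi = 1/4 and pi 1 = 1/12, can be checked by simulation. This is my own Monte Carlo sketch (the function name `d` is mine), taking both samples standard normal so that F = G:

```python
# Monte Carlo check: under H0 with X, Y ~ N(0, 1),
# pi = P(d_ij = 1) should be 1/4 and pi_1 = P(d_ij = 1, d_ij' = 1)
# (same X, two independent Y's) should be 1/12.
import random

random.seed(0)

def d(x, y):
    """Sukhatme indicator: 1 if y < x < 0 or 0 < x < y, else 0."""
    return 1 if (y < x < 0) or (0 < x < y) else 0

N = 200_000
pi_hat = sum(d(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(N)) / N

hits = 0
for _ in range(N):
    x = random.gauss(0, 1)                       # common x_i
    if d(x, random.gauss(0, 1)) and d(x, random.gauss(0, 1)):
        hits += 1                                # both indicators equal 1
pi1_hat = hits / N

print(pi_hat, pi1_hat)   # close to 0.25 and 1/12 = 0.0833...
```

Note how pi 1 is estimated: the same x is shared by both indicators, exactly as in the conditioning argument above.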
The roles of the x i's and y j's are interchanged. So I will not write the expression in full detail: pi 2 can be obtained from pi 1 by interchanging the roles of the x's and y's, and under H naught pi 2 also becomes 1 by 12. So the variance of T under H naught is m n times (1/4)(3/4), that is pi (1 minus pi), plus m n (n minus 1) times (1/12 minus 1/16), plus m n (m minus 1) times (1/12 minus 1/16). Of course you can simplify this; it becomes m n (m plus n plus 7) divided by 48. So the null distribution moments of Sukhatme's test statistic have been obtained here: the expectation of T is m n by 4, and the variance of T under the null hypothesis is as above. This Sukhatme test statistic can therefore be used for testing the two-sample scale problem. As I mentioned, a small T indicates theta greater than 1 and a large T indicates theta less than 1, and we have now obtained the null mean and variance. Next I will discuss a large-sample property of tests in nonparametric situations, called the consistency of statistical tests. You can think of the consistency property of an estimator: in point estimation we say an estimator is consistent if the probability that it is close to the true value of the parameter converges to 1. In the case of testing we consider the power function: if the power function approaches 1, that is, the power becomes larger and larger as the sample size increases, then we call the test consistent. It is similar to the consistency of an estimator, in the sense that here it is the power that increases to 1. So let me define it.
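The statistic and its null moments can be put together in a few lines. This is a minimal sketch following the formulas derived above; the function names are my own, not from the lecture:

```python
# Sukhatme's statistic and its null-hypothesis moments.
def sukhatme_T(xs, ys):
    """T = number of pairs (i, j) with y_j < x_i < 0 or 0 < x_i < y_j."""
    return sum(1 for x in xs for y in ys if (y < x < 0) or (0 < x < y))

def null_mean(m, n):
    return m * n / 4                       # E(T) = mn/4 under H0

def null_var(m, n):
    return m * n * (m + n + 7) / 48        # Var(T) = mn(m+n+7)/48 under H0

# Tiny hand-checkable example: among the four pairs, only
# (x, y) = (-1, -3) satisfies y < x < 0, so T = 1.
print(sukhatme_T([-1, 2], [-3, 1]))
print(null_mean(5, 5), null_var(5, 5))
```

For m = n = 5 the null mean is 25/4 and the null variance is 25 times 17 over 48, which a standardized large-sample test would use.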
Let T n be a level alpha test statistic based on n observations for testing H naught: g belongs to omega naught, versus H 1: g belongs to omega 1, where omega 1 equals omega minus omega naught. This is my testing problem. Then the test based on T n is consistent if, for every g in omega 1, the probability of rejecting H naught goes to 1 as n tends to infinity; that is, the power of the test goes to 1. Let me consider a simple application of this. Take observations from a normal distribution with mean theta and variance 1, and consider the standard testing problem theta equal to 0 against theta greater than 0. The uniformly most powerful test, what we call the UMP test, rejects H naught if root n times x n bar is greater than z alpha; that is a level alpha test. Now consider a point theta 1 greater than 0. What is the power at this point? It is the probability that root n x bar is greater than z alpha, which equals the probability that root n (x bar minus theta 1) is greater than z alpha minus root n theta 1. When theta equals theta 1, root n (x bar minus theta 1) has the standard normal distribution, so the power is the probability that Z is greater than z alpha minus root n theta 1. Now, what happens as n tends to infinity? Since theta 1 is positive, z alpha minus root n theta 1 goes to minus infinity, and the probability that Z exceeds something going to minus infinity goes to 1. So root n x n bar is a consistent test statistic, and root n x n bar greater than z alpha is a consistent critical region. In the nonparametric situation, directly verifying this kind of statement is difficult, because we do not have knowledge of the underlying probability distribution, so we cannot write down such an explicit power calculation.
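The normal-mean power calculation above is explicit enough to compute. A short sketch of my own (using only the standard normal cdf written via the error function; the names are mine):

```python
# Power of "reject H0 if sqrt(n) * xbar > z_alpha" for N(theta, 1) data:
# power(n) = P(Z > z_alpha - sqrt(n) * theta1), which tends to 1 with n.
import math

def phi_cdf(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def power(n, theta1, z_alpha=1.6449):      # z_0.05 is about 1.6449
    return 1 - phi_cdf(z_alpha - math.sqrt(n) * theta1)

for n in (4, 25, 100):
    print(n, round(power(n, 0.5), 4))
```

The printed values increase with n and are already essentially 1 at n = 100 for theta 1 = 0.5, which is the consistency statement in numbers.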
So we define consistency in a different way: in nonparametric situations this type of direct verification is difficult, since we have no knowledge of the distribution of the x's except that it is continuous. Suppose the test statistic V n, based on a sample of size n, satisfies (2): V n converges to mu(g) in probability as n tends to infinity, and the function mu satisfies (3): mu(g) equals mu naught if g belongs to omega naught, and mu(g) is greater than mu naught if g belongs to omega 1. Let me give the testing problem itself the number (1): H naught, g belongs to omega naught, against H 1, g belongs to omega 1. Then we have the following result regarding consistency. Suppose V n is a test statistic for the testing problem (1), rejects H naught for large values, and satisfies (2) and (3); that is, it converges in probability to the function mu(g), and mu(g) itself equals a fixed value mu naught under the null hypothesis and exceeds mu naught under the alternative. Basically we are putting the problem into the framework of a parametric testing problem. Suppose further that there is a constant sigma naught such that root n (V n minus mu naught) divided by sigma naught converges in distribution to the standard normal distribution under the null hypothesis; call this (4). Then there exists a sequence of critical values k n such that the test based on V n is asymptotically of size alpha, and the probability under g that V n is greater than or equal to k n goes to 1 as n tends to infinity, for every g in omega 1. Let me define "asymptotically size alpha": the test based on V n is called asymptotically size alpha if the k n are such that alpha n, the probability that V n is greater than or equal to k n, goes to alpha as n tends to infinity for every g belonging to omega naught. Let me prove this.
So, let z alpha be the upper alpha point of the standard normal distribution, and define k n equal to mu naught plus z alpha sigma naught by root n. Consider alpha n, the probability that V n is greater than or equal to k n; that equals the probability that root n (V n minus mu naught) by sigma naught is greater than or equal to z alpha. By the asymptotic normality we assumed in (4), alpha n goes to alpha, so the test based on V n is asymptotically size alpha. Now fix g star belonging to omega 1 and define epsilon equal to (mu(g star) minus mu naught) divided by 2. By (3) this epsilon is greater than 0, and since k n decreases to mu naught, for sufficiently large n we have k n less than mu naught plus epsilon. From the definition of epsilon, mu naught equals mu(g star) minus 2 epsilon, hence k n is less than mu(g star) minus epsilon. So if the modulus of V n minus mu(g star) is less than epsilon, then V n minus mu(g star) is greater than minus epsilon, which implies V n is greater than mu(g star) minus epsilon, which implies V n is greater than or equal to k n, because of the inequality just obtained. This implies that the probability under g star that the modulus of V n minus mu(g star) is less than epsilon is less than or equal to the probability under g star that V n is greater than or equal to k n, which is less than or equal to 1. Now, by assumption (2), V n converges in probability to mu(g star), so the left-hand side converges to 1. Hence the probability that V n is greater than or equal to k n goes to 1. Since g star was arbitrarily fixed in omega 1, the theorem is proved. We will now prove the consistency of some standard tests. Let us consider the consistency of the sign test.
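The first step of the proof, that k n = mu naught + z alpha sigma naught / root n gives asymptotic size alpha, can be illustrated numerically. In this sketch of my own, V n is a mean of n normal variables with mean mu naught and standard deviation sigma naught, so the standardized statistic is exactly standard normal and the rejection rate should sit near alpha up to Monte Carlo error (all constants here are my own choices):

```python
# Simulation of the critical-value construction k_n = mu0 + z_a*sigma0/sqrt(n).
import math, random

random.seed(1)
mu0, sigma0, z_alpha, n, reps = 0.5, 0.5, 1.6449, 50, 5000
k_n = mu0 + z_alpha * sigma0 / math.sqrt(n)

rejections = 0
for _ in range(reps):
    v_n = sum(random.gauss(mu0, sigma0) for _ in range(n)) / n
    if v_n >= k_n:
        rejections += 1
alpha_n = rejections / reps
print(alpha_n)   # near alpha = 0.05
```

As n grows, k n shrinks toward mu naught, which is exactly why any g star with mu(g star) greater than mu naught is eventually rejected with probability tending to 1.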
The sign test was introduced first, for testing whether the median is equal to 0 against greater than 0 or less than 0. So let K be the sign test statistic and define K bar equal to K by n. By Chebyshev's inequality, the probability that the modulus of K bar minus (1 minus F(minus theta)) exceeds epsilon is less than or equal to (1 minus F(minus theta)) times F(minus theta) divided by n epsilon squared, and this goes to 0 as n tends to infinity. So K bar converges in probability to 1 minus F(minus theta); call this mu(F, theta). Now mu(F, theta) equals half when theta equals 0, and it is greater than half for theta greater than 0, so the sign test separates the null hypothesis, F belonging to omega naught with theta equal to 0, from the alternative with theta greater than 0. The consistency class for the sign test is the class of absolutely continuous distributions with a unique positive median, and the required asymptotic normality follows from the fact that (K minus expectation of K) divided by the square root of the variance of K has an asymptotic standard normal distribution, with mu naught equal to half and sigma naught equal to half. Any reasonable test should in fact be consistent; consistency therefore does not provide a criterion for distinguishing among tests. However, if a test is not consistent, then it is certainly a defective test: all good tests must be consistent. So let me give it as a remark: consistency is a desirable property for any test, and a test which is not consistent should be rejected outright. Let me now give an example of a test which is not consistent. Let x 1, x 2, ..., x n be a random sample from a Cauchy distribution in location form, with pdf 1 divided by pi (1 plus (x minus theta) squared).
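The convergence K bar to 1 minus F(minus theta) is easy to see in a simulation. A sketch of my own, taking F to be the N(theta, 1) cdf with theta = 0.5, so that the limit is Phi(0.5), which is strictly above half, as the consistency argument requires:

```python
# Sign test: Kbar = K/n converges in probability to 1 - F(-theta).
import math, random

random.seed(2)
theta, n = 0.5, 10_000
k = sum(1 for _ in range(n) if random.gauss(theta, 1) > 0)   # sign statistic
kbar = k / n
limit = 0.5 * (1 + math.erf(theta / math.sqrt(2)))           # 1 - F(-theta)
print(kbar, round(limit, 4))
```

With n = 10000 the sample fraction of positive observations is already very close to the limit Phi(0.5), roughly 0.69, confirming the Chebyshev bound in practice.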
The characteristic function of this Cauchy distribution is phi(t) equal to e to the power (minus modulus of t, plus i theta t). Suppose for testing H naught: theta equal to 0 against the alternative theta greater than 0, we reject H naught if x bar is greater than or equal to c. If we consider the characteristic function of x bar, it is (phi(t by n)) to the power n, and because of the form above this works out to be phi(t) itself. So what we conclude is that x bar has exactly the same Cauchy distribution as a single x i; its distribution is independent of n. The power function of the test based on x bar therefore does not depend on n, so the power cannot converge to 1 as n tends to infinity: x bar does not give a consistent test. This is an example of a bad test. I have proved the consistency of the sign test; let us also consider the consistency of the Wilcoxon signed rank test. Here omega is the class of symmetric continuous distributions, and omega naught is the subclass where the median is equal to 0. For the alternative hypotheses we consider the following subsets of omega minus omega naught. Define gamma(F) as the probability that x 1 plus x 2 is greater than 0; under the null hypothesis this equals half. Let gamma 1 be the set of F in omega with gamma(F) greater than half, gamma 2 the set with gamma(F) less than half, and gamma 3 the set with gamma(F) not equal to half. Now consider S n equal to 2 T plus by n squared, where T plus is the Wilcoxon signed rank statistic. The expectation of S n is 2 by n squared times n (n plus 1) by 4, which naturally converges to half as n tends to infinity. Similarly, the variance of S n is 4 by n to the power 4 times n (n plus 1)(2 n plus 1) by 24, which goes to 0 as n tends to infinity.
So, if I consider the probability that the modulus of S n minus half exceeds epsilon, then by Chebyshev's inequality it is less than or equal to the expectation of (S n minus half) squared divided by epsilon squared, and this we simply split into two parts: 1 by epsilon squared times [the variance of S n plus (1 by 2 n) squared], since the expectation of S n minus half equals 1 by 2 n. This goes to 0 as n tends to infinity. So what we have proved is that S n converges to half in probability as n tends to infinity under H naught, and the asymptotic distribution of S n is also normal. Hence the following tests are consistent: reject H naught if S n minus half is greater than C n, for F belonging to gamma 1; reject H naught if S n minus half is less than some C n star, for F belonging to gamma 2; and thirdly, reject H naught if the modulus of S n minus half is greater than some C n double star, for F belonging to gamma 3. All three of these tests are consistent. Since the Mann-Whitney test is simply a variation of the Wilcoxon test, let us prove the consistency of the Mann-Whitney test also. But first let me explain the consistency of the Wilcoxon signed rank statistic once more: in the null hypothesis we say the median is 0, and the alternatives say the median is greater than 0, less than 0, or not equal to 0. What we used is convergence in probability together with asymptotic normality; then for the one-sided alternative theta greater than 0 the right-hand rejection region gives a consistent test, for the alternative median less than 0 the left-hand rejection region is consistent, and the two-sided rejection region is consistent when we have the two-sided alternative hypothesis. Now let us consider the consistency of the Mann-Whitney test statistic. We define omega as the class of all two-sample problems...
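The moment formulas for S n used in the Chebyshev step are worth computing explicitly. A minimal sketch following the expressions above (the helper names are mine):

```python
# Null moments of S_n = 2*T+/n^2 for the Wilcoxon signed rank statistic.
def e_Sn(n):
    return 2 / n**2 * (n * (n + 1) / 4)            # = (n+1)/(2n) -> 1/2

def var_Sn(n):
    return 4 / n**4 * (n * (n + 1) * (2 * n + 1) / 24)

def chebyshev_bound(n, eps):
    """P(|S_n - 1/2| > eps) <= (Var(S_n) + (1/(2n))^2) / eps^2."""
    return (var_Sn(n) + (1 / (2 * n)) ** 2) / eps**2

for n in (10, 100, 1000):
    print(n, e_Sn(n), var_Sn(n), chebyshev_bound(n, 0.1))
```

The mean drifts from 0.55 at n = 10 down toward 1/2, the variance shrinks like 1/n, and the Chebyshev bound for epsilon = 0.1 is already below 0.05 at n = 1000, which is the convergence in probability claimed above.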
...in which F and G are continuous distribution functions and G(x) equals F(x minus m). Omega naught is the case m equal to 0, that is, the pairs (F, G) in omega such that F and G are the same. Define gamma(F, G) as the probability that Y is less than X. Then gamma(F, F) equals the integral of F(x) d F(x), which equals the integral of u du from 0 to 1... actually from 0 to 1 of u du, which is half; call this gamma naught. We define the alternative hypothesis sets gamma 1, gamma 2, gamma 3 as the pairs (F, G) with gamma(F, G) greater than half, less than half, and not equal to half, respectively. Consider S mn equal to U mn divided by mn, where U mn is the Mann-Whitney statistic; we are scaling by mn here. Then the expectation of S mn equals mn by 2 times 1 by mn, which is simply half, equal to gamma naught. Consider the variance of S mn: it equals 1 by m squared n squared times mn (m plus n plus 1) by 12, and this goes to 0 as the minimum of m and n goes to infinity, because one factor of mn cancels and the expression becomes (1 by m plus 1 by n plus 1 by mn) divided by 12; if the minimum of m and n goes to infinity, then all of these terms go to 0. Then the probability that the modulus of S mn minus half exceeds epsilon is, again by Chebyshev's inequality, less than or equal to the variance of S mn by epsilon squared, which goes to 0. So we conclude that S mn converges to half in probability as the minimum of m and n goes to infinity under H naught, and the asymptotic normality under H naught is also established: the standardized statistic converges in distribution to a standard normal Z. Since these two properties are satisfied, we conclude that the Mann-Whitney test statistics will be consistent, provided we define the tests in the following fashion. We have the three alternative hypotheses: when gamma(F, G) is greater than half we take the right-hand rejection region, when gamma(F, G) is less than half the left-hand rejection region, and for the two-sided alternative the two-sided rejection region.
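The variance cancellation just described is easy to verify numerically. A short sketch of my own (the function name is mine), computing Var(S mn) from the formula above for growing balanced samples:

```python
# Null variance of S_mn = U_mn/(mn): equals (m+n+1)/(12mn) -> 0
# as min(m, n) -> infinity.
def var_Smn(m, n):
    return (m * n * (m + n + 1) / 12) / (m * n) ** 2

for m, n in ((5, 5), (50, 50), (500, 500)):
    print(m, n, var_Smn(m, n))
```

For m = n = 5 the variance is 11/300, and by m = n = 500 it has fallen below 4 times 10 to the minus 4, which is what drives the Chebyshev bound to 0.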
So, let me define it here: consistent tests based on the Mann-Whitney statistic are given by: reject H naught if S mn minus half is greater than C mn, for (F, G) belonging to gamma 1; reject H naught if S mn minus half is less than C mn star, for (F, G) belonging to gamma 2; and thirdly, reject H naught if the modulus of S mn minus half is greater than C mn double star, for (F, G) belonging to gamma 3. All of these are consistent tests. Here we have considered two types of two-sample problems: in one of them we shift by a location, and in the other by a scale. We want to know whether the shift is actually significant; if we shift by a location, we ask whether the shift is in the positive or the negative direction, and similarly in the scale case we ask whether the ratio is greater than 1 or less than 1, that is, whether we are introducing more variability or less variability. One may also think of a general two-sample problem in which we do not talk about location or scale, but simply ask whether the two distributions are the same or not. It is something like the one-sample problem, where we test whether the given distribution function has a given form; there, for example, we have the chi-square test for goodness of fit, and we also introduced the Kolmogorov-Smirnov test for the single-sample problem. In a similar way, if we consider the more general form of the hypothesis for a two-sample problem, that is, we simply ask whether the two distributions are the same or not, then it becomes a goodness-of-fit problem, and we can use the Kolmogorov-Smirnov two-sample test. In the last lecture I will discuss the Kolmogorov-Smirnov test, and we will also discuss the concept of the efficiency of a test. So, in the next lecture we will take up this part.