As I mentioned, in place of considering the location alternative or the scale alternative, one may consider a general alternative, which simply says that the two distributions are not the same. An analogous situation in the parametric versus nonparametric setting is this: instead of specifying the distribution and then testing the location parameter or the scale parameter, we test the specification of the distribution itself. In the one-sample problem, one solution was given by the chi-square test for goodness of fit and another by the Kolmogorov-Smirnov test statistic. Let us now define the general two-sample problem and discuss the analogous test here.

The general two-sample problem is the following. Let X_1, X_2, ..., X_m be a random sample from a continuous distribution F_X, and let Y_1, Y_2, ..., Y_n be another, independent random sample from a continuous distribution G_Y. We want to test the hypothesis H_0: G_Y(x) = F_X(x) for all x against the alternative H_1: G_Y(x) ≠ F_X(x) for some x. For this problem, one of the most plausible solutions is the Kolmogorov-Smirnov two-sample test statistic.

First let us define the empirical distribution functions based on X_1, ..., X_m and Y_1, ..., Y_n. Let F_m(x) and G_n(x) be the empirical distribution functions based on the X-sample and the Y-sample respectively. In terms of the ordered values X_(1) < X_(2) < ... < X_(m), we define F_m(x) = 0 if x < X_(1), F_m(x) = i/m if X_(i) ≤ x < X_(i+1) for i = 1, ..., m - 1, and F_m(x) = 1 if x ≥ X_(m). Similarly, G_n(x) = 0 if x < Y_(1), G_n(x) = i/n if Y_(i) ≤ x < Y_(i+1), and G_n(x) = 1 if x ≥ Y_(n).

Then define D_{m,n} = max over x of |F_m(x) - G_n(x)|. If you remember, in the one-sample problem the statistic was based on F_n(x) - F(x); here I am considering the difference between the two empirical distributions. This D_{m,n} is the Kolmogorov-Smirnov two-sample test statistic for testing H_0. Naturally, if the two distributions are close, their sampled values will be close to each other, there will be regions where F_m and G_n are similar, and the differences will be small; otherwise the differences will be large. So we can simply say that large values of D_{m,n} indicate that H_0 is false. Moreover, D_{m,n} is distribution free. We will discuss the method for determining the probabilities based on it, such as the probability of type 1 error and type 2 error.

One method that has been given is due to Hodges: a method for determining P(D_{m,n} ≥ d) under H_0. First, we consider the combined sample of x and y values and arrange it in increasing order of magnitude, that is, we rank the observations. Then we depict the arrangement on a Cartesian system of coordinates by a path which starts at the origin and moves one step right for an x and one step up for a y. So if an x comes next the path moves right, otherwise it moves up, and so on.
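As a small computational sketch of this statistic: since F_m and G_n are step functions which jump only at the pooled sample points, the supremum defining D_{m,n} can be computed by evaluating both empirical distribution functions at those points. The following Python sketch does this (the function name and sample data are my own illustration; the comparison with scipy.stats.ks_2samp is just a sanity check).

```python
import numpy as np
from scipy import stats

def ks_two_sample(x, y):
    """D_{m,n} = max over x of |F_m(x) - G_n(x)| for two samples."""
    x, y = np.sort(np.asarray(x)), np.sort(np.asarray(y))
    pooled = np.concatenate([x, y])
    # right-continuous empirical CDFs evaluated at every pooled point
    Fm = np.searchsorted(x, pooled, side="right") / len(x)
    Gn = np.searchsorted(y, pooled, side="right") / len(y)
    return np.max(np.abs(Fm - Gn))

rng = np.random.default_rng(0)
x, y = rng.normal(0.0, 1.0, 20), rng.normal(0.5, 1.0, 25)
print(ks_two_sample(x, y), stats.ks_2samp(x, y).statistic)  # the two agree
```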
The observed values of m·F_m(x) and n·G_n(x) are then respectively the coordinates of the points (u, v) on this path, where u and v are integers. So D_{m,n} = max |u/m - v/n| = (1/n) max |v - (n/m)u|, which we can express as 1/n times the maximal vertical distance from the line nx - my = 0 to a point on the path.

Now let A(m, n) be the number of arrangements, that is paths, of the sample which lead to D_{m,n} < d. Then P(D_{m,n} ≥ d) under H_0 equals 1 - A(m, n)/C(m+n, m), because C(m+n, m) counts all possible paths: every arrangement of the combined sample corresponds to one path, and we can have the extreme case when all the x_i's are below all the y_j's, or an arrangement in which the x's and y's alternate, and so on. To count A(m, n) we draw two lines at a vertical distance nd from the diagonal and count the paths which lie entirely within these lines, by use of the recurrence relation a(u, v) = a(u-1, v) + a(u, v-1) for points between the lines (with a(u, v) = 0 for points outside them), and of course a(u, 0) = a(0, v) = 1 along the axes.

Let us also consider the asymptotic result: if m, n → ∞ such that m/n tends to a constant, then P(√(mn/(m+n)) D_{m,n} ≤ z) converges to 1 - 2 Σ_{i=1}^∞ (-1)^(i-1) exp(-2 i² z²). So an asymptotic distribution is obtained here.

For testing H_0 against an alternative H_1 of stochastic ordering, say G_Y(x) ≤ F_X(x) for all x with G_Y(x) < F_X(x) for some x — that is, P(Y ≤ x) ≤ P(X ≤ x) for all x, which basically says that X is stochastically smaller than Y — one uses a one-sided version, D_{m,n}^+ = max over x of (F_m(x) - G_n(x)); that is, we consider only the one-sided maximum of F_m(x) - G_n(x). Hodges' method can be used to find P(D_{m,n}^+ ≥ d) under H_0 for small values of m, n, with obvious modifications. One can also give the full asymptotic distribution of D_{m,n}^+: the limit, as m, n → ∞ with m/n tending to a constant, of P(√(mn/(m+n)) D_{m,n}^+ ≤ z) under H_0 is 1 - exp(-2z²). We reject H_0 if D_{m,n}^+ ≥ c_α, where c_α is determined by P(D_{m,n}^+ ≥ c_α) under H_0 being equal to α. To test H_0 against H_1: G_Y(x) ≥ F_X(x) for all x, with strict inequality for some x, we interchange the roles of the x's and y's in the above; basically one considers G_n(x) - F_m(x), and similar results hold.

Another concept is the efficiency of a test. Just as we had a notion of consistency of a test, by analogy with the consistency of estimators, we have a notion of the efficiency of a test. The first such notion is power efficiency; let us consider that.
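Here is a minimal sketch of Hodges' counting method in Python, under the convention that a path is counted only while every lattice point (u, v) on it satisfies |u/m - v/n| < d; a(u, v) is built up by the recurrence a(u, v) = a(u-1, v) + a(u, v-1) inside this band and set to 0 outside it. The function name is my own.

```python
from math import comb

def ks_exact_tail(m, n, d):
    """P(D_{m,n} >= d) under H0 by Hodges' path counting:
    a(u, v) counts monotone paths from (0, 0) to (u, v) with max |u/m - v/n| < d."""
    a = [[0] * (n + 1) for _ in range(m + 1)]
    for u in range(m + 1):
        for v in range(n + 1):
            if abs(u / m - v / n) >= d:      # point outside the band: no surviving path
                continue
            if u == 0 and v == 0:
                a[u][v] = 1                  # every path starts at the origin
            else:
                a[u][v] = (a[u - 1][v] if u > 0 else 0) + (a[u][v - 1] if v > 0 else 0)
    return 1 - a[m][n] / comb(m + n, m)      # 1 - A(m, n) / C(m+n, m)

print(ks_exact_tail(4, 4, 0.75))             # e.g. P(D_{4,4} >= 3/4) = 16/70
```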
These concepts I am introducing because in the usual classical theory of testing of hypotheses, in parametric situations, we have the concepts of the most powerful test, the uniformly most powerful test, unbiasedness and so on. When we are dealing with the distribution-free or nonparametric situation, those concepts of most powerful tests and so on are generally not available, and therefore we need to look at asymptotic properties. That is why we considered consistency, and now we are talking about efficiency.

First, then, power efficiency. Let Ω be the parameter space and let ω be a subset of Ω. We consider the hypothesis testing problem H_0: θ ∈ ω against H_1: θ ∈ Ω - ω. Let T_n and S_{n*} be two sequences of test statistics for testing H_0 against H_1; we consider sequences because the statistics are based on the sample size, which may vary. Here n and n* are the respective sample sizes. Take an arbitrary point θ* in the alternative hypothesis set, let C_n be the size-α critical region for the test based on T_n, and let D_{n*} be the size-α critical region for the test based on S_{n*}. Then P(T_n ∈ C_n) ≤ α for all θ ∈ ω and P(S_{n*} ∈ D_{n*}) ≤ α for all θ ∈ ω: these are two size-α tests, with critical regions C_n and D_{n*} based on T_n and S_{n*} respectively.

Now let us consider the powers. Fix a power β and suppose n and n* are the sample sizes required so that P_{θ*}(T_n ∈ C_n) = P_{θ*}(S_{n*} ∈ D_{n*}) = β, where θ* belongs to Ω - ω. Then the efficiency of T_n with respect to S_{n*}, for fixed α, β and θ*, is defined as E_{T,S} = n*/n.

Naturally you can interpret this. If E_{T,S} > 1, it means that for achieving the same power I need more observations with S than with T, so T is more efficient than S. If E_{T,S} < 1, it means I need more observations with T than with S to achieve the same power, so T is less efficient than S. So for this efficiency we have some sort of interpretation, but of course the definition has limited usage, because we have fixed α, β and θ* here: if θ* changes, or the underlying distribution changes, the comparison may become quite complicated to adopt. Anyway, this is the definition; let me consider an example.
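Before the example, here is a small generic sketch of the computation this definition asks for, assuming a one-sided z-test for a normal mean with known unit variance (required_n and z_test_power are my own illustrative names): given α, β and θ*, it finds the smallest sample size whose power at θ* reaches β, which is exactly the n (or n*) entering E_{T,S} = n*/n.

```python
from scipy.stats import norm

def required_n(power_fn, alpha, beta, theta_star, n_max=10_000):
    """Smallest n whose power at theta_star is at least beta."""
    for n in range(1, n_max + 1):
        if power_fn(n, alpha, theta_star) >= beta:
            return n
    raise ValueError("no n up to n_max achieves the required power")

def z_test_power(n, alpha, theta):
    # reject H0: theta = 0 when sqrt(n) * xbar > z_alpha, with X_i ~ N(theta, 1)
    z_alpha = norm.ppf(1 - alpha)
    return 1 - norm.cdf(z_alpha - theta * n ** 0.5)

print(required_n(z_test_power, 0.05, 0.90, 1.0))   # prints 9, matching the example below
```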
Let us consider the usual problem based on the normal distribution, say N(θ, 1), and consider H_0: θ = 0 against H_1: θ ≠ 0, or say θ > 0, for example. Let me take θ* = 1 in the alternative hypothesis set. The usual normal test is: reject H_0 if √n x̄ > a_n, where a_n is basically z_α. If I take α = 0.05, then a_n = z_{0.05} = 1.64. Now consider P_{θ*=1}(√n X̄ > 1.64) = 0.9; that is, I am taking β = 0.9, for example. This means P(√n(X̄ - 1) > 1.64 - √n) = 0.9, so 1.64 - √n = -1.28. Simplifying, n is approximately 9.

As a second test, take the sign test. The usual sign test says: reject H_0 if K_n > k_n*, where K_n is the number of positive observations. Taking α = 0.05, note that P(X > 0) = 1 - F_X(0) equals 1/2 if θ = 0, and equals 1 - Φ(-1), approximately 0.85, if θ = 1, for example. So we have two equations: Σ_{i=k+1}^n C(n, i)(1/2)^n = 0.05, which is the first equation, and Σ_{i=k+1}^n C(n, i)(0.85)^i (0.15)^(n-i) = 0.9, which is the second. We actually need to solve these two equations to get the value of n, and k is also required, because it gives the cutoff. Solving (1) and (2) we get n = 15 and k = 12.

So what do we conclude? The efficiency of the normal test relative to the sign test becomes 15/9 = 5/3, which is about 1.67. So we conclude that the usual UMP test based on the normal distribution is more efficient than the sign test. Now, this is under the assumption of a normal distribution: if normality does not hold then certainly you cannot use the normal test, and if the distribution is known we will certainly use that information. Also, this is for power 0.9; if we use some other power, say 0.95 or 0.92, then this efficiency value will change, but it will still be the case here that the normal test is more efficient than the sign test.
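A quick numerical sketch of this comparison, assuming exact binomial computations for the sign test with p = 1 - Φ(-1) ≈ 0.85 under θ* = 1: because the binomial distribution is discrete, the level cannot be made exactly 0.05, so the search below asks for level ≤ 0.05 and power ≥ 0.9, and the (n, k) it returns may differ by an observation or so from the rounded values quoted above. The function name is my own.

```python
from scipy.stats import binom, norm

alpha, beta = 0.05, 0.90
p1 = 1 - norm.cdf(-1.0)          # P(X > 0) at theta* = 1, about 0.8413

def sign_test_design(alpha, beta, p1, n_max=100):
    """Smallest (n, k) with P(K >= k | p = 1/2) <= alpha and P(K >= k | p = p1) >= beta,
    where K ~ Bin(n, p) counts the positive observations."""
    for n in range(1, n_max + 1):
        for k in range(n + 1):
            level = 1 - binom.cdf(k - 1, n, 0.5)   # size of the test at theta = 0
            power = 1 - binom.cdf(k - 1, n, p1)    # power at theta* = 1
            if level <= alpha and power >= beta:
                return n, k
    return None

n_sign, k = sign_test_design(alpha, beta, p1)
n_normal = 9                     # from the normal-test calculation above
print(n_sign, k, n_sign / n_normal)   # efficiency of the normal test relative to the sign test
```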
Since this notion depends completely on the chosen values of α, β and the point θ*, there is another concept, called the asymptotic relative efficiency of tests. Let us consider this concept now. Let T_n and S_{n*} be two sequences of test statistics for testing the hypothesis H_0: θ = θ_0 (fixed) against the alternative θ ∈ Γ, at level α, and let {θ_i} be a sequence of alternatives to H_0 such that θ_i → θ_0 as i → ∞. So we are considering alternatives, but they approach the null hypothesis value. Now consider β_{T_{n_i}}(θ_i) and β_{S_{n_i*}}(θ_i): these are the powers of the tests based on T and S at θ_i, where T is based on n_i observations and S on n_i* observations. We consider sequences {n_i} and {n_i*} satisfying the condition (*): n_i and n_i* are increasing sequences of natural numbers such that 0 < lim_{i→∞} β_{T_{n_i}}(θ_i) = lim_{i→∞} β_{S_{n_i*}}(θ_i) < 1. Then the asymptotic relative efficiency, that is the ARE, of T with respect to S is defined as ARE(T, S) = lim_{i→∞} n_i*/n_i, provided the limit is unique for all sequences {n_i} and {n_i*} which satisfy condition (*).

What does it roughly mean? It roughly means that if this value equals, say, 1.2, then the test based on S needs on the average 120 observations where the test based on T needs 100 observations to achieve the same power. Since this is a limiting definition, it is, you can say, a better definition than the definition of efficiency I gave just before, because that was a fixed sample size definition and this one is asymptotic.

Now, there are several ways of calculating the asymptotic relative efficiency; let me briefly mention these methods for evaluating it. We have the hypothesis testing problem H_0: θ = θ_0 against H_1: θ ∈ Γ, with T_n one test sequence and S_{n*} another, and we assume that both are level-α consistent tests which are one-sided, let me say, with large values significant. Basically this means that we have right-sided critical regions; this is just for convenience, since otherwise comparing two tests becomes quite complicated.

Write E(T_n) = μ_{T_n}(θ), E(S_{n*}) = μ_{S_{n*}}(θ), Var(T_n) = σ²_{T_n}(θ) and Var(S_{n*}) = σ²_{S_{n*}}(θ). Let T be a random variable with an absolutely continuous distribution function H(x) for all θ ∈ Γ, and assume that the asymptotic distributions of the standardized variables of T_n and S_{n*} are based on T: that is, (T_n - μ_{T_n}(θ))/σ_{T_n}(θ) converges in distribution to T as n → ∞, and we also assume that (S_{n*} - μ_{S_{n*}}(θ))/σ_{S_{n*}}(θ) converges in distribution to T.
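As a numerical sketch of this definition, take the classical comparison of the sign test (as T) with the normal-mean z-test (as S) under normality; the sample-size formulas below come from normal approximations to the two power functions, and the limiting ratio 2/π ≈ 0.637 is the classical ARE of the sign test relative to the z-test. The function names are my own.

```python
from scipy.stats import norm

alpha, beta = 0.05, 0.90
za, zb = norm.ppf(1 - alpha), norm.ppf(beta)

def n_z(theta):
    # one-sided z-test for N(theta, 1): solve power = Phi(sqrt(n)*theta - z_alpha) = beta
    return ((za + zb) / theta) ** 2

def n_sign(theta):
    # sign test via the normal approximation to Bin(n, p), with p = P(X > 0) = Phi(theta)
    p = norm.cdf(theta)
    return ((0.5 * za + zb * (p * (1 - p)) ** 0.5) / (p - 0.5)) ** 2

for theta in [1.0, 0.5, 0.1, 0.01]:           # alternatives shrinking to theta_0 = 0
    print(theta, n_z(theta) / n_sign(theta))  # n*/n tends to 2/pi, about 0.6366
```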
Now let {θ_i}, θ_i ∈ Γ, be a sequence such that θ_i → θ_0 as i → ∞, and let {n_i}, {n_i*} be two sequences of sample sizes such that 0 < lim_{i→∞} β_{T_{n_i}}(θ_i) = lim_{i→∞} β_{S_{n_i*}}(θ_i) < 1. Take α between 0 and 1 and let the critical points be such that P(T_{n_i} ≥ C_{n_i}) = P(S_{n_i*} ≥ D_{n_i*}) = α. Then 0 < lim_{i→∞} β_{T_{n_i}}(θ_i) = lim_{i→∞} P_{θ_i}(T_{n_i} ≥ C_{n_i}), and based on the asymptotic distribution of T_{n_i} we can express this as lim_{i→∞} P_{θ_i}((T_{n_i} - μ_{T_{n_i}}(θ_i))/σ_{T_{n_i}}(θ_i) ≥ (C_{n_i} - μ_{T_{n_i}}(θ_i))/σ_{T_{n_i}}(θ_i)). Similarly, this equals lim_{i→∞} β_{S_{n_i*}}(θ_i) = lim_{i→∞} P_{θ_i}((S_{n_i*} - μ_{S_{n_i*}}(θ_i))/σ_{S_{n_i*}}(θ_i) ≥ (D_{n_i*} - μ_{S_{n_i*}}(θ_i))/σ_{S_{n_i*}}(θ_i)), and this common limit is less than 1. Comparing these two relations, they imply that lim_{i→∞} [(C_{n_i} - μ_{T_{n_i}}(θ_i))/σ_{T_{n_i}}(θ_i) - (D_{n_i*} - μ_{S_{n_i*}}(θ_i))/σ_{S_{n_i*}}(θ_i)] = 0.

One way to implement this is to equate the two terms exactly, that is, to set (C_{n_i} - μ_{T_{n_i}}(θ_i))/σ_{T_{n_i}}(θ_i) = (D_{n_i*} - μ_{S_{n_i*}}(θ_i))/σ_{S_{n_i*}}(θ_i) for all i and solve for n_i*/n_i. Then ARE(T, S) = lim_{i→∞} n_i*/n_i; that is one way. To go one step further, one can assume that T_n and S_{n*} have the same limiting distribution for all θ. If that is so, then α = P_{θ_0}(T_{n_i} ≥ C_{n_i}) = P_{θ_0}(S_{n_i*} ≥ D_{n_i*}), which implies lim_{i→∞} [(C_{n_i} - μ_{T_{n_i}}(θ_0))/σ_{T_{n_i}}(θ_0) - (D_{n_i*} - μ_{S_{n_i*}}(θ_0))/σ_{S_{n_i*}}(θ_0)] = 0, and again by equating each term we can get the ARE. One can also assume the standard deviations to be equal to 1, and then again a condition is obtained. Let me just mention these briefly, because all of them lead to additional conditions. So these are the methods for evaluating the asymptotic relative efficiency; there are certain results regarding this, but I am not getting into them.

In this course on nonparametric statistics I have discussed the main methodologies. First we discussed in detail the role of order statistics and the asymptotic distributions of order statistics under various conditions. We also discussed the estimation of the distribution function through the empirical distribution function. We considered ranks and the distribution of ranks, and based on ranks we considered several one-sample and two-sample testing problems, basically for location and scale, for which we gave certain tests; towards the end I introduced the concepts of asymptotic relative efficiency and the efficiency of a test. The subject of nonparametric tests, or you can say distribution-free procedures, is quite vast: it has developed a lot in the past 50 years and there are various other aspects to it. For example, there is a huge class of statistics known as Hoeffding's U-statistics; let me briefly mention this. In the remaining five or ten minutes I will also talk very briefly about the Kruskal-Wallis test, and about Kendall's tau and Spearman's rank correlation. Let me just briefly mention these topics.
There are other things, like density estimation based on kernel methods, and there are many other topics that have developed over the past years in the field of nonparametrics. But in this particular course we can cover only this much; perhaps in another course those procedures can be discussed in detail.

So let me talk now about the Hoeffding U-statistic. What is a U-statistic? Let F be a class of distribution functions. A parameter θ is said to be estimable of degree m if, for a random sample X_1, X_2, ..., X_n with n ≥ m, there exists a function h(x_1, x_2, ..., x_m) such that E[h(X_1, X_2, ..., X_m)] = θ. Based on this we define a kernel: such a function h(x_1, ..., x_m) is called a kernel. A U-statistic is then defined as U = (1/C(n, m)) Σ h*(X_{β_1}, ..., X_{β_m}), where h* is the symmetrized kernel and the sum runs over all selections of m indices from the n observations. This is called a U-statistic, after Hoeffding. It is a symmetric function of its arguments, and you can see that most of the standard estimators in parametric situations, and also in nonparametric situations, such as the sign test, the Wilcoxon rank-sum test and the Mann-Whitney test statistic, are based on U-statistics. The good thing about U-statistics is that their asymptotic theory is established, meaning asymptotic normality holds, and therefore various procedures can be established based on them.

As I mentioned, nonparametric methods are also applicable when the normality assumption is not satisfied. There are also situations, for instance with large-scale data, where we are unable to fit a distribution, or it is impossible to say which distribution is applicable because there is too much variation when fitting a distribution; then certainly nonparametric methodologies are used. One important methodology, for example, is in the analysis of variance. In the analysis of variance we assume that the errors are normally distributed, but if the errors are not normally distributed, we go for a distribution-free procedure called the Kruskal-Wallis test.

The Kruskal-Wallis procedure is again based on ranks. We have k samples with sample sizes n_1, n_2, ..., n_k, and N = Σ n_i. We consider all the observations pooled together, and let R_i be the sum of the ranks of x_{i1}, x_{i2}, ..., x_{in_i} in the combined sample. Then we consider the quantities (R_i - n_i(N+1)/2)² for i = 1, ..., k, and define H = [12/(N(N+1))] Σ_{i=1}^k (1/n_i)(R_i - n_i(N+1)/2)². Based on this statistic the testing procedure has been developed; this is the Kruskal-Wallis statistic. General linear rank statistics for k-sample problems have also been defined, and versions such as the van der Waerden statistic, the Terry statistic and so on are defined as well. Two short computational sketches, of a U-statistic and of the Kruskal-Wallis statistic, follow below.
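Here is a minimal U-statistic sketch in Python. The kernel h(x_1, x_2) = (x_1 - x_2)²/2 is a standard illustrative choice: it is an unbiased kernel of degree 2 for the variance, and the resulting U-statistic reproduces the usual unbiased sample variance. The function name is my own.

```python
import itertools
import numpy as np

def u_statistic(sample, kernel, m):
    """U = (1 / C(n, m)) * sum of the kernel over all m-element subsets of the sample."""
    combos = list(itertools.combinations(sample, m))
    return sum(kernel(*c) for c in combos) / len(combos)

h = lambda x1, x2: 0.5 * (x1 - x2) ** 2   # symmetric kernel with E h(X1, X2) = Var(X)

x = np.random.default_rng(1).normal(size=30)
print(u_statistic(x, h, 2), np.var(x, ddof=1))  # identical values
```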
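And a small sketch of the Kruskal-Wallis statistic H exactly as defined above, assuming no ties in the pooled data (with ties a correction factor is usually applied); the call to scipy.stats.kruskal is only a cross-check, and the data are made up.

```python
import numpy as np
from scipy import stats

def kruskal_wallis_H(*samples):
    """H = 12/(N(N+1)) * sum_i (1/n_i) * (R_i - n_i (N+1)/2)^2, no-ties version."""
    pooled = np.concatenate(samples)
    N = len(pooled)
    ranks = stats.rankdata(pooled)            # ranks in the combined sample
    H, start = 0.0, 0
    for s in samples:
        n_i = len(s)
        R_i = ranks[start:start + n_i].sum()  # rank sum of the i-th sample
        H += (R_i - n_i * (N + 1) / 2) ** 2 / n_i
        start += n_i
    return 12.0 / (N * (N + 1)) * H

a, b, c = [6.9, 7.1, 7.8], [8.2, 8.5, 9.1], [6.0, 6.2, 6.4]
print(kruskal_wallis_H(a, b, c), stats.kruskal(a, b, c).statistic)  # the two agree
```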
Another topic is the coefficient of correlation, or the coefficient of association. We have the usual Karl Pearson coefficient, and the distribution theory for it has been developed, which I mentioned when we were discussing multivariate statistics. But again, that is based on the normality assumption. If the normality assumption does not hold, we define a measure of association, or a correlation coefficient, based on ranks. We have the following two things: one is called Kendall's tau coefficient and the other is called Spearman's rank correlation.

Let me mention how they are used. When we are looking at concordance or discordance, that is, whether the pairs fall on the same side, Kendall's tau coefficient is used: for example, we check whether, when x_i < x_j, we also have y_i < y_j, and when x_i > x_j, whether y_i > y_j. In such cases Kendall's tau coefficient is used. In other situations we consider the coefficient of correlation based on the observed pairs (x_1, y_1), (x_2, y_2), ..., (x_n, y_n), and based on these we define the coefficient; here the actual observations are used, so for example we may be considering weights and heights for the two sets of measurements. But suppose we are not interested in the actual values but rather in the ranks, because the raw measurements may be related to something which does not give the true picture. For example, consider how much agreement there is between the ranks given by two judges ranking athletes: say there is a gymnastics competition with several competitors and two judges, and the two judges give ranks on the basis of each performance. We then ask whether there is agreement, that is, whether the judges are on the same side, in the sense that a good performance is appreciated similarly by both, or whether they are biased, meaning one judge rates a candidate as better while the other does not. In such cases we use Spearman's rank correlation coefficient, and it is much more useful.

We end this course on this note. As I mentioned, in statistical methods for scientists and engineers many other new methodologies have been developed; for example, a large literature is available on regression methodologies, time series and other things, but of course other courses will be available for those. There is also a vast area of Bayesian methodologies and frequentist decision theory; one may look at those topics for applications. I will end this course at this particular note.
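As a final sketch, both rank-based coefficients can be computed directly from their definitions, assuming no ties: Kendall's tau from the concordant and discordant pairs, and Spearman's coefficient as the Pearson correlation of the ranks. The judges' ranks are made-up data, and the SciPy calls are just cross-checks.

```python
import itertools
import numpy as np
from scipy import stats

def kendall_tau(x, y):
    """tau = (number concordant - number discordant) / C(n, 2), no-ties version."""
    pairs = list(itertools.combinations(range(len(x)), 2))
    s = sum(np.sign((x[i] - x[j]) * (y[i] - y[j])) for i, j in pairs)
    return s / len(pairs)

def spearman_rho(x, y):
    """Pearson correlation coefficient computed on the ranks of x and y."""
    rx, ry = stats.rankdata(x), stats.rankdata(y)
    return np.corrcoef(rx, ry)[0, 1]

# ranks awarded by two judges to six gymnasts (hypothetical data)
j1 = np.array([1, 2, 3, 4, 5, 6])
j2 = np.array([2, 1, 4, 3, 6, 5])
print(kendall_tau(j1, j2), stats.kendalltau(j1, j2)[0])
print(spearman_rho(j1, j2), stats.spearmanr(j1, j2)[0])
```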