Now, we will be starting a new topic called nonparametric methods. If you remember my terminology in the previous lectures, when we were discussing parametric methods we always started with a statement like: let x 1, x 2, ..., x n be a random sample from a population with probability distribution P theta. That means we knew that the distribution is of a given form: a normal distribution with parameters mu and sigma square, a Poisson distribution with parameter lambda, a binomial distribution with parameters n and p, a gamma distribution with parameters say r and lambda, a Pareto distribution, a Weibull distribution, and so on. These are all called parametric models because basically we are able to pinpoint the probability model for the phenomenon or the population under study. Therefore, the methods that were derived were specific to that model: for example, unbiased estimation, minimum variance unbiased estimation, the concept of mean squared error of estimators, the concept of admissible and minimax estimators, and also the testing problems in which we considered parametric models such as the normal distribution. So, we were testing whether the mean is equal to 0 or equal to 1, or whether theta is less than or equal to theta naught versus theta greater than theta naught, etcetera. All the procedures that were developed were under the assumption that we have a certain distribution for the population under study. But many times it happens that either the data is insufficient for fitting a distribution, or it is too volatile to actually fit a distribution. In that case we may need methods which make only some general assumptions, such as a continuous distribution, and a specific form is not assumed. Such methods are called distribution free methods, or the methods of nonparametric statistics.
So, distribution free: basically, whatever methods we derive will be free from the distributional model assumptions. For example, we will not say that it is a normal distribution or a Poisson distribution or a binomial distribution, etcetera. These methods are also quite old, arising right after the Fisherian era, that means the 1930s, and the Neyman-Pearson theory. Abraham Wald came, and mainly through the efforts of Wald, Wolfowitz, and others, nonparametric methods started developing; in particular there was a treatise by Wald on nonparametric methods. Then in the 1960s Hajek and other people such as Hoeffding developed these methods further; we will also come across Kolmogorov and Smirnov, who developed a powerful method that was likewise free from distributional assumptions. Because of that, these methods have gained popularity, and with the advent of computer oriented procedures they are easy to apply. One thing one can understand: since we are making fewer assumptions about the model of the population, in general the methods will be slightly less powerful than the methods available when we have information on the type of the distribution. But that is expected, because if more information is assumed, then the method should be more powerful. The primary building block of nonparametric methods is the following result, called the probability integral transform. I mentioned this briefly in the distribution theory lectures, when we were discussing the distribution of a function of a random variable. This particular result is actually a building block of the methods developed for nonparametric statistics.
So, I will state the following result. If X is a continuous random variable with CDF F X, let us define Y equal to F X of X; that means in place of the small x I substitute capital X, so this becomes a function of the random variable. Let me take an example. Suppose I am considering X following an exponential distribution with parameter lambda. Then capital F X of x is 1 minus e to the power minus lambda x for x greater than or equal to 0, and it is equal to 0 for x less than 0. Here, if I define Y equal to F X of X, that is equal to 1 minus e to the power minus lambda X, which is valid because X is a positive random variable. Then, if I look at the distribution of this, Y has a uniform distribution on the interval 0 to 1. Now, this is a very powerful result, because it says that from any distribution, at least in the continuous case, I can go to the uniform distribution. Together with its inverse, which we will give shortly, it becomes a very powerful tool for simulation. Let me look at the proof. Let us define u equal to the supremum of the set of x such that F X of x is equal to y, for 0 less than y less than 1. Then the event F X of X less than or equal to y is equivalent to the event X less than or equal to u. So, if I consider F Y of y, that is the probability of Y less than or equal to y, which is the same as the probability of X less than or equal to u, which is equal to F X of u, and F X of u is nothing but y. So, what we are getting is that if I consider a point y between 0 and 1, then capital Y less than or equal to y is equivalent to X less than or equal to u, because u is the supremum of the set of values for which F X of x is equal to y. So, the probability of Y less than or equal to y is the same as the probability of X less than or equal to u.
So, that is F X of u, and that is equal to y. So, we have proved that for y between 0 and 1, F Y of y is equal to y. Of course, if I consider y less than or equal to 0, then F Y of y will be 0, because F is a CDF and a CDF takes values between 0 and 1; and for y greater than or equal to 1, F Y of y will be equal to 1. These two statements are valid because F is a CDF, so all the values of F Y are between 0 and 1 only. So, what have we proved? We have shown that F Y of y is equal to 0 for y less than or equal to 0, equal to y for 0 less than y less than 1, and equal to 1 for y greater than or equal to 1, which is the CDF of a uniform 0 1 random variable. So, we have proved that if X is a continuous random variable with CDF F X, then Y equal to F X of X, that means F X with small x replaced by capital X, has a uniform distribution. That means if x 1, x 2, ..., x n is a random sample from a continuous distribution with CDF capital F, then F of x 1, F of x 2, ..., F of x n is a random sample from uniform 0 1. This is a very powerful result, and in fact many of the methodologies of nonparametric statistics will be based on it. Of course, we are interested in whether the converse of this result is also true; that is, we consider the inverse probability integral transform. So, let us look at Y following uniform 0 1, and let F be an absolutely continuous CDF. Then define X equal to F inverse of Y. How do we define the inverse? F inverse of y is defined as the infimum of the set of all x's for which y is less than or equal to F of x. So, if I consider the probability of F inverse of Y less than or equal to x, that is the probability of X less than or equal to x.
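As a small numerical sketch of the probability integral transform (this is my own illustration, not from the lecture; numpy and the choice lambda equal to 2 are assumptions), we can apply the exponential CDF to an exponential sample and check that the transformed values behave like uniform 0 1 values:

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 2.0  # arbitrary choice of the exponential rate
x = rng.exponential(scale=1.0 / lam, size=100_000)  # X ~ Exp(lambda)

# Probability integral transform: Y = F_X(X) = 1 - exp(-lambda * X)
y = 1.0 - np.exp(-lam * x)

# If Y ~ Uniform(0,1), the sample mean and variance should be
# close to 1/2 and 1/12 respectively
print(y.mean(), y.var())
```

The same check works for any continuous CDF in place of the exponential one.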
So, that is equal to the probability of Y less than or equal to F of x; but Y is uniform 0 1, so this is simply F of x. So, the CDF of capital X is F. Let me write this result here in the form of a theorem: if F is an absolutely continuous CDF and Y is uniform 0 1, then F inverse of Y has CDF capital F. These two results taken together are the building blocks of simulation procedures, because in simulation we have to generate random samples from some given distribution with CDF capital F. Procedures have been developed to generate pseudo random numbers, basically uniformly distributed integers from 1 to some upper bound defined by the largest integer in a computer program; dividing by that upper bound, we get uniformly distributed random numbers between 0 and 1. Now, if F is our target distribution, then we can apply F inverse to these, and we will get a random sample from that particular distribution. Let us consider the example of the exponential distribution. If u 1, u 2, ..., u n is a random sample from uniform 0 1, then we calculate the inverse of F: if I write y equal to 1 minus e to the power minus lambda x, then 1 minus y is equal to e to the power minus lambda x, so minus lambda x is equal to log of 1 minus y, and x is equal to minus 1 by lambda log of 1 minus y. So, that is the inverse. Then if I consider the random variables x 1, x 2, ..., x n, where x i is equal to minus 1 by lambda log of 1 minus u i, each has the exponential lambda distribution. So, this method helps us to generate random samples from various distributions, at least those for which this F inverse is available in a closed form.
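The inverse transform just derived can be sketched in a few lines of code (a hedged illustration; lambda equal to 3 and the sample size are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(1)
lam = 3.0  # arbitrary exponential rate
u = rng.uniform(size=200_000)  # u_1, ..., u_n ~ Uniform(0,1)

# Inverse CDF of Exp(lambda): F^{-1}(u) = -(1/lambda) * log(1 - u)
x = -np.log(1.0 - u) / lam

# Exp(lambda) has mean 1/lambda and variance 1/lambda^2; the sample
# mean and variance should be close to those values
print(x.mean(), x.var())
```

The same recipe applies to any distribution whose inverse CDF is available in closed form.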
If F inverse is not in a closed form, then of course the method becomes difficult, for example for the normal distribution; but then we look at some other transformations, because many distributions are related to each other, and the normal distribution, for example, can be generated through some other transformation. There are methods available for that; maybe in one of the classes I will briefly touch upon the simulation part also. Now, the next building block of nonparametric methods is order statistics. I also discussed order statistics earlier in distribution theory. Let x 1, x 2, ..., x n be a random sample. We define x (1) to be the smallest of x 1, x 2, ..., x n; then x (2) to be the second smallest; in a similar way x (r) is the r th smallest; and finally x (n) is the largest of x 1, x 2, ..., x n. Then we call x (1), x (2), ..., x (n) the order statistics of x 1, x 2, ..., x n. If we assume that the random variables are continuous, then the probability of any two of them being equal is 0. In that case we can study the distribution theory of these order statistics. Let me first consider the joint distribution of x (1), x (2), ..., x (n). If you remember, in the part where I was discussing distribution theory, we discussed the joint distribution of functions of random variables, and we considered this distribution. So, let me consider the general case, the joint distribution of the order statistics. Before going to the derivation, let me also talk about the use of order statistics from a practical point of view, because I am giving this course for scientists and engineers who may not be statisticians; they may simply be using these methods. So, what is the use of this?
Many a time it happens that we are not interested in the observations as they are given to us or as they arise; rather we may be interested in a particular ordering of them. Let us look at some very straightforward examples. Suppose there is testing of the strength of certain material, or of a certain brand of instrument. If we are considering a certain brand of instrument, then we may like to use the one which has the largest lifetime. So, suppose the lifetimes have been recorded; then I am not interested in each of them, rather I am interested only in the largest one. In a similar way, suppose there is a selection of the best candidates. Say I have two positions, and out of 10 candidates who appeared, the scores are given for those 10 people; then I may be looking at x (9) and x (10), rather than looking at all of them, because I am interested only in the best two, that is, the largest two. Similarly, in some other cases I may be interested in the minimum: for example, the item with the lowest price, or the three items with the lowest price in a set of say 10 items. If you consider certain sports events, say gymnastics or figure skating and so on, where scores are given by the judges, then we look at the total scores, rank them from the largest to the lowest, and then the top two, top three, or top four are considered. That means order statistics are always indirectly playing a role in real life, in whatever physical application we are making use of them. In parametric methods we generally go by the averages and so on, and the distribution theory is nice; but when we do not have much knowledge about the form of the distribution, then we use the order statistics.
So, studying the distributions of the order statistics is one of the primary concerns in nonparametric methods. Let us consider the joint distribution of the order statistics. We use the notation: let x 1, x 2, ..., x n be a random sample from a continuous distribution; that means at this stage I am assuming them to be independent and identically distributed, with CDF capital F and pdf small f, and for brevity we denote the order statistics by y 1, y 2, ..., y n. We want the joint distribution of y 1, y 2, ..., y n. We know that in order to obtain this we have to first write down the joint pdf of x 1, x 2, ..., x n, and then apply the transformation. The joint pdf of x 1, x 2, ..., x n is simply the product of f of x i, for i equal to 1 to n. We can also write it as f of x, where x is the vector x 1, x 2, ..., x n. Now, when we make the transformation to the order statistics, we have to write down the inverse transformation and the Jacobian. There are n factorial inverse images of this transformation: you may have x 1 less than x 2 less than ... less than x n, you may have x 2 less than x 1 less than x 3 less than ... less than x n, and so on, up to x n less than x n minus 1 less than ... less than x 1. So, we call these regions a 1, the set of x for which the first ordering holds; a 2, the set of x for which the second holds; and so on, up to a n factorial. Let us consider the Jacobian in one case. In the first region the inverse image is x 1 equal to y 1, x 2 equal to y 2, ..., x n equal to y n. So, the Jacobian of the transformation: del x 1 by del y 1 is 1, del x 1 by del y 2 is 0, and so on; del x 2 by del y 1 is 0, del x 2 by del y 2 is 1, and so on; which is nothing but the determinant of the identity matrix, that is, equal to 1. Now, what happens in the second region, say a 2?
In a 2 the inverse image is x 1 equal to y 2, x 2 equal to y 1, x 3 equal to y 3, and so on, x n equal to y n. So, if we consider the Jacobian: del x 1 by del y 1 is 0, del x 1 by del y 2 is 1, and then there are zeros; del x 2 by del y 1 is 1, and so on. This is nothing but the determinant of the identity matrix with the first and second rows interchanged (or, you can say, the first and second columns interchanged), the other rows being the same. So, this value will be equal to minus 1. Similarly, if we consider the other regions, for example the totally reversed ordering, then all the rows are permuted: the first row goes to the last position, the second to the second last, and so on. Whenever the rows of the identity matrix are permuted, the determinant is plus 1 or minus 1, according to the sign of the permutation. So, the determinants are either plus 1 or minus 1 in every case, and if I consider the absolute value it is going to be plus 1. So, in all cases the absolute value of the Jacobian determinant is plus 1. Now look at the density. In every region the density has the same form: all the terms f x 1, f x 2, ..., f x n are coming. When you consider the inverse transformation in the first region it is f y 1 times f y 2 times ... times f y n; in the second one it will be f y 2, f y 1, ..., f y n; in the last region it will become f y n, f y n minus 1, ..., f y 1. That means all n terms are coming every time, just in a different order, and ultimately it is the same product. So, what are you getting? The product of f of y i, for i equal to 1 to n, and you have to sum up these contributions n factorial times.
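The claim about the Jacobians can be checked mechanically: each inverse image corresponds to one permutation of the coordinates, so its Jacobian matrix is the identity with rows permuted, and its determinant is plus or minus 1. A short sketch (my own illustration; numpy assumed, with n equal to 4):

```python
import itertools
import numpy as np

n = 4
# Each inverse image corresponds to one permutation of (y_1, ..., y_n);
# its Jacobian matrix is the identity with rows permuted accordingly.
dets = []
for perm in itertools.permutations(range(n)):
    J = np.eye(n)[list(perm)]  # permutation matrix for this region
    dets.append(int(round(np.linalg.det(J))))

# All n! = 24 Jacobian determinants are +1 or -1, so |det| = 1 everywhere
print(sorted(set(dets)), len(dets))
```

This confirms that the absolute value of the Jacobian is 1 in every one of the n factorial regions.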
Since there are n factorial regions, the pdf of y 1, y 2, ..., y n is n factorial times the product of f of y i, i equal to 1 to n, on the region y 1 less than y 2 less than ... less than y n. In the beginning I have not assumed any interval, so the support will be from minus infinity to infinity; if the distribution lives on a subinterval, for example a uniform distribution on the interval 0 to 1, then it will be 0 to 1; if it is some a to b, then it will be like that; if it is say 0 to infinity, then the upper limit will be infinity. So, there can be any such region here. This is the joint probability density of the order statistics. Once we have the joint density, we can derive the density of particular choices: for example, the density of the second one, the third one, the largest, or the smallest. We will have to evaluate the integral with respect to the other variables: if I want it for y 1, then I have to integrate out y 2 up to y n; if I want it for y 2, then y 1, y 3, and so on have to be integrated out. This can be done, and I will show you a systematic method for it. However, the cases of the smallest and the largest can also be done directly, because that is much more straightforward, based on the representation. So, let me show you that separately, and then we will look at the distribution of some middle one, that is, the r th order statistic, where r can be 2, 3, and so on. So, let us consider the distribution of the minimum, that is, x (1), the minimum of x 1, x 2, ..., x n. Consider the probability of x (1) greater than some value y 1; this is equal to the probability that x 1 is greater than y 1, and so on, up to x n greater than y 1, because if the minimum is greater than y 1, then each of x 1, x 2, ..., x n has to be greater than y 1. At the same time, if each of x 1, x 2, ..., x n is greater than y 1, then the minimum has to be greater than y 1.
So, these two events are equivalent. Now, x 1, x 2, ..., x n are independently distributed, so this can be written as the product of the probabilities of x i greater than y 1, for i equal to 1 to n. Each x i has the same CDF capital F, so this is nothing but 1 minus F of y 1, whole to the power n. So, what have we obtained? If we consider the CDF of x (1), then it is equal to 1 minus, 1 minus F of y 1 to the power n. This is the general expression for the cumulative distribution function of the smallest order statistic, and here you can easily see that I have made no assumption other than taking the CDF to be capital F. I am simply taking x 1, x 2, ..., x n to be a random sample; therefore they have the same distribution F and they are independent, and the joint probability becomes equal to the product of the individual probabilities. Since x 1, x 2, ..., x n are continuous, capital F is an absolutely continuous function; therefore it is differentiable almost everywhere, and I can consider the pdf of x (1). If you differentiate, you get n times 1 minus F of y 1 to the power n minus 1, times the derivative, that is, small f of y 1. So, we are able to derive the distribution of the smallest. Similarly, you can consider the distribution of the largest, that is, x (n), the maximum of x 1, x 2, ..., x n. Here let us consider the CDF of x (n) at y n, that is, the probability of x (n) less than or equal to y n. Once again we utilize the definition of the order statistics: saying that the maximum is less than or equal to y n is exactly equivalent to saying x 1 less than or equal to y n, and so on, x n less than or equal to y n, because if the maximum is less than or equal to y n, then individually each of x 1, x 2, ..., x n will be less than or equal to y n.
At the same time, if each of x 1, x 2, ..., x n is less than or equal to y n, then the maximum is also going to be less than or equal to y n. Once again, applying the argument of independent and identically distributed random variables, this is equal to the product of the probabilities of x i less than or equal to y n, for i equal to 1 to n, and each x i has the same CDF F. So, this simply becomes F of y n to the power n. This is very interesting: the CDF of the largest is nothing but the original CDF raised to the power n. In the minimum case it was 1 minus, 1 minus F to the power n, and here it is simply the same thing raised to the power n. Now, once again, capital F is absolutely continuous, because x is a continuous random variable, that means these random variables came from a continuous population; therefore it is differentiable almost everywhere, and the pdf of x (n) is obtained by differentiation: it becomes n times F of y n to the power n minus 1, times the derivative, small f of y n. So, we have successfully obtained the distributions of the minimum and the maximum of the observations in a random sample. As I mentioned, the typical applications are in selections, when we want to choose the cheapest item or the item with the maximum longevity, and so on. In almost all physical applications, in economics, in industry, in social sciences, etcetera, this holds. So, we can actually derive the distribution of the minimum and maximum and utilize it. You can think of it in this fashion: for example, I have found the maximum; once we have selected that item, we will really be using it again and again. So, what is the distribution of that?
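The two derived formulas, P of the minimum less than or equal to t equal to 1 minus, 1 minus F of t, to the power n, and P of the maximum less than or equal to t equal to F of t to the power n, can be checked by simulation (a hedged sketch of mine; the exponential population, n equal to 5, and t equal to 0.3 are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
lam, n, reps = 1.0, 5, 100_000

# reps independent samples of size n from Exp(lambda)
x = rng.exponential(scale=1.0 / lam, size=(reps, n))
mins, maxs = x.min(axis=1), x.max(axis=1)

t = 0.3
F_t = 1.0 - np.exp(-lam * t)  # F(t) for Exp(lambda)

# Derived CDFs: P(min <= t) = 1 - (1 - F)^n and P(max <= t) = F^n
print((mins <= t).mean(), 1.0 - (1.0 - F_t) ** n)
print((maxs <= t).mean(), F_t ** n)
```

The empirical frequencies should agree with the closed-form expressions up to Monte Carlo error.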
For example, if it is a lifetime, what is the average life, or what is the variability of the life? If it is the cost, and I have chosen the one item with the least cost, then how is that least cost varying over time? All these things are of real interest for scientists and engineers in various disciplines, including people working in economics or sociology, where we are ranking people according to some kind of features, ranking items, ranking instruments, and so on. Then a more general thing would be that, rather than looking at only the minimum and maximum, we may be looking at any particular position. Once again, why is that of importance? In the usual parametric methods we generally consider the mean of the observations, and we are able to obtain its distribution in most cases; but when the form of the distribution is not known, studying the distribution of the mean becomes very difficult, except in the cases where we assume that form, or where we consider large sample theory and can apply the central limit theorem. But then again there are drawbacks of using the sample mean, in the sense of less robustness: if there are wild fluctuations, if there are extreme observations, then the sample mean is more affected. Then one may be interested in the median of the observations. Median means: suppose I have an odd number of observations, then it is the middle one; if I have an even number of observations, then it is between the middle two, or you can take the average of the middle two. That means, in place of the largest or the smallest, I may be interested in the distribution of some other order statistic; I may be interested in x (3), x (4), for example, looking at a particular position. For example, I may be interested in the 1 by 4 point, or the 3 by 4 point; that may be our cut off.
So, what are the distributions of these points? In general, we want the distribution of the r th order statistic. So, let us discuss that. Next we consider the distribution of the r th order statistic x (r), where of course r is between 1 and n. If we go by our usual theory, then what do we have to do? We have obtained the joint distribution of the order statistics y 1, y 2, ..., y n; from there I integrate out all the variables except y r. So, we have to develop an algorithm for that. The joint distribution of y 1, y 2, ..., y n, as we wrote, is n factorial times the product of f of y i, i equal to 1 to n, where now I write the region in a more elaborate way: minus infinity less than y 1 less than y 2 less than ... less than y r minus 1 less than y r less than y r plus 1 less than ... less than y n less than infinity. Here, except for y r, I have to integrate out all the others. So, the marginal pdf of y r, that is, of x (r), is obtained by integrating f of y with respect to y 1, y 2, ..., y r minus 1, y r plus 1, ..., y n as follows. That is, f of y r equals n factorial times the following iterated integral: if we integrate with respect to y 1, it runs from minus infinity to y 2; if we then integrate y 2, it runs from minus infinity to y 3; and so on, up to y r minus 1, which is integrated from minus infinity to y r. So, this part is the iterated integral of the product of f of y i, with d y 1, d y 2, ..., d y r minus 1. Then let us look at the integration of y n: y n is integrated from y n minus 1 to infinity; the next one, y n minus 1, from y n minus 2 to infinity; and so on, until finally y r plus 1 is integrated from y r to infinity. So, that part is d y n, d y n minus 1, ..., d y r plus 1.
Now, let us look at the integration. The first one is the integration of f of y 1: if we integrate small f of y 1 we get capital F of y 1, evaluated from minus infinity to y 2, which becomes F of y 2 minus F of minus infinity, that is, simply F of y 2. At the next stage, when I am integrating with respect to y 2, I have to integrate F of y 2 times small f of y 2; the integral gives 1 by 2, F square of y 2, evaluated from minus infinity to y 3, which is simply 1 by 2, F square of y 3. At the third stage I have 1 by 2, F square of y 3, times f of y 3, to integrate with respect to y 3; this gives 1 by 3 into 2, F cube of y 3, from minus infinity to y 4, which I can write as 1 by 3 factorial, F cube of y 4. You can see that the 1 by 2 is 1 by 2 factorial, so I am getting a pattern. If I continue like this up to y r minus 1, the final integral gives F of y r to the power r minus 1, divided by r minus 1 factorial; note that when the argument is y 4 the power is 3, one less, so when we reach y r the power is r minus 1 and the factor is 1 by r minus 1 factorial. So, that is one part: I am getting n factorial divided by r minus 1 factorial, times F of y r to the power r minus 1. Now, let us look at the other terms. The first of these involves small f of y n: integrating small f of y n gives capital F of y n, and the integral for y n runs from y n minus 1 to infinity. Now, F of infinity is 1, so it becomes 1 minus F of y n minus 1. At the next stage I will have 1 minus F of y n minus 1, multiplied by small f of y n minus 1.
Integrating this, I get 1 by 2, 1 minus F of y n minus 1, squared, with a minus sign, evaluated not from minus infinity but from y n minus 2 to infinity. At infinity the term becomes 0, so this gives 1 by 2, 1 minus F of y n minus 2, whole square. At the next stage I have 1 minus F of y n minus 2, squared, times f of y n minus 2, and when this is integrated I get 1 by 3 into 2, 1 minus F of y n minus 2, cubed, with a minus sign, from y n minus 3 to infinity; putting in the limits, this gives 1 by 3 factorial, 1 minus F of y n minus 3, cubed. Like that I have to go down to the y r term. Look at the pattern: when the argument is y n minus 3, I get a cube and 3 factorial; so when the argument is y r, the power of 1 minus F of y r becomes n minus r and the factor becomes 1 by n minus r factorial. This is the term left after integrating down to y r plus 1. Substituting, I get n minus r factorial in the denominator, 1 minus F of y r to the power n minus r, and I am left with the corresponding term small f of y r. So, we have derived the probability density function of the r th order statistic, which we can also write in the form of a beta function, because r minus 1 factorial times n minus r factorial is gamma of r times gamma of n minus r plus 1, and the sum of the arguments, r plus n minus r plus 1, is n plus 1, so n factorial is gamma of n plus 1. So, it becomes basically 1 by beta of r, n minus r plus 1, times F of y r to the power r minus 1, times 1 minus F of y r to the power n minus r, times f of y r, for minus infinity less than y r less than infinity. So, we are able to actually derive the distribution of the r th order statistic. Now, this approach is very interesting, because in place of one variable I can keep two: suppose I want the joint distribution for the first and second, or for the second and fourth.
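The derived density of the r th order statistic can be checked numerically (my own sketch; the exponential population with lambda equal to 1, n equal to 7, r equal to 3, and the cut-off t equal to 0.5 are all arbitrary assumptions). The code verifies that the formula integrates to 1 and that its mass below t agrees with a Monte Carlo frequency:

```python
import math
import numpy as np

rng = np.random.default_rng(3)
lam, n, r = 1.0, 7, 3

def F(y):  # CDF of Exp(lambda)
    return 1.0 - np.exp(-lam * y)

def f(y):  # pdf of Exp(lambda)
    return lam * np.exp(-lam * y)

def pdf_r(y):  # derived pdf: n!/((r-1)!(n-r)!) F^{r-1} (1-F)^{n-r} f
    c = math.factorial(n) / (math.factorial(r - 1) * math.factorial(n - r))
    return c * F(y) ** (r - 1) * (1.0 - F(y)) ** (n - r) * f(y)

# The formula should integrate to 1 (Riemann sum on a fine grid)
grid = np.linspace(0.0, 20.0, 200_000)
dx = grid[1] - grid[0]
total = pdf_r(grid).sum() * dx

# Monte Carlo: r-th smallest of n exponential draws, compared at t
samples = np.sort(rng.exponential(scale=1.0 / lam, size=(100_000, n)), axis=1)[:, r - 1]
t = 0.5
mass = pdf_r(grid[grid <= t]).sum() * dx
print(total, mass, (samples <= t).mean())
```

Both checks only use the CDF and pdf of the population, so any other continuous F could be substituted.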
That means, in general, I want the joint distribution of the r th and s th order statistics. The procedure I have given is applicable there also, taking r less than s: you integrate out y 1 up to y r minus 1 on the left, y s plus 1 up to y n on the right, and in between you integrate out y r plus 1 up to y s minus 1. I have already shown what terms arise: integrating up to y r you get 1 by r minus 1 factorial, F of y r to the power r minus 1; integrating from y s plus 1 onwards you get 1 by n minus s factorial, 1 minus F of y s to the power n minus s. Now, for the terms between y r and y s, that is, y r plus 1 and so on, you are integrating in the range between y r and y s. So, that range gives you 1 by s minus r minus 1 factorial, and now both the upper and the lower limits contribute; in the earlier cases one limit came out to be 0, but here you get F of y s minus F of y r, to the power s minus r minus 1. So, this procedure can be extended. Let me state it: the above procedure for determining the pdf of x (r), equal to y r, can be extended to find the joint pdf of y r and y s, that is, of x (r) and x (s), with r less than s. This gives f of y r, y s equal to n factorial divided by r minus 1 factorial, s minus r minus 1 factorial, n minus s factorial, times F of y r to the power r minus 1, times 1 minus F of y s to the power n minus s,
times F of y s minus F of y r, to the power s minus r minus 1, times f of y r, f of y s, for minus infinity less than y r less than y s less than infinity. You can also look at the joint distribution of three, say r, s, t, where r is less than s less than t. In that case you will have r minus 1 factorial, s minus r minus 1 factorial, t minus s minus 1 factorial, and n minus t factorial in the denominator, and similar terms will appear. So, this procedure gives you an insight into the calculation of such things; it is very useful, and you will be able to obtain the distributions of various quantities of this type. In particular, any k order statistics, where k is less than n, can also be handled: you only have to write down the ordering, say m 1 less than m 2 less than ... less than m k, and the entire thing can be written in an algorithmic way. That means the first factor will be F of y m 1 to the power m 1 minus 1, then F of y m 2 minus F of y m 1, to the power m 2 minus m 1 minus 1, and so on; all the terms come in this fashion. So, this is a very useful method; I will demonstrate, for example, how you can find the distribution of the median, the distribution of the range, etcetera. In particular, let us see what we get for the uniform distribution. As a special case, if we take F to be uniform 0 1, then f of y r is simply 1 by beta of r, n minus r plus 1, times y r to the power r minus 1, times 1 minus y r to the power n minus r, for 0 less than y r less than 1. That is very interesting: this is nothing but a beta r, n minus r plus 1 distribution, which is a known form. So, in the uniform case, the r th order statistic has a beta distribution.
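The uniform special case can be illustrated by simulation (a hedged sketch; n equal to 9 and r equal to 4 are my own arbitrary choices). The r th order statistic of a uniform 0 1 sample should match the mean and variance of the beta r, n minus r plus 1 distribution:

```python
import numpy as np

rng = np.random.default_rng(4)
n, r, reps = 9, 4, 200_000

# r-th order statistic of a Uniform(0,1) sample of size n
y_r = np.sort(rng.uniform(size=(reps, n)), axis=1)[:, r - 1]

# Beta(r, n-r+1) moments: mean r/(n+1), variance r(n-r+1)/((n+1)^2 (n+2))
mean_beta = r / (n + 1)
var_beta = r * (n - r + 1) / ((n + 1) ** 2 * (n + 2))
print(y_r.mean(), mean_beta)
print(y_r.var(), var_beta)
```

In particular, the mean r by n plus 1 gives a quick sanity check: the 4th of 9 ordered uniforms sits on average at 0.4.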
This form is used at many places, and it is also given as an independent derivation of the beta distribution; although we have mentioned that the beta distribution arises in various practical applications, this is also given as one: in sampling from the uniform 0 1 distribution, the distribution of the r th order statistic is beta. So, this is a very interesting result. I will next be talking about the moments of the order statistics. As you can easily see, since we are not assuming a functional form for capital F, that means we do not know exactly what form is there, there is difficulty in getting exact expressions. So, we will talk about that, then about approximations, and then about asymptotic expressions for the distributions of the order statistics. Using this we will define something called the empirical distribution function, which will be used as an estimate of capital F, and we will use its properties. So, in the following lectures we will be continuing this theory.