Now, I want to talk about something called order statistics. As the name indicates, here we are interested in ordering the samples and studying the statistics of this ordered sample. Why should we worry about ordering? Often, if you give me samples, I want to find out which is the smallest, which is the largest, and which is in the middle. To do this I need to order them, in increasing or decreasing order, and from that I can read off the smallest and the largest. This is important in many applications: for example, which was the hottest day in the last 50 years, or which year got the lowest rainfall. The meteorological department may have recorded data for each day of each month; those are your samples. Identifying the smallest is easy, but what we want to do is understand its distribution.
Because we now have to order, we will use a notation for the ordered samples. Say I am given n samples X_1, X_2, …, X_n and I want to put them in ascending order. I take the smallest among X_1, X_2, …, X_n and call it X_(1); notice that I deliberately put parentheses around the 1. So X_(1) is the minimum of all my samples, X_(2) is the second smallest, and so on up to X_(n), the largest. Naturally X_(1) ≤ X_(2) ≤ ⋯ ≤ X_(n). So from the n samples I simply took the smallest and put it first, the second smallest second, and so on. Putting them in this order already gives us some information about the data.
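As a minimal Python sketch of the notation, assuming a short list of hypothetical yearly rainfall totals (the numbers are made up for illustration): sorting the samples in ascending order gives X_(1) ≤ ⋯ ≤ X_(n), so the minimum and maximum are simply the first and last entries.

```python
# Hypothetical yearly rainfall totals (mm); made-up numbers for illustration only.
rainfall = [812, 645, 990, 701, 578, 864, 733]

# Order statistics X_(1) <= X_(2) <= ... <= X_(n): the samples sorted ascending.
# Python lists are 0-indexed, so X_(k) is order_stats[k - 1].
order_stats = sorted(rainfall)

smallest = order_stats[0]   # X_(1), the minimum
largest = order_stats[-1]   # X_(n), the maximum

print(order_stats)          # [578, 645, 701, 733, 812, 864, 990]
print(smallest, largest)    # 578 990
```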
The first is the sample range, which tells us over what range the values are spread: the largest value minus the smallest, X_(n) − X_(1). All the samples take values between these two.
Next is the median. Suppose n is odd. Then (n + 1)/2 is an integer, and X_((n+1)/2) sits exactly in the middle of the ordered sequence; when the samples are arranged in increasing order, that middle value is called the median. When n is even, (n + 1)/2 is not an integer, so we take the two middle terms, X_(n/2) and X_(n/2+1), and average them; that average is the median. In this sense the median tells us the middle value of the samples, the value with 50 percent of the samples on either side.
As an example, I have put down 10 random samples and their order statistics, that is, the same numbers arranged in increasing order. From these it is easy to compute the sample range: 93 − 24 = 69. The sample mean is obtained by adding the 10 numbers and dividing by 10, which gives 61.5. For the median, first check whether the number of samples is odd or even. Here n = 10 is even, so we average the two middle order statistics, X_(5) and X_(6), which gives a median of 57. Often the median gives a better indication of the typical value of the samples than the mean does.
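The range and median rules can be sketched in Python as follows. The lecture's ten samples are not reproduced here, so the sketch uses small hypothetical lists, and `sample_range` and `sample_median` are illustrative helper names rather than a library API.

```python
def sample_range(samples):
    """X_(n) - X_(1): largest minus smallest sample (hypothetical helper name)."""
    ordered = sorted(samples)
    return ordered[-1] - ordered[0]


def sample_median(samples):
    """Middle order statistic for odd n, average of the two middle ones for even n."""
    ordered = sorted(samples)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty sample is undefined")
    if n % 2 == 1:
        # X_((n+1)/2); subtract 1 for Python's 0-based indexing
        return ordered[(n + 1) // 2 - 1]
    # (X_(n/2) + X_(n/2 + 1)) / 2, again shifted to 0-based indices
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


odd = [7, 1, 5]      # hypothetical samples, n = 3
even = [4, 1, 3, 2]  # hypothetical samples, n = 4
print(sample_range(odd), sample_median(odd))    # 6 5
print(sample_range(even), sample_median(even))  # 3 2.5
```

Python's standard `statistics.median` follows the same odd/even convention, so in practice you would usually call that instead.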
So here the mean is 61.5 whereas the median is 57, and we are saying that if I want a typical value for these random samples, the median is a better representation than the mean. Why is that? Is there a case where I should go with the mean, and a case where I should prefer the median? Say I am interested in the performance of this class, and I know that some of you will do very well, close to 100 percent. Those scores boost the class average, so if someone asks me for the average, I will report a high number even though some people failed. The median is not pulled up by a few very high scores, so it better reflects how most of the class did. Now suppose we make a binary decision: a score above the class mean is a pass and a score below it is a fail. A few very high scores raise the mean, so more students can end up below it and fail, whereas with the median as the threshold at least half the class is at or above it. So in settings like a class, where I care about the majority and not just a few, the median may be the better summary. On the other hand, suppose instead that I invest in 3 or 4 stocks and all I care about is how much money I make on average. Should I prefer the mean or the median? The mean: I do not care how each stock performs individually; what matters is the total return they give together, so in that case I would go with the mean.
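A small sketch of why the median is less sensitive to a few extreme values, using Python's standard `statistics` module and made-up exam scores (the numbers are hypothetical, chosen only to mirror the classroom example): adding two near-perfect scores moves the mean a lot and the median barely at all.

```python
from statistics import mean, median

# Hypothetical exam scores: most of the class scores in the 40s and 50s.
scores = [42, 45, 48, 50, 51, 53, 55]
# The same class plus two students who score close to 100.
with_toppers = scores + [98, 99]

for label, data in [("without toppers", scores), ("with toppers", with_toppers)]:
    m, med = mean(data), median(data)
    below_mean = sum(s < m for s in data)  # students who would "fail" a mean threshold
    print(f"{label}: mean={m:.2f}, median={med}, below mean={below_mean}/{len(data)}")
```

With these numbers the mean jumps from about 49.1 to about 60.1 while the median only moves from 50 to 51, and with the two toppers added, 7 of the 9 students fall below the mean.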
In cases where I have to worry about social outcomes, such as whether most of the students will pass, I would go with the median. So depending on your application, the mean or the median may be more suitable, and you can choose accordingly. But why focus only on the 50 percent point? We can focus on any fraction we want, and that is where the sample percentile comes into the picture. For any p with 0 < p < 1, the 100p-th sample percentile is an observation such that approximately np of the observations are less than it and the remaining n(1 − p) are greater. This is a generalization of the median: the median is the 50th percentile, but I may care about the 30th, 40th, 98th or 99th percentile as well.
To make this formal, we introduce the curly-bracket notation {b}, which rounds b to the nearest integer: {b} = ⌈b⌉ if ⌈b⌉ ≤ b + 0.5, and {b} = ⌊b⌋ otherwise. Let us apply the definition to b = 5.6. Here b + 0.5 = 6.1, and the ceiling is ⌈5.6⌉ = 6, which is at most 6.1, so {5.6} = 6; since 5.6 is closer to 6, we take 6. If instead b = 5.4, then b + 0.5 = 5.9 and ⌈5.4⌉ = 6 exceeds it, so {5.4} = ⌊5.4⌋ = 5. It is simply a notation for rounding to the nearest integer, with exact halves such as 17.5 rounded up.
Now suppose you are given p and n samples. We require 1/2 ≤ np ≤ n − 1/2, which is equivalent to 1/(2n) ≤ p ≤ 1 − 1/(2n). This condition guarantees that the rounded indices used in the percentile definition lie between 1 and n, so the percentile is always one of the order statistics X_(1), …, X_(n).
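The rounding rule {b} can be sketched directly from its definition. The function name `curly` is a hypothetical label for the curly-bracket notation, not a library function.

```python
import math


def curly(b):
    """{b}: b rounded to the nearest integer, exact halves rounded up.

    Takes ceil(b) when ceil(b) <= b + 1/2, otherwise floor(b).
    """
    c = math.ceil(b)
    return c if c <= b + 0.5 else math.floor(b)


print(curly(5.6))   # 6
print(curly(5.4))   # 5
print(curly(17.5))  # 18
```

Python's built-in `round` is not used here because it rounds exact halves to the nearest even integer (`round(16.5)` is 16), whereas {16.5} = 17 under the definition above.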
Using this, the 100p-th sample percentile depends on whether p is less than or greater than 0.5. If p < 0.5, it is exactly the order statistic X_({np}), where {np} is np rounded to the nearest integer as defined above. If p > 0.5, it is X_(n+1−{n(1−p)}), that is, we count {n(1 − p)} positions in from the right-hand end instead. So for p < 0.5 we count from the left, for p > 0.5 we count from the right, and that is the correction built into the formula. This is just the formal way of putting it, but I hope you see how to compute the 100p-th percentile now. For p = 0.5 exactly, use the definition of the median; the median is precisely the case p = 0.5.
Here is an example. Suppose n = 50 and you want the 35th percentile. First compute np = 50 × 0.35 = 17.5; even though this lies exactly in the middle, {17.5} = 18, so the 35th percentile is X_(18), the 18th value once the samples are ordered from 1 to 50. Similarly, for the 65th percentile, n(1 − p) = 17.5 also rounds to 18, and the percentile is X_(50+1−18) = X_(33), the 33rd element of the order statistics. Percentiles for p < 0.5 and p > 0.5 exhibit symmetry. By symmetry I mean the following: take p = 0.2 and p = 0.8. The 20th percentile is a certain number of samples away from the left end, and the 80th percentile is the same number of samples away from the right end.
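The percentile rule can be sketched in Python as below. `percentile_index` and `sample_percentile` are hypothetical helper names; the `round(..., 9)` call is an added safeguard (an assumption of this sketch, not part of the definition) that strips floating-point noise from products like n·p before applying {·}.

```python
import math


def curly(b):
    # {b}: nearest integer, exact halves rounded up
    c = math.ceil(b)
    return c if c <= b + 0.5 else math.floor(b)


def percentile_index(n, p):
    """1-based k such that X_(k) is the 100p-th sample percentile, p != 0.5."""
    if not 1 / (2 * n) <= p <= 1 - 1 / (2 * n):
        raise ValueError("need 1/(2n) <= p <= 1 - 1/(2n)")
    if p == 0.5:
        raise ValueError("p = 0.5 is the median")
    # round(..., 9) strips floating-point noise before applying {.}
    if p < 0.5:
        return curly(round(n * p, 9))            # count from the left
    return n + 1 - curly(round(n * (1 - p), 9))  # count from the right


def sample_percentile(samples, p):
    ordered = sorted(samples)
    n = len(ordered)
    if p == 0.5:  # median: middle value, or average of the two middle values
        mid = n // 2
        return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2
    return ordered[percentile_index(n, p) - 1]


print(percentile_index(50, 0.35), percentile_index(50, 0.65))  # 18 33
```

For the n = 50 example, the 35th and 65th percentiles come out as X_(18) and X_(33), matching the hand calculation.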
So if the ordered positions are 1, 2, 3, …, n and the 80th percentile is m samples away from the right end, then the 20th percentile is also m samples away from the left end; in that way they exhibit symmetry. More generally, if for a given p < 0.5 the 100p-th sample percentile happens to be the i-th smallest observation, then the 100(1 − p)-th percentile is the i-th largest observation. The special cases of interest are the lower quartile and the upper quartile: p = 0.25 gives the lower quartile and p = 0.75 gives the upper quartile. By our definition they are symmetric: if p = 0.25 corresponds to the i-th smallest element of the order statistics, then p = 0.75 corresponds to the i-th largest. Later we will see something called the box plot: given a bunch of samples, we want to see which samples lie between the 25th and 75th percentiles. That is where the bulk of the samples lies; the others are either too low or too large, so I may be interested only in what lies between the 25th and 75th percentiles. That is why the lower and upper quartiles matter. In Python there are ready-made functions, such as NumPy's np.percentile, that return the 25th and 75th percentiles of a bunch of samples directly; note that by default it interpolates linearly between order statistics, so its answer can differ from the definition above. In the next 5 minutes we will discuss how to find the distribution of the order statistics. Let us focus on some calculations.
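A sketch of the quartiles and the symmetry property under the definition above. The helper names (`curly`, `percentile_index`, `quartiles`) are hypothetical, and the percentile rule is restated so the block runs on its own. For comparison, NumPy's `np.percentile` with its default linear interpolation returns 5.75 and 15.25 for the data 1, …, 20, which are not sample values, whereas this definition always returns order statistics.

```python
import math


def curly(b):
    # {b}: nearest integer, exact halves rounded up
    c = math.ceil(b)
    return c if c <= b + 0.5 else math.floor(b)


def percentile_index(n, p):
    # Percentile rule for p != 0.5, restated so this block runs on its own;
    # assumes 1/(2n) <= p <= 1 - 1/(2n).
    if p < 0.5:
        return curly(round(n * p, 9))
    return n + 1 - curly(round(n * (1 - p), 9))


def quartiles(samples):
    """(lower quartile, upper quartile) as order statistics, for n >= 2."""
    ordered = sorted(samples)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least 2 samples")
    return ordered[percentile_index(n, 0.25) - 1], ordered[percentile_index(n, 0.75) - 1]


# Symmetry: if the 100p-th percentile is the i-th smallest (p < 0.5),
# the 100(1-p)-th percentile is the i-th largest, i.e. index n + 1 - i.
n = 50
for p in (0.1, 0.2, 0.25, 0.3, 0.4):
    i = percentile_index(n, p)
    assert percentile_index(n, 1 - p) == n + 1 - i

print(quartiles(range(1, 21)))  # (5, 16)
```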
What we want to do now is this. Consider a discrete random variable, and suppose I have n samples X_1, X_2, …, X_n coming from the same population, all following a common probability mass function. Suppose the values the random variable can take are x_1, x_2, x_3, …, and that these values are themselves ordered: x_1 is the smallest value it can take, x_2 the next smallest, and so on. I denote the probability that X takes the value x_i by p_i, that is, P(X = x_i) = p_i. By definition, X_(j) is the j-th smallest sample, and I am interested in the probability that X_(j) ≤ x_i, where x_i is one of the possible values. In other words, I am trying to find the CDF of the j-th order statistic. Note that X_(1) is called the first order statistic, X_(2) the second order statistic, and X_(n) the n-th order statistic. How do we find the distribution of the j-th order statistic? To start, set p_0 = 0. Then P(X ≤ x_1) = p_1: the values are ordered, so there is nothing below x_1, and the CDF is zero to the left of x_1.
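A minimal sketch of the step CDF for a discrete variable on ordered values, using exact fractions and a made-up pmf (both the support values and the probabilities are hypothetical). Prepending P_0 = 0 mirrors the convention p_0 = 0.

```python
from fractions import Fraction
from itertools import accumulate

# Hypothetical pmf on ordered support values x_1 < x_2 < x_3 < x_4.
support = [10, 20, 30, 40]
pmf = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(4, 10)]  # p_1..p_4

# P_i = P(X <= x_i) = p_1 + ... + p_i; P_0 = 0 (from p_0 = 0) is prepended,
# so cdf[i] is the CDF value just after the jump at x_i.
cdf = [Fraction(0)] + list(accumulate(pmf))

for x, P in zip(support, cdf[1:]):
    print(f"P(X <= {x}) = {P}")
```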
So at x_1 the CDF jumps by p_1; this is the discrete case. Likewise P(X ≤ x_2) = p_1 + p_2, and in general P(X ≤ x_i) = p_1 + p_2 + ⋯ + p_i, which I will write as P_i. Now fix some x_i, one of the possible values, and for j = 1, …, n let Y_j be the indicator that the j-th random variable satisfies X_j ≤ x_i. I hope all of you understand this indicator. What are the possible values of Y_j? Just 0 and 1. What is P(Y_j = 1)? Since Y_j is the indicator of the event X_j ≤ x_i, Y_j = 1 exactly when that event happens, so P(Y_j = 1) = P(X_j ≤ x_i) = P_i. Now define Y = Y_1 + Y_2 + ⋯ + Y_n, the number of samples that are at most x_i. Each Y_j is a binary random variable taking either 1 or 0, so Y, a sum of n Bernoulli random variables, takes values 0, 1, …, n. If the X_j's are i.i.d., then the Y_j's are also i.i.d., and each Y_j is Bernoulli with parameter P_i = p_1 + ⋯ + p_i, the CDF at x_i. Since x_i is fixed, Y is a sum of n i.i.d. Bernoulli random variables with the same parameter P_i, so Y is binomial with parameters n and P_i. Now comes the last, crucial step.
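The claim that Y is binomial with parameters n and P_i can be checked exactly on a tiny example. This sketch assumes a made-up three-point pmf, n = 3 and threshold x_i = 2, enumerates every i.i.d. sample with exact fractions, and compares the resulting distribution of Y with the binomial formula.

```python
from fractions import Fraction
from itertools import product
from math import comb, prod

# Hypothetical discrete population: support values and their probabilities p_i.
pmf = {1: Fraction(1, 2), 2: Fraction(1, 3), 3: Fraction(1, 6)}
n = 3      # sample size
x_i = 2    # the fixed threshold value

P_i = sum(p for x, p in pmf.items() if x <= x_i)  # P(X <= x_i) = p_1 + p_2

# Exact distribution of Y = Y_1 + ... + Y_n, with Y_j = 1{X_j <= x_i},
# by enumerating every i.i.d. sample (X_1, ..., X_n).
dist_Y = {k: Fraction(0) for k in range(n + 1)}
for sample in product(pmf, repeat=n):
    y = sum(1 for x in sample if x <= x_i)
    dist_Y[y] += prod(pmf[x] for x in sample)

binomial = {k: comb(n, k) * P_i**k * (1 - P_i) ** (n - k) for k in range(n + 1)}
print(P_i, dist_Y == binomial)  # 5/6 True
```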
Suppose X_(j), the j-th order statistic, is at most x_i. The claim is that then Y ≥ j: if the j-th smallest sample is at most x_i, then so are the j − 1 samples below it, so at least j of the X's are at most x_i and the sum Y contains at least j ones. Conversely, if Y ≥ j, then at least j samples are at most x_i, which means the j-th smallest one, X_(j), is at most x_i. You need to think about this for a moment: these two events are the same. Therefore P(X_(j) ≤ x_i) = P(Y ≥ j), where Y is the binomial random variable above, and you know how to compute this probability:
P(X_(j) ≤ x_i) = Σ_{k=j}^{n} C(n, k) P_i^k (1 − P_i)^(n−k).
Since X_(j) is discrete, the probability that X_(j) equals x_i is P(X_(j) = x_i) = P(X_(j) ≤ x_i) − P(X_(j) ≤ x_{i−1}); again this works because the variable is discrete. Using the relation above, each term is a binomial tail probability P(Y ≥ j), computed with P_i for the first term and P_{i−1} for the second, so I can compute both and plug them in to get the probability that the j-th order statistic equals x_i. Verify this: you need to sit down peacefully and work through it, and you will see that all these calculations come out fine. Let us stop here.
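The final formula can be sketched in Python and checked against brute force. `order_stat_cdf`, `order_stat_pmf` and `brute_force_pmf` are hypothetical helper names; the pmf is made up, and exact `Fraction` arithmetic is used so that the binomial-tail answer and the enumeration (every i.i.d. sample, sorted, reading off the j-th smallest) can be compared for exact equality.

```python
from fractions import Fraction
from itertools import product
from math import comb, prod


def order_stat_cdf(j, n, P):
    """P(X_(j) <= x_i) = P(Y >= j) with Y ~ Binomial(n, P), where P = P(X <= x_i)."""
    return sum(comb(n, k) * P**k * (1 - P) ** (n - k) for k in range(j, n + 1))


def order_stat_pmf(j, n, pmf):
    """pmf: list of (x_i, p_i) with x_1 < x_2 < ...; returns {x_i: P(X_(j) = x_i)}."""
    result, P_prev = {}, Fraction(0)  # P_0 = 0
    for x, p in pmf:
        P_curr = P_prev + p
        # P(X_(j) = x_i) = P(X_(j) <= x_i) - P(X_(j) <= x_{i-1})
        result[x] = order_stat_cdf(j, n, P_curr) - order_stat_cdf(j, n, P_prev)
        P_prev = P_curr
    return result


def brute_force_pmf(j, n, pmf):
    """Same pmf by enumerating every i.i.d. sample and sorting it."""
    probs = dict(pmf)
    result = {x: Fraction(0) for x in probs}
    for sample in product(probs, repeat=n):
        result[sorted(sample)[j - 1]] += prod(probs[x] for x in sample)
    return result


# Hypothetical population; X_(2) of n = 3 samples is the sample median.
pmf = [(1, Fraction(1, 2)), (2, Fraction(1, 3)), (3, Fraction(1, 6))]
print(order_stat_pmf(2, 3, pmf))  # {1: Fraction(1, 2), 2: Fraction(23, 54), 3: Fraction(2, 27)}
print(all(order_stat_pmf(j, 3, pmf) == brute_force_pmf(j, 3, pmf) for j in (1, 2, 3)))  # True
```

The brute-force check grows exponentially with n, so it only serves as a sanity test; the binomial-tail formula is what you would use in practice.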