So, let us begin with the first one. We want the density function of the $j$th order statistic, and we will start with the CDF, because once we have obtained that we can obtain the pdf as well. Here $F_{X_{(j)}}$ denotes the CDF of the $j$th order statistic, so $F_{X_{(j)}}(x) = P(X_{(j)} \le x)$. What does this mean? If the $j$th ordered value is at most $x$, then $X_{(1)}, X_{(2)}, \ldots, X_{(j)}$ are all at most $x$; that is, at least $j$ of the sample values $X_1, X_2, \ldots, X_n$ are less than or equal to $x$. More of them may be, but since we are computing the probability that $X_{(j)} \le x$, at least $j$ must be. Since "at least $j$" means exactly $j$, or $j+1$, or $j+2$, and so on up to $n$, this probability can be written as

$$F_{X_{(j)}}(x) \;=\; \sum_{i=j}^{n} P(\text{exactly } i \text{ of } X_1,\ldots,X_n \le x) \;=\; \sum_{i=j}^{n} \binom{n}{i}\, F(x)^{i}\, \bigl(1-F(x)\bigr)^{n-i}.$$

This should be clear: out of the $n$ values you are choosing the $i$ that fall at or below $x$, and any $i$ of them can do so, hence $\binom{n}{i}$; each of those $i$ is $\le x$ with probability $F(x)$, giving $F(x)^i$; and since exactly $i$ of them are $\le x$, the remaining $n-i$ are greater than $x$, giving $(1-F(x))^{n-i}$. You sum this up over $i$.

Now let us give this a more concise form, because the sum is of course very unwieldy. This is where you should see how summations can be related to integrals. Consider the integral

$$I(j-1,\,n-j) \;=\; j\binom{n}{j}\int_0^{F(x)} t^{\,j-1}(1-t)^{\,n-j}\,dt,$$

where the first index $j-1$ is the power of $t$ and the second index $n-j$ is the power of $1-t$. Integrate by parts, treating $t^{\,j-1}$ as the first function: its integral is $t^{\,j}/j$, the boundary term is evaluated from $0$ to $F(x)$ and vanishes at $0$, and differentiating the second function gives $-(n-j)(1-t)^{\,n-j-1}$. The factor $j$ outside cancels the $j$ in the denominator from both terms, and you are left with

$$I(j-1,\,n-j) \;=\; \binom{n}{j}\, F(x)^{j}\,\bigl(1-F(x)\bigr)^{n-j} \;+\; (n-j)\binom{n}{j}\int_0^{F(x)} t^{\,j}(1-t)^{\,n-j-1}\,dt.$$

Notice what has happened: the power of $t$ has gone up from $j-1$ to $j$, and the power of $1-t$ has come down from $n-j$ to $n-j-1$.
Now manipulate the coefficient: $(n-j)\binom{n}{j} = (j+1)\binom{n}{j+1}$, so the remaining term can be written as

$$(j+1)\binom{n}{j+1}\int_0^{F(x)} t^{\,j}(1-t)^{\,n-j-1}\,dt \;=\; I(j,\,n-j-1),$$

and you see the iterative relationship. Integrating by parts again produces the boundary term $\binom{n}{j+1}F(x)^{j+1}(1-F(x))^{n-j-1}$ plus the integral $I(j+1,\,n-j-2)$, and so on. Iterating, the power of $1-t$ eventually becomes $0$ while the power of $t$ climbs to $n$ (the last integral is simply $n\int_0^{F(x)} t^{\,n-1}\,dt = F(x)^n$), and the boundary terms you collect along the way are exactly the summands $\binom{n}{i}F(x)^i(1-F(x))^{n-i}$, $i = j, \ldots, n$. So you can show that the integral we wrote down in the beginning equals the sum, which is the cumulative distribution function of $X_{(j)}$:

$$F_{X_{(j)}}(x) \;=\; j\binom{n}{j}\int_0^{F(x)} t^{\,j-1}(1-t)^{\,n-j}\,dt. \qquad (**)$$

You should have recognized this by now: the integrand, together with the constant that turns it into a pdf, is the beta integrand, and since the limits run only from $0$ to $F(x)$ rather than from $0$ to $1$, this is called an incomplete beta function. So we have been able to express the CDF in terms of this integral. When you differentiate both sides of $(**)$ with respect to $x$, by differentiation under the integral sign, you get the pdf of $X_{(j)}$: the upper limit $F(x)$ is a function of $x$, so a factor $f(x)$ appears, and otherwise you just substitute $t = F(x)$ in the integrand, giving

$$f_{X_{(j)}}(x) \;=\; j\binom{n}{j}\, F(x)^{\,j-1}\,\bigl(1-F(x)\bigr)^{\,n-j}\, f(x).$$

For special cases, say when the sample values come from the uniform distribution, or perhaps from the normal distribution, it is easy to get an explicit expression for the CDF and the pdf, and we will see this through examples. Of course, the question arises as to why we are doing this at all, and the examples will also show why one cares about the pdf of these order statistics.
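If you like, the identity $(**)$ can be checked numerically before we use it. Here is a minimal sketch in Python (my own illustration, assuming NumPy/SciPy are available; the test values $n$, $j$, $p$ are arbitrary):

```python
# Check: the "at least j successes" binomial sum equals the incomplete
# beta integral, which is also the Beta(j, n-j+1) CDF evaluated at p = F(x).
from math import comb
from scipy.integrate import quad
from scipy.stats import beta

n, j, p = 7, 3, 0.42   # arbitrary test values; p plays the role of F(x)

tail_sum = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(j, n + 1))
integral, _ = quad(lambda t: j * comb(n, j) * t**(j - 1) * (1 - t)**(n - j), 0, p)

print(tail_sum, integral, beta.cdf(p, j, n - j + 1))   # all three agree
```

The third quantity shows the beta connection explicitly: `beta.cdf(p, j, n - j + 1)` is the regularized incomplete beta function.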
So, suppose a sample of size $2n+1$ independent and identically distributed random variables is observed. When you arrange them in order, the $(n+1)$st smallest, with $n$ observations on one side and $n$ on the other, is called the sample median. Now suppose a sample of size $5$ from Uniform$(0,1)$ is observed, and we want the probability that the sample median lies between $1/3$ and $2/3$; when you are handling large data you are sometimes interested only in the sample median. Here the sample size is $5$, so the median is determined by the third smallest value: $X_{(3)}$ represents the median, and $j = 3$. Remember the formula, $f_{X_{(j)}}(x) = j\binom{n}{j}F(x)^{\,j-1}(1-F(x))^{\,n-j}f(x)$. Put $j = 3$ and $n = 5$; for the uniform distribution the pdf is just $1$ on the interval $(0,1)$ and $F(x) = x$ there, so

$$f_{X_{(3)}}(x) \;=\; 3\binom{5}{3}\,x^{2}(1-x)^{2} \;=\; 30\,x^{2}(1-x)^{2}, \qquad 0 < x < 1,$$

and the required probability is $\int_{1/3}^{2/3} 30\,x^{2}(1-x)^{2}\,dx = 47/81 \approx 0.58$.

An aside here on the proof of the Cauchy–Schwarz inequality. The expected value of $Y^2$ is greater than or equal to $0$; it cannot be negative, because you are integrating $y^2$, which is non-negative, against a pdf, which is non-negative. So two things are possible: either $E[Y^2] = 0$ or $E[Y^2] > 0$. If $E[Y^2] = 0$, this implies $P(Y = 0) = 1$, because the expectation is the sum (or integral) of the possible values of $Y^2$ weighted by their probabilities, all non-negative terms. Hence $P(XY = 0) = 1$ as well: $Y$ taking the value $0$ is a certain event, so $XY = 0$ is also a certain event. Therefore $E[XY] = 0$, since $XY$ takes the value $0$ with probability $1$, and the inequality is satisfied trivially, with $0$ on both sides. For the case $E[Y^2] > 0$ one proceeds by considering $X - \lambda Y$ with $\lambda = E[XY]/E[Y^2]$; the minus sign eventually gets cancelled out in the computation.

Back to order statistics: the first order statistic, $P(X_{(1)} \le x)$. Look at the opposite event, $X_{(1)} > x$. If the first order statistic is greater than $x$, then, since it is the smallest, all the sample values must be greater than $x$, so this probability equals $(1-F(x))^n$. We are interested in the CDF $F_{X_{(1)}}(x)$, which is $1$ minus this; that is the whole idea:

$$F_{X_{(1)}}(x) \;=\; 1 - \bigl(1-F(x)\bigr)^{n}.$$

When you differentiate with respect to $x$ you get the pdf: the minus signs combine to a plus, the power $n$ comes down, and the derivative of $F$ is $f$, so

$$f_{X_{(1)}}(x) \;=\; n\,f(x)\,\bigl(1-F(x)\bigr)^{\,n-1}.$$

This is the general expression, obtained directly.
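As a quick sanity check of this expression, here is a minimal simulation sketch (my own, assuming NumPy) for the uniform case, where $F(x) = x$ and the CDF of the minimum is $1 - (1-x)^n$:

```python
# Empirical CDF of the minimum of n iid Uniform(0,1) values vs 1 - (1-x)^n.
import numpy as np

n = 5
rng = np.random.default_rng(0)
mins = rng.random((300_000, n)).min(axis=1)
for x in (0.1, 0.3, 0.5):
    print(x, np.mean(mins <= x), 1 - (1 - x)**n)   # empirical vs exact
```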
Now substitute $j = 1$ in the general formula and make sure it gives the same thing; this is a good place to be careful with the exponent. With $j = 1$ you get $\binom{n}{1} = n$, then $F(x)^{\,j-1} = F(x)^0 = 1$; and the power of $1-F(x)$: is it $n-j$ or $n-j-1$? From the direct derivation it should be $n-1$, so let us verify the formula. The correct exponent is indeed $n-j$, which for $j = 1$ gives $n-1$, and both computations match. So you can obtain the pdf of $X_{(1)}$ directly or through the formula we derived, and checking against a direct derivation like this helps you make sure you do not make the mistake of writing $n-j-1$.

That was a simple example. Now let us see that this machinery also helps to write down the joint pdf of all the order statistics. What is actually happening is that $(X_{(1)}, \ldots, X_{(n)})$ is some arrangement of the sample values $X_1, X_2, \ldots, X_n$, and there are $n!$ possible arrangements of the $n$ sample values; exactly one of them matches the increasing order. You can do this through the rigorous mathematics by showing that $\mathbb{R}^n$ can be divided into $n!$ regions, in each of which one particular arrangement of the sample values holds, and when you carry out the transformation over the whole of $\mathbb{R}^n$ the Jacobian in each region is a permutation matrix, whose determinant is $\pm 1$, so its absolute value is always $1$. Without going into all that, we can simply say the following: since the variables are independent, the joint density of $X_1, \ldots, X_n$ is the product of the individual densities, all of them the same pdf $f$ (so there is no need to write indices on the $f$'s), and therefore the joint density of the order statistics is

$$f_{X_{(1)},\ldots,X_{(n)}}(x_1,\ldots,x_n) \;=\; n!\,f(x_1)\cdots f(x_n), \qquad x_1 < x_2 < \cdots < x_n,$$

because, after all, the ordered vector is only one of the $n!$ equally likely arrangements of the $n$ sample values. A simulation check of the $n!$ factor is sketched below.
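Here is that check, a small sketch (my own, assuming NumPy): for an iid continuous sample, every ordering of the coordinates should occur with probability $1/n!$, which is where the $n!$ factor comes from.

```python
# Each of the n! orderings of an iid continuous sample is equally likely.
import numpy as np
from collections import Counter
from math import factorial

n, trials = 3, 600_000
rng = np.random.default_rng(1)
ranks = np.argsort(rng.normal(size=(trials, n)), axis=1)  # ordering pattern
counts = Counter(map(tuple, ranks))
print(len(counts), "of", factorial(n), "orderings seen")
print(sorted(round(c / trials, 4) for c in counts.values()))  # each ~ 1/6
```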
Next, the joint pdf of two order statistics $X_{(i)}$ and $X_{(j)}$, $i < j$. Start from the joint density of all $n$ order statistics and integrate the others out, just as for marginals: when you have a joint density and want the pdf of some of the variables, you integrate with respect to the remaining ones. For the $i-1$ sample values below $X_{(i)}$ the limits of integration run from $-\infty$ to $x_i$, because $i-1$ of them must be less than or equal to $x_i$; for the $j-i-1$ variables between the two order statistics the limits are $x_i$ to $x_j$; and for the $n-j$ variables above $X_{(j)}$ the limits are $x_j$ to $\infty$. Once you do these integrations, the factors $f(x_i)$ and $f(x_j)$ remain intact; integrating from $-\infty$ to $x_i$ gives $F(x_i)^{\,i-1}$, integrating between gives $(F(x_j)-F(x_i))^{\,j-i-1}$, and integrating above gives $(1-F(x_j))^{\,n-j}$. These exponents add up to $n-2$, and the remaining two variables are $x_i$ and $x_j$ themselves. So the joint density is

$$f_{X_{(i)},X_{(j)}}(x_i,x_j) \;=\; \frac{n!}{(i-1)!\,(j-i-1)!\,(n-j)!}\; F(x_i)^{\,i-1}\,\bigl(F(x_j)-F(x_i)\bigr)^{\,j-i-1}\,\bigl(1-F(x_j)\bigr)^{\,n-j}\, f(x_i)\,f(x_j), \qquad x_i < x_j.$$

Our ultimate aim: the range of the sample values is of a lot of interest in many situations, so we want to find the pdf of the range. Define two random variables, $R = X_{(n)} - X_{(1)}$, the range, and $V = X_{(n)}$, the largest sample value. Here, of course, you should try to compute the pdf of $X_{(n)}$ directly and then verify it from the formula: $P(X_{(n)} \le x)$ means all the sample values are less than or equal to $x$, so the CDF of $X_{(n)}$ immediately comes out to be $F(x)^n$, and differentiating, $f_{X_{(n)}}(x) = n\,f(x)\,F(x)^{\,n-1}$. So you can get this one directly as well.

Now, to find the pdf of $R$: first I need the joint pdf of $X_{(1)}$ and $X_{(n)}$; once I have that, since $R$ and $V$ are functions of $X_{(1)}$ and $X_{(n)}$, I will use the transformation formula to get the joint pdf of $R$ and $V$, and from that I will finally get the pdf of $R$. That is the whole idea. To derive the joint pdf of $X_{(1)}$ and $X_{(n)}$ from the formula above, simply put $i = 1$ and $j = n$: the $F(x_i)^{\,i-1}$ term is gone and the $(1-F(x_j))^{\,n-j}$ term is gone, because the exponents $i-1 = 0$ and $n-j = 0$, and you are left with

$$f_{X_{(1)},X_{(n)}}(x_1,x_n) \;=\; \frac{n!}{(n-2)!}\,\bigl(F(x_n)-F(x_1)\bigr)^{\,n-2}\, f(x_1)\,f(x_n) \;=\; n(n-1)\,\bigl(F(x_n)-F(x_1)\bigr)^{\,n-2}\, f(x_1)\,f(x_n).$$

Once I have this, make the transformation $R = X_{(n)} - X_{(1)}$, $V = X_{(n)}$. Inverting, $X_{(n)} = V$ and $X_{(1)} = V - R$. Writing the Jacobian with $R$ as the first variable, the matrix is $\begin{pmatrix} -1 & 1 \\ 0 & 1 \end{pmatrix}$, whose determinant has absolute value $1$. So, with the absolute value of the Jacobian equal to $1$, the joint density of $R$ and $V$ is obtained by substituting $x_n = v$ and $x_1 = v - r$:

$$f_{R,V}(r,v) \;=\; n(n-1)\,\bigl(F(v)-F(v-r)\bigr)^{\,n-2}\, f(v-r)\,f(v).$$
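Before specializing, this joint density can be sanity-checked numerically. A minimal sketch (my own, assuming NumPy/SciPy) for the uniform case, where $F(v) - F(v-r) = r$ and the density is $n(n-1)r^{\,n-2}$ on $\{0 < r < v < 1\}$:

```python
# Compare P(R <= 0.5, V <= 0.8) computed from the derived joint density
# with a Monte Carlo estimate, for n = 5 iid Uniform(0,1) samples.
import numpy as np
from scipy.integrate import dblquad

n = 5
# Integrate f_{R,V}(r, v) = n(n-1) r^(n-2) over 0 < r < 0.5, r < v < 0.8.
exact, _ = dblquad(lambda v, r: n * (n - 1) * r**(n - 2),
                   0, 0.5, lambda r: r, lambda r: 0.8)

rng = np.random.default_rng(2)
x = rng.random((300_000, n))
r, v = x.max(axis=1) - x.min(axis=1), x.max(axis=1)
print(exact, np.mean((r <= 0.5) & (v <= 0.8)))   # should be close
```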
And here, of course, it is understood that $r > 0$, because $R$ represents the range and $X_{(n)} \ge X_{(1)}$; if the range is $0$ there is no point, so we take $r$ to be a positive number. Once I have the joint pdf of $R$ and $V$, my interest is in the pdf of $R$, so I integrate $v$ out. In general, since $X_{(n)} = R + X_{(1)}$ and we are allowing $X_1$ to vary over $(-\infty, \infty)$ for a population supported on the whole line, the integral can be written from $-\infty$ to $\infty$:

$$f_R(r) \;=\; n(n-1)\int_{-\infty}^{\infty} \bigl(F(v)-F(v-r)\bigr)^{\,n-2}\, f(v-r)\,f(v)\,dv, \qquad r > 0.$$

Now, as a special case, consider $X_i$, $i = 1, \ldots, n$, from the uniform distribution on $(0,1)$. Then all sample values are non-negative, so $X_{(n)} \ge R$; that is, $v \ge r$. And since the variables lie between $0$ and $1$, in the joint density you integrate $v$ from $r$ to $1$. Here $F(v) - F(v-r)$ simplifies: the $v$ cancels out and it becomes just $r$, and both pdf factors are $1$, so

$$f_R(r) \;=\; n(n-1)\int_r^1 r^{\,n-2}\,dv \;=\; n(n-1)\,r^{\,n-2}(1-r), \qquad 0 < r < 1,$$

since the integral $\int_r^1 dv$ is $1 - r$. So this is the pdf of the range. Now you can work out the probabilities of interest. For a sample of size $10$ from Uniform$(0,1)$, what is the probability that the range is larger than $0.8$? These questions are of a lot of interest: you want to know how spread out the sample you have observed is. You integrate this pdf from $0.8$ to $1$, and if you simplify, the answer comes to about $0.624$, a pretty large probability of the range of the sample being more than $0.8$.
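A minimal numerical check of this number (my own sketch, assuming NumPy/SciPy):

```python
# P(R > 0.8) for the range of 10 iid Uniform(0,1) values:
# integrate f_R(r) = n(n-1) r^(n-2) (1-r) from 0.8 to 1, and compare
# with a direct simulation.
import numpy as np
from scipy.integrate import quad

n = 10
exact, _ = quad(lambda r: n * (n - 1) * r**(n - 2) * (1 - r), 0.8, 1)

rng = np.random.default_rng(3)
x = rng.random((300_000, n))
print(exact, np.mean(x.max(axis=1) - x.min(axis=1) > 0.8))   # both ~ 0.624
```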
Now another example, from the normal distribution, because I thought we had enough cases of the uniform. If $X_1$ and $X_2$ are independently and identically distributed as $N(0,1)$, mean $0$ and variance $1$, find the pdf of $X_{(2)}$, which represents $\max(X_1, X_2)$. We will again obtain this without using any formula. As I explained, if you want the CDF of $X_{(2)}$, the larger one, it is $P(X_{(2)} \le t)$, which means both values $X_1$ and $X_2$ must be less than or equal to $t$; and since they are independent, this equals $P(X_1 \le t)\,P(X_2 \le t)$. Each factor is the standard normal CDF, for which our notation is $\Phi(t) = \int_{-\infty}^{t} \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}\,du$. So the cumulative distribution function of the maximum of two sample values from $N(0,1)$ is $\Phi(t)^2$. If you want the pdf, just differentiate: $2\,\Phi(t)\,\Phi'(t)$, and $\Phi'(t)$ is nothing but the standard normal pdf, so

$$f_{X_{(2)}}(t) \;=\; 2\,\Phi(t)\,\frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}.$$

This you can then integrate to find whatever probabilities you are interested in. So it looks like, at least when your sample is from a normal distribution or from a uniform distribution, you can easily obtain the pdfs of the order statistics; in other cases also one can try, and of course there are methods for computing difficult integrals in many other ways, by numerical methods.
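Again a quick check by simulation; a sketch (my own, assuming NumPy/SciPy):

```python
# The max of two iid N(0,1) variables: CDF Phi(t)^2, pdf 2 phi(t) Phi(t).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
m = rng.normal(size=(400_000, 2)).max(axis=1)
for t in (-1.0, 0.0, 1.0):
    print(t, np.mean(m <= t), norm.cdf(t)**2)   # empirical vs Phi(t)^2
print(np.mean(m), 1 / np.sqrt(np.pi))  # known fact: E[max] = 1/sqrt(pi)
```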
Now, continuing with joint distributions, the other important parameters we need to define here are covariance, variance of sums, and correlation; these will have a lot of implications. Before I define the covariance and then the correlation, a simple proposition, which hardly needs a proof, but I write it down for completeness. If $X$ and $Y$ are independent random variables (random variables is understood), then for any functions $g$ and $h$,

$$E[g(X)\,h(Y)] \;=\; E[g(X)]\;E[h(Y)].$$

That is, the independence carries over to the functions $g(X)$ and $h(Y)$. The proof is simple: the expectation is $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x)\,h(y)\,f(x,y)\,dx\,dy$, and since $X$ and $Y$ are independent the joint density can be written as the product of the marginal densities; then the integrals separate as $\int_{-\infty}^{\infty} g(x)\,f_X(x)\,dx \cdot \int_{-\infty}^{\infty} h(y)\,f_Y(y)\,dy$, which by definition is $E[g(X)]\,E[h(Y)]$.

Once we have this behind us, we can define the covariance between two variables $X$ and $Y$, denoted $\operatorname{Cov}(X,Y)$:

$$\operatorname{Cov}(X,Y) \;=\; E\bigl[(X - E[X])(Y - E[Y])\bigr].$$

If you expand this expression, it is $E[XY - X\,E[Y] - Y\,E[X] + E[X]E[Y]]$; taking the expectation inside, two of the four terms cancel each other, and you are left with the simpler expression to handle:

$$\operatorname{Cov}(X,Y) \;=\; E[XY] - E[X]\,E[Y].$$

Let us see what it indicates. If $X$ and $Y$ are independent, then $E[XY] = E[X]E[Y]$, so $E[X]E[Y] - E[X]E[Y] = 0$: independence implies $\operatorname{Cov}(X,Y) = 0$. But unfortunately the converse is not true: covariance equal to $0$ does not always imply independence of the random variables. A very simple example shows this. Define a random variable $X$ taking three values, $P(X = 0) = P(X = 1) = P(X = -1) = 1/3$, all three equally likely, and define a random variable $Y$, totally dependent on $X$, by $Y = 0$ if $X \ne 0$ and $Y = 1$ if $X = 0$. Now look at the values of the product $XY$: it is always $0$, because $Y$ is $0$ whenever $X \ne 0$, and $X$ is $0$ whenever $Y = 1$. A random variable taking only the value $0$ has expectation $0$, so $E[XY] = 0$; and you can see that $E[X] = 0$ as well. So from the covariance formula both terms are $0$, the covariance is $0$, but we know that $X$ and $Y$ are not independent. If you want to verify non-independence formally, since these are discrete random variables you would show that for some pair of values the joint probability is not the product of the individual probabilities, for instance $P(X = 1, Y = 1) = 0$ while $P(X = 1)\,P(Y = 1) = \tfrac13 \cdot \tfrac13$; but as it is, there is not much to prove, because the way we are defining $Y$ it is totally dependent on $X$. (Compare also an earlier example: $X$ and $Z$ having the same pdf and CDF tells you nothing either way, yet when, given $X$, $Z$ can take only the values $X$ and $-X$, the covariance works out to $0$ while $X$ and $Z$ are completely dependent.) So covariance $0$ does not imply independence of the random variables.

Continuing with this, let us take another example. Let $X_1 = \sin 2\pi U$ and $X_2 = \cos 2\pi U$, two different functions of a single uniform random variable $U$ on $(0,1)$. If you compute $E[X_1] = \int_0^1 \sin 2\pi u\,du = \bigl[-\tfrac{1}{2\pi}\cos 2\pi u\bigr]_0^1 = 0$, because $\cos 2\pi$ and $\cos 0$ are both $1$. Similarly you can show $E[X_2] = 0$. So the covariance of $X_1, X_2$ reduces to just $E[X_1 X_2]$; but $\sin 2\pi u\,\cos 2\pi u = \tfrac12 \sin 4\pi u$, again the same kind of function, whose integral over $(0,1)$ is also $0$. Yet $X_1$ and $X_2$ are certainly not independent: $X_2 = \pm\sqrt{1 - X_1^2}$.
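The discrete counterexample is easy to verify exactly from the pmf; a minimal sketch (my own):

```python
# X takes -1, 0, 1 each with probability 1/3; Y = 1 if X == 0, else 0.
# Covariance is 0, yet X and Y are clearly dependent.
xs = [-1, 0, 1]
p = 1 / 3
y = lambda x: 1 if x == 0 else 0

EX = sum(x * p for x in xs)                 # 0
EY = sum(y(x) * p for x in xs)              # 1/3
EXY = sum(x * y(x) * p for x in xs)         # 0 (the product XY is always 0)
print("Cov(X, Y) =", EXY - EX * EY)         # 0.0

# Dependence: P(X = 1, Y = 1) = 0, but P(X = 1) P(Y = 1) = 1/9.
print(sum(p for x in xs if x == 1 and y(x) == 1), (1 / 3) * EY)
```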
So the covariance is $0$, yet it certainly does not imply independence of $X_1$ and $X_2$, which you can also see directly, because $X_1^2 + X_2^2 = 1$. I will come back to this example in a while.

Now, some properties of the covariance function which we can show immediately. First, the order does not matter: $\operatorname{Cov}(X,Y) = \operatorname{Cov}(Y,X)$, because it is $E[(X-E[X])(Y-E[Y])]$. Second, taking both arguments to be the same variable, $\operatorname{Cov}(X,X) = E[(X-E[X])^2] = \operatorname{Var}(X)$. Third, $\operatorname{Cov}(aX, Y) = a\operatorname{Cov}(X,Y)$: by definition the constant $a$ appears in each term and can be taken out. Applying these principles in general, since expectation is linear and can be taken inside the summation sign, you can show that

$$\operatorname{Cov}\!\Bigl(\sum_{i=1}^{n} a_i X_i,\; \sum_{j=1}^{n} b_j Y_j\Bigr) \;=\; \sum_{i=1}^{n}\sum_{j=1}^{n} a_i b_j\, \operatorname{Cov}(X_i, Y_j),$$

taking all possible products. This is the general expression, and I will show you a nice application after a while of how this formula can simplify some computations.

The moment you define the covariance, you immediately define the correlation coefficient $\rho$, and we will see the implications and usefulness of this parameter. If $X_1$ and $X_2$ are two jointly distributed random variables, the correlation coefficient is

$$\rho(X_1, X_2) \;=\; \frac{\operatorname{Cov}(X_1, X_2)}{\sigma_{X_1}\,\sigma_{X_2}},$$

the covariance divided by the product of the standard deviations. Of course, this definition is valid only when $\sigma_{X_1}$ and $\sigma_{X_2}$ are finite, and in fact they should not be $0$: if $X_1$ or $X_2$ is a constant variable, taking only one value with no randomness about it, its standard deviation is $0$ and you cannot divide by $0$. In fact, the same caveat applies to the covariance: when I defined it I should have spelled out that the definition is valid as long as the expectations involved exist. Now, you can immediately see that the correlation coefficient can be written nicely as the covariance of the standardized variates: I standardize, $(X - E[X])/\sigma_X$ and $(Y - E[Y])/\sigma_Y$, just the way we standardized a normal variate, and once you do this, $\rho$ becomes dimensionless. You can try it out: the covariance is $E[(X-E[X])(Y-E[Y])]$, and dividing each factor by its standard deviation simply means taking the constants $1/\sigma_X$ and $1/\sigma_Y$ inside the covariance.
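The double-sum formula is worth checking once; note that it holds exactly even for sample covariances, since those are bilinear in the same way. A minimal sketch (my own, assuming NumPy):

```python
# Cov(sum_i a_i X_i, sum_j b_j Y_j) = sum_i sum_j a_i b_j Cov(X_i, Y_j),
# verified on sample covariances, for which the identity is exact.
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(3, 10_000))
Y = 0.5 * X[:2] + rng.normal(size=(2, 10_000))   # make X and Y correlated
a, b = np.array([1.0, -2.0, 0.5]), np.array([3.0, 1.5])

lhs = np.cov(a @ X, b @ Y)[0, 1]
C = np.cov(np.vstack([X, Y]))          # 5x5 joint sample covariance matrix
rhs = a @ C[:3, 3:] @ b                # sum_i sum_j a_i b_j Cov(X_i, Y_j)
print(lhs, rhs)                        # equal up to floating point
```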
So there is no big deal here, no big manipulation; I am simply taking the constants inside, because we have already seen that constants can be taken outside or inside the covariance, it does not matter. Therefore

$$\rho(X_1,X_2) \;=\; \operatorname{Cov}\!\Bigl(\frac{X_1 - E[X_1]}{\sigma_{X_1}},\; \frac{X_2 - E[X_2]}{\sigma_{X_2}}\Bigr),$$

a standardized, dimensionless definition of the correlation. Now some terminology: $\rho(X_1, X_2) = 0$ obviously happens exactly when the covariance is $0$, and in that case we say $X_1$ and $X_2$ are uncorrelated. You have already seen that two variables being uncorrelated does not mean independence; that is exactly why we coin this separate word. So our terminology is that $X_1$ and $X_2$ are uncorrelated if and only if the covariance, equivalently the correlation coefficient $\rho$, is $0$.

Through the Schwarz inequality and so on, I will show you that the number $\rho$ measures the relationship between the variables. The covariance simply showed you that if the variables are independent then it is $0$; $\rho$ gives you much more information than that. We will first show that $|\rho| \le 1$ always, because we have standardized by dividing by the standard deviations, and then that if $|\rho| = 1$ the two variables are perfectly related. It measures relationship, but, as we will also see, it may capture linear relationships very well and not non-linear ones; we will come to that. So this is a very useful parameter. Here is the same kind of example again: take $X_2 = X_1^2$, the square of $X_1$, a perfectly deterministic relationship between the two variables. The covariance is $\operatorname{Cov}(X_1, X_1^2) = E[X_1^3] - E[X_1]\,E[X_1^2]$. Take $X_1$ to be a variable taking the values $-1$, $0$ and $1$, each with probability $1/3$ (three values rather than two, so that $X_2 = X_1^2$ is not a constant and $\rho$ is defined); then $E[X_1] = 0$ and $E[X_1^3] = 0$, since the cubes are just $-1$, $0$, $1$ again. So the covariance is $0$, which implies $\rho = 0$: the variables are uncorrelated, but they are certainly not independent.

Now, another immediate use of "uncorrelated" appears when computing the variance of a sum of two variables, which you can then extend to sums of more than two; we will complete that computation below, writing $\operatorname{Var}(X_1+X_2)$ in terms of $X_1 - E[X_1]$ and $X_2 - E[X_2]$. But first, an aside here on conditional expectation. We compute $E\bigl[E[X \mid Y]\bigr]$: as the value of $Y$ varies, the conditional expectation of $X$ given $Y$ varies, so when you write this expression you are computing

$$E\bigl[E[X \mid Y]\bigr] \;=\; E[X \mid Y=1]\,P(Y=1) + E[X \mid Y=2]\,P(Y=2) + E[X \mid Y=3]\,P(Y=3),$$

and we have already made these computations: $2.7$, the conditional expectation of $X$ given $Y = 1$, times $P(Y=1) = 0.2$, plus $2.88$, the conditional expectation given $Y = 2$, times $0.5$, plus $2.833$, the conditional expectation given $Y = 3$, times $0.3$; and this adds up to about $2.82$, the same value of $E[X]$ that we computed directly before. This is what I wanted to show you.
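A tiny self-contained illustration of this identity, with a made-up joint pmf (the numbers below are hypothetical, not the ones from the table referred to above); a sketch:

```python
# E[X] = sum_y E[X | Y = y] P(Y = y), checked on a small joint pmf.
pmf = {(1, 1): 0.10, (2, 1): 0.10,     # (x, y): P(X = x, Y = y)
       (1, 2): 0.25, (3, 2): 0.25,
       (2, 3): 0.15, (3, 3): 0.15}

ys = sorted({y for _, y in pmf})
pY = {y: sum(p for (_, yy), p in pmf.items() if yy == y) for y in ys}
EXgiven = {y: sum(x * p for (x, yy), p in pmf.items() if yy == y) / pY[y]
           for y in ys}

tower = sum(EXgiven[y] * pY[y] for y in ys)
direct = sum(x * p for (x, _), p in pmf.items())
print(tower, direct)                   # both 2.05
```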
And remember, even if somewhere in the text you sometimes find the capital letter missing: whenever we write $E[X \mid Y]$ with the random variable $Y$ (capital) inside, this object is itself a random variable, so I can talk of its expectation. Its value at $Y = y$ is $E[X \mid Y = y]$; you compute this conditional expectation for a particular value of $y$, multiply by the probability of that particular value, and add up over all possible values of $y$. So you can break up $E[X]$ in this way; in other words, $E[X] = E\bigl[E[X \mid Y]\bigr]$.

Now back to the variance of the sum. $\operatorname{Var}(X_1+X_2) = E[(X_1 + X_2 - E[X_1] - E[X_2])^2]$, and when you open up the square it is $E[(X_1-E[X_1])^2] + E[(X_2-E[X_2])^2] + 2\,E[(X_1-E[X_1])(X_2-E[X_2])]$, that is,

$$\operatorname{Var}(X_1+X_2) \;=\; \operatorname{Var}(X_1) + \operatorname{Var}(X_2) + 2\operatorname{Cov}(X_1,X_2).$$

From here it follows that $\operatorname{Var}(X_1+X_2) = \operatorname{Var}(X_1) + \operatorname{Var}(X_2)$ if and only if $\operatorname{Cov}(X_1,X_2) = 0$: if the covariance is $0$ you get the additivity, and if the variance of the sum equals the sum of the variances, then the covariance must be $0$. And for this result to be true it is not necessary for $X_1, X_2$ to be independent. Earlier, when we talked of independence and of the sum of two independent random variables, I showed you the same additivity; but now that we have defined the term uncorrelated, what we are saying is that for $\operatorname{Var}(X_1+X_2)$ to equal $\operatorname{Var}(X_1) + \operatorname{Var}(X_2)$ it is enough that the covariance is $0$, that is, that the variables are uncorrelated; independence is not needed. So this is one advantage, one use, of this function; we will talk about it some more in the next lecture.
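Finally, the sine-cosine pair from before makes this point concrete: dependent but uncorrelated, so the variance still adds. A minimal sketch (my own, assuming NumPy):

```python
# X1 = sin(2 pi U), X2 = cos(2 pi U): dependent (X1^2 + X2^2 = 1),
# yet uncorrelated, so Var(X1 + X2) = Var(X1) + Var(X2).
import numpy as np

rng = np.random.default_rng(6)
u = rng.random(1_000_000)
x1, x2 = np.sin(2 * np.pi * u), np.cos(2 * np.pi * u)

print(np.cov(x1, x2)[0, 1])                        # ~ 0: uncorrelated
print(np.var(x1 + x2), np.var(x1) + np.var(x2))    # both ~ 1
```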