Last class, we were discussing variance and covariance. The covariance of two random variables is defined as Cov(X, Y) = E[XY] - E[X]E[Y], whenever this is well defined, and we said that X and Y are uncorrelated if Cov(X, Y) = 0. We were in the middle of proving a theorem which says that if X and Y are independent, then they are necessarily uncorrelated. The converse, of course, is not true: we gave an example showing that uncorrelated random variables may be dependent. I do not think I stated the theorem very precisely last time, so let me state it properly today; take this statement rather than the previous one.

Theorem. If X and Y are independent random variables with E|X| < ∞ and E|Y| < ∞, then E[XY] exists and Cov(X, Y) = 0.

So all you need here is that E|X| and E|Y| are finite. In that case you are guaranteed that E[XY] exists, and in particular E[XY] = E[X]E[Y]. In general, even if X and Y each have a finite expected value, it is not automatic that E[XY] is well defined; but if they are independent, it is automatically well defined, and in fact they are uncorrelated.

Proof. We started with the simple case. If X and Y are simple, we wrote X = Σ_{i=1}^{n} a_i 1_{A_i} and Y = Σ_{j=1}^{m} b_j 1_{B_j}. Then we wrote out XY as the double sum Σ_i Σ_j a_i b_j 1_{A_i ∩ B_j}, we argued that A_i and B_j are independent for all i and j (because X and Y are independent random variables), and from that we proved that X and Y are uncorrelated. This was done in the previous class.

Next, suppose X and Y are non-negative. What do we do? Consider an approximating sequence: let (X_n) and (Y_n) be sequences of simple random variables such that X_n ↑ X and Y_n ↑ Y. Such an approximation from below always exists; we know how to construct it explicitly. I am obviously heading towards an application of the monotone convergence theorem: X_n Y_n is non-negative and increasing, and X_n Y_n ↑ XY, so by MCT, E[XY] = lim_{n→∞} E[X_n Y_n]. Moreover, with the explicit construction (recall how, for any non-negative measurable function g, we built the approximating sequence g_n), the random variables X_n and Y_n are also independent of each other, because X_n is a function of X alone and Y_n is a function of Y alone. So X_n and Y_n are independent by construction; that is what I mean.
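For reference, here is one standard way to build such a sequence. This is the usual dyadic construction; the course notes may use exactly this or a minor variant, so take it as a sketch of the idea rather than the official definition.

```latex
% Dyadic approximation of a non-negative random variable from below:
\varphi_n(t) = \min\!\big(n,\ 2^{-n}\lfloor 2^{n} t \rfloor\big),
\qquad X_n := \varphi_n(X), \quad Y_n := \varphi_n(Y).
% Each \varphi_n is measurable and takes finitely many values, and \varphi_n(t) \uparrow t,
% so X_n, Y_n are simple with X_n \uparrow X and Y_n \uparrow Y.
% Since X_n is a function of X alone and Y_n is a function of Y alone,
% independence of X and Y implies independence of X_n and Y_n for every n.
```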
So, with the approximating sequence constructed this way, you can verify that X_n and Y_n are in fact independent. They are also simple, and you have shown in the previous lecture that independent simple random variables are uncorrelated. Therefore

E[XY] = lim_{n→∞} E[X_n Y_n] = lim_{n→∞} E[X_n] E[Y_n].

Now, this is obviously (lim E[X_n]) (lim E[Y_n]): the limit of a product is the product of the limits whenever both limits exist, and here both limits do exist, again by MCT, with E[X_n] → E[X] and E[Y_n] → E[Y]. So E[XY] = E[X] E[Y]. Any questions?

So we have proved it for non-negative random variables. Notice that this is a very general proof: I have not assumed that X is discrete or Y is continuous or any such thing, I have not assumed a joint density or a pmf. X and Y can be completely general random variables; whenever they are independent, they are uncorrelated. Well, so far I have proved it when X and Y are both non-negative. To deal with random variables that are possibly negative, we write X = X⁺ - X⁻ and Y = Y⁺ - Y⁻ and complete the proof. This is in your lecture notes, so I am not writing it all down. You expand the product, XY = X⁺Y⁺ - X⁺Y⁻ - X⁻Y⁺ + X⁻Y⁻, and note that when X and Y are independent, the pairs X⁺ and Y⁺, X⁺ and Y⁻, X⁻ and Y⁺, X⁻ and Y⁻ are all independent. You can show that easily (in fact it is a question in your quiz): for example, X⁺ = max(X, 0) and Y⁻ = -min(Y, 0) are functions of X and Y respectively, so they must be independent. Then you invoke the result for non-negative random variables on each of the four terms and you get it. This is exactly where you need E|X| < ∞ and E|Y| < ∞; if that were not true, you would have ∞ - ∞ problems. Any questions?

Next I am going to talk about the variance of a sum of two random variables. If X and Y are any random variables, not necessarily independent or uncorrelated or any such thing, we know that expectation is always linear: E[X + Y] = E[X] + E[Y]. That is just linearity of integrals. But the variance is different: you cannot always say that the variance of a sum of two random variables is the sum of the variances. So let me state a proposition. If X and Y are random variables, then

σ_{X+Y}² = σ_X² + σ_Y² + 2 Cov(X, Y).

So if you are interested in the variance of a sum of two random variables, you sum the two variances and add twice the covariance.
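As a quick numerical sanity check of this identity (an illustrative sketch, not part of the proof, assuming numpy is available; with the same divisor used for all the sample moments, the identity holds exactly on the sample, up to floating-point error):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Two dependent random variables: y shares a component with x.
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)

lhs = np.var(x + y)                                          # Var(X + Y), sample version
rhs = np.var(x) + np.var(y) + 2 * np.cov(x, y, bias=True)[0, 1]

print(lhs, rhs)  # the two numbers coincide
```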
But if X and Y are uncorrelated, the covariance term vanishes and you can just add the two variances; in particular, if X and Y are independent, you can add the variances and say that is the variance of the sum. Similarly, how do you think this generalizes to n random variables? For a sum of n random variables you first sum all the variances, and then you add twice all the possible covariances, Cov(X_i, X_j) for all i < j. So it generalizes in a straightforward manner.

Proving the proposition is very easy; you can just write it out:

σ_{X+Y}² = E[(X + Y)²] - (E[X + Y])²
         = E[X²] + E[Y²] + 2 E[XY] - (E[X] + E[Y])²
         = (E[X²] - (E[X])²) + (E[Y²] - (E[Y])²) + 2 (E[XY] - E[X] E[Y])
         = σ_X² + σ_Y² + 2 Cov(X, Y).

If X and Y are uncorrelated, the term E[XY] cancels with E[X]E[Y] and you are left with just the sum of the variances. So, generally, if you are given independent random variables, or uncorrelated random variables, you can add the variances and say that is the variance of the sum; but in general you cannot do that, you have to keep all these covariance terms. If Cov(X, Y) is positive, the variance of the sum will only be greater; but if the correlation is negative, the variance of the sum will actually be less than the sum of the variances. That is just a minor point.

I want to make another definition. The correlation coefficient between X and Y (between X and Y or Y and X, it is all the same) is defined as

ρ(X, Y) = Cov(X, Y) / (σ_X σ_Y),

where σ_X and σ_Y are the standard deviations, the square roots of the variances; equivalently, it is Cov(X, Y) divided by √Var(X) √Var(Y). In some sense this is a scaled version of the covariance, and it is scale invariant: if you measure X and Y in, let us say, grams as opposed to kilograms, you will get a different answer for the covariance, but ρ(X, Y) will be the same no matter what units you measure X and Y in. It is a scale-free quantity. Like the covariance, ρ(X, Y) can be positive or negative. The denominator is of course always positive, assuming none of this is 0: ρ is not defined when a standard deviation is 0, but if the standard deviation is 0, then the random variable itself is constant.
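To see the scale invariance numerically, here is a small illustrative sketch (assuming numpy; the factor of 1000 is just the kilograms-to-grams conversion used as an example above):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

x_kg = rng.normal(70, 10, size=n)              # a quantity measured in kilograms
y_kg = 0.3 * x_kg + rng.normal(0, 5, size=n)   # a correlated quantity, same units

x_g, y_g = 1000 * x_kg, 1000 * y_kg            # the same data expressed in grams

print(np.cov(x_kg, y_kg)[0, 1])                # covariance in kg^2
print(np.cov(x_g, y_g)[0, 1])                  # covariance is 10^6 times larger
print(np.corrcoef(x_kg, y_kg)[0, 1],
      np.corrcoef(x_g, y_g)[0, 1])             # rho is unchanged
```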
So, except in that degenerate case, ρ(X, Y) is well defined; the denominator is positive, and the covariance can be positive or negative, so ρ(X, Y) can be positive or negative, and it can in fact be 0 as well. The important result about this correlation coefficient is that it is a number that always lies between -1 and 1. In that sense it is a very nice, scale-free way to represent the correlation between X and Y.

Theorem (Cauchy-Schwarz inequality). For any two random variables X and Y, -1 ≤ ρ(X, Y) ≤ 1. Furthermore, if ρ(X, Y) = 1, then there exists a real number a > 0 such that Y - E[Y] = a (X - E[X]) almost surely; and if ρ(X, Y) = -1, then there exists a < 0 such that the same holds.

Is the theorem clear? It says that the correlation coefficient lies between -1 and +1, both inclusive, and moreover that whenever the correlation coefficient equals +1 or -1, there is a deterministic relationship between X and Y: you can write Y as aX + b for some constants, with a positive when ρ = 1, and similarly with a negative when ρ = -1. When that is not the case, ρ will be strictly between -1 and 1. Another way of stating the Cauchy-Schwarz inequality is to say that |ρ(X, Y)| ≤ 1; in other words, |Cov(X, Y)| ≤ σ_X σ_Y.

The proof of Cauchy-Schwarz is very short, but you have to know how to proceed with it; it comes in two steps. Let X̃ = X - E[X] and Ỹ = Y - E[Y]. It is easier to work with the centered versions of these random variables: I am just taking away the means, so that X̃ and Ỹ are zero-mean random variables, and then statements like Ỹ = a X̃ are easier to handle. With this definition, E[X̃²] is the variance of X, and similarly E[X̃Ỹ] is Cov(X, Y). Now consider the expectation of this big expression:

E[ ( X̃ - ( E[X̃Ỹ] / E[Ỹ²] ) Ỹ )² ].

The coefficient multiplying Ỹ is just a constant; it is the covariance over σ_Y², if you like. So I am just considering this.
I am essentially considering X̃ minus a constant times Ỹ, and taking the expectation of its square. This is certainly non-negative, because what is inside the expectation is a non-negative random variable. Now, it turns out that if you expand this out, you get the Cauchy-Schwarz inequality. Let us expand it, writing c = E[X̃Ỹ] / E[Ỹ²] for the constant; it is just the usual a² - 2ab + b² expansion, with the expectation applied term by term. The square of the first term gives E[X̃²]. The square of the second term gives c² E[Ỹ²], which is (E[X̃Ỹ])² / E[Ỹ²], because one factor of E[Ỹ²] cancels. The cross term gives -2c E[X̃Ỹ], which is -2 (E[X̃Ỹ])² / E[Ỹ²]. The cross term and the squared second term are the same quantity except for the factor 2, so one cancels half of the other, and we are left with

0 ≤ E[X̃²] - (E[X̃Ỹ])² / E[Ỹ²].

Bringing the second term to the other side, (E[X̃Ỹ])² ≤ E[X̃²] E[Ỹ²]. Taking positive square roots on both sides gives |Cov(X, Y)| ≤ σ_X σ_Y, because E[X̃²] is σ_X² and E[Ỹ²] is σ_Y². So we have proved the inequality.
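Collecting the computation in one place (this is just the calculation above written compactly, under the standing assumption that E[Ỹ²] > 0, that is, Y is not almost surely constant):

```latex
% With c := E[\tilde X \tilde Y] / E[\tilde Y^2]:
0 \le E\big[(\tilde X - c\,\tilde Y)^2\big]
  = E[\tilde X^2] - 2c\,E[\tilde X \tilde Y] + c^2 E[\tilde Y^2]
  = E[\tilde X^2] - \frac{\big(E[\tilde X \tilde Y]\big)^2}{E[\tilde Y^2]}.
% Rearranging gives \operatorname{Cov}(X,Y)^2 \le \operatorname{Var}(X)\,\operatorname{Var}(Y),
% with equality precisely when E[(\tilde X - c\,\tilde Y)^2] = 0.
```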
Now, how do you prove the second part of the theorem? The inequality came from that expectation being ≥ 0. If |ρ(X, Y)| = 1, then the inequality is met with equality, which means the expectation of that squared expression is 0. A non-negative random variable with zero expectation must be 0 almost surely, so the random variable inside the square is 0 almost surely: X̃ = c Ỹ almost surely, and in fact you can identify the constant explicitly, c = Cov(X, Y) / σ_Y². That is what we wanted to prove, or maybe the other way around: X̃ equal to a constant times Ỹ is the same as saying Ỹ equals some other constant times X̃ (the constant is non-zero here, since ρ = ±1 forces Cov(X, Y) ≠ 0), and the sign of that constant matches the sign of ρ. Is the second part of the theorem clear? Are there any questions?

Yes, I am getting to that; the question is about the connection with the Cauchy-Schwarz inequality you know from Euclidean space. For vectors in Euclidean space there is a standard inner product ⟨x, y⟩, and the Cauchy-Schwarz inequality there says |⟨x, y⟩| ≤ ||x|| ||y||. What we have proved has a very similar flavour: the covariance Cov(X, Y) plays the role of the inner product, and σ_X and σ_Y play the role of the norms, except that X and Y are not vectors in Rⁿ, they are random variables.

In fact, covariance satisfies all the properties of an inner product. What are the properties you want of an inner product? You want symmetry, ⟨x, y⟩ = ⟨y, x⟩; that is clearly true here, since Cov(X, Y) and Cov(Y, X) are equal by definition. You want bilinearity, ⟨aX + bY, Z⟩ = a⟨X, Z⟩ + b⟨Y, Z⟩, which is also true for covariance. And then the Cauchy-Schwarz inequality holds, as we have just shown. There is one little thing that remains: you want ⟨x, x⟩ ≥ 0, with equality only if x = 0; that is one of the defining properties of an inner product. So if I claim that covariance behaves like an inner product, I want to establish this for Cov(X, X). What is Cov(X, X)? It is simply the variance of X, so it is ≥ 0, no problem. But if the variance equals 0, I would have to show that X = 0, and you cannot say that: you can only say that X is a constant, and even that only almost surely. So there is a little bit of a problem here. One thing you can do is to work only with zero-mean random variables: if the mean is non-zero but finite, you just subtract it off. Then the problem mostly goes away, because Cov(X, X) = 0 now means X = 0 almost surely, since X has zero mean. But you still have the issue that it is not really the zero random variable; it could be non-zero on some set of probability measure 0.
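To summarize the analogy in one place (a compact restatement of what was just said, for zero-mean, square-integrable random variables):

```latex
% Treat zero-mean, square-integrable random variables as "vectors", with
\langle X, Y \rangle := \operatorname{Cov}(X, Y) = E[XY].
% Then:
%   symmetry:     \langle X, Y \rangle = \langle Y, X \rangle
%   bilinearity:  \langle aX + bY, Z \rangle = a\langle X, Z \rangle + b\langle Y, Z \rangle
%   positivity:   \langle X, X \rangle = \operatorname{Var}(X) \ge 0,
%                 with equality iff X = 0 almost surely (not pointwise).
% Cauchy-Schwarz then reads |\langle X, Y \rangle| \le \|X\|\,\|Y\|,
% where \|X\| := \sqrt{E[X^2]} = \sigma_X.
```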
So, ultimately you have to consider equivalence classes of random variables: random variables that agree everywhere except perhaps on a set of measure zero are identified with each other. If you are willing to identify the equivalence class of random variables that are equal to 0 almost surely with the zero random variable, then everything is fine; with that little adjustment, Cov(X, Y) really does play the role of an inner product on the space of zero-mean random variables, the standard deviations are like norms, and the Cauchy-Schwarz inequality holds. And what would you identify ρ(X, Y) with? For vectors, the normalized inner product is the cosine of the angle between them. So ρ(X, Y) is like the cosine of the angle between these two random variables: you can think of X and Y as vectors, even though they are random variables, and ρ(X, Y) has the interpretation of cos θ, in a loose sense.

Everything I have said can be made more precise. If you consider the class of zero-mean random variables with finite variance, and you identify random variables that are equal almost surely, then this collection forms a Hilbert space. Have you heard of a Hilbert space? A Hilbert space is basically a complete space endowed with an inner product; here the random variables take values in R, which is complete, and one can show that the space itself is complete in the appropriate mean-square sense. So you can prove that these finite-second-moment random variables, with the equivalence-class identification, form a Hilbert space; it is called L², the space of random variables with finite second moment. In this L² space, ρ(X, Y) plays the role of cos θ, and cos θ = 0 means X and Y are orthogonal; so uncorrelated random variables are like orthogonal vectors. This interpretation is quite useful in many areas, in particular in estimation theory. I have not made this very rigorous, I have just spoken it out, but I hope you are roughly with me; it can all be made much more rigorous. Are there any questions?

So the take-home is that random variables with finite second moment lie in a space called L², which is a Hilbert space, and there you can define an inner product, angles, cosines, and all of that. I will stop here.
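As a finite-sample illustration of the cos θ interpretation (a sketch assuming numpy; this is the empirical analogue of the statement above: for centered data vectors, the sample correlation coefficient is exactly the cosine of the angle between them):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000

x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)

# Centre the sample vectors, then compute the cosine of the angle between them.
xc, yc = x - x.mean(), y - y.mean()
cos_theta = np.dot(xc, yc) / (np.linalg.norm(xc) * np.linalg.norm(yc))

print(cos_theta, np.corrcoef(x, y)[0, 1])  # identical up to floating-point error
```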