So today, moving on, we will focus on some more properties of a set of random variables; in particular, we will discuss the notion called covariance. When we have a set of random variables, we know how to characterize their distribution: we already said that we characterize it through the joint distribution, and we also want to understand some properties of this joint distribution. For example, recall that for a single random variable we defined the notions of expectation and variance. When we have a set of random variables, what kind of properties would we be interested in? One thing that easily comes to mind is to understand how they behave together. Suppose you have two random variables X and Y; you want to understand whether they tend to take values in the same direction, or in opposite directions, or something of that nature. So let us see if we can capture that through the following notions. Let X and Y be random variables on the same probability space. We define their correlation as simply the expectation of the product, E[XY]; their covariance as Cov(X, Y) = E[(X - E[X])(Y - E[Y])]; and their correlation coefficient as rho(X, Y) = Cov(X, Y) / sqrt(Var(X) Var(Y)). I am inherently assuming that X and Y are such that their second moments are finite, so that their variances are also finite. If the variances are not finite, these definitions are not well defined, or do not make sense; that is why we assume the second moments, and hence the variances, are finite. Given two random variables, you can always compute these quantities if you already know their joint distribution: form the product and compute all these expectations using the associated joint distribution.
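To make this concrete, here is a minimal sketch that computes the correlation E[XY], the covariance E[(X - E[X])(Y - E[Y])], and the correlation coefficient from a small joint pmf; the probabilities below are made up purely for illustration.

```python
from math import sqrt

# Hypothetical joint pmf of (X, Y); the probabilities are chosen purely for illustration.
pmf = {(0, 0): 0.2, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.6}

E_XY = sum(x * y * p for (x, y), p in pmf.items())   # correlation E[XY]
E_X = sum(x * p for (x, y), p in pmf.items())
E_Y = sum(y * p for (x, y), p in pmf.items())

# Covariance: correlation of the centered random variables.
cov = sum((x - E_X) * (y - E_Y) * p for (x, y), p in pmf.items())

var_X = sum((x - E_X) ** 2 * p for (x, y), p in pmf.items())
var_Y = sum((y - E_Y) ** 2 * p for (x, y), p in pmf.items())

rho = cov / sqrt(var_X * var_Y)                      # correlation coefficient
```

Everything here is computed directly from the joint distribution, exactly as described above: sum the relevant function of (x, y) weighted by the joint probabilities.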
Now something that immediately comes to mind: suppose Y is the same as X. What is the correlation in that case? It is E[X^2], which is simply the second moment. And what does the covariance become? If X = Y, then Cov(X, Y) is nothing but Var(X). So in a way we are just generalizing these notions to the case of multiple random variables. Now, to understand these properties a bit more, we have a relation called the Schwarz inequality. You know the Schwarz inequality on a pair of vectors: if a and b are vectors, then the inner product satisfies <a, b>^2 <= ||a||^2 ||b||^2. Is there a square root there or not? No square root, because we have squared the inner product. Written out in components, <a, b> is the sum of a_i b_i, ||a||^2 is the sum of a_i^2, and ||b||^2 is the sum of b_i^2, where a is nothing but a vector with components a_i, and similarly for b. There is an analogue of this for our random variables as well. It says that if you look at the expectation of the product, then (E[XY])^2 is upper bounded by E[X^2] E[Y^2]; so if you look at the correlation of the random variables X and Y, its absolute value is upper bounded in this fashion. Further, one can show that the bound holds with equality if and only if one random variable is just a scaled version of the other. Do the names directly indicate what these quantities are? At least the meaning of correlation is clear: we call it correlation, and what we are doing is taking the product and taking its expectation.
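The vector form of the inequality can be checked directly; the vectors below are arbitrary examples:

```python
# Two arbitrary example vectors.
a = [1.0, -2.0, 3.0]
b = [2.0, 1.0, 0.5]

inner = sum(ai * bi for ai, bi in zip(a, b))   # <a, b> = sum_i a_i b_i
norm2_a = sum(ai * ai for ai in a)             # ||a||^2 = sum_i a_i^2
norm2_b = sum(bi * bi for bi in b)             # ||b||^2 = sum_i b_i^2

# Schwarz inequality for vectors: <a, b>^2 <= ||a||^2 ||b||^2 (no square root).
assert inner ** 2 <= norm2_a * norm2_b
```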
That is how they are correlated. Similarly covariance: at first sight we can take it as an extension of our variance definition. Variance we defined for a single random variable, but now we have two random variables, and we centralize them by removing the means and then look at their correlation. So the covariance is nothing but the correlation of the centered random variables, from which the mean values have already been removed. We will try to interpret what the correlation coefficient means in a moment; for that we need the bound given by the Schwarz inequality. The Schwarz inequality says we have the upper bound (E[XY])^2 <= E[X^2] E[Y^2], and, provided at least one of the second moments is not 0, the inequality holds with equality if and only if one random variable is just a scaled version of the other with probability 1: X = cY for some c. I do not know what that c is, but as long as X can be expressed by scaling Y with some c, equality holds. Now, a quick look into why this relation should hold. Consider E[(X - lambda Y)^2]. This can be expressed as E[X^2] - 2 lambda E[XY] + lambda^2 E[Y^2]; I have just expanded it. Just check that if both X and Y have finite second moments, this quantity is always bounded, and furthermore it is greater than or equal to 0. That it is greater than or equal to 0 is clear, because you are squaring and taking the expectation. It is also bounded, because for any numbers a and b we have |ab| <= (a^2 + b^2)/2; expand the square, apply this bound to the cross term, and since the second moments are finite you can argue that the whole expression is finite.
So just verify that. Now this is true irrespective of what lambda I choose: as long as lambda is finite, this second relation holds for whatever lambda I set. Now set specifically lambda = E[XY] / E[Y^2]; for the time being assume that E[Y^2] is not 0, so that this is properly defined, and take that value as lambda. If I take that lambda and plug it in, the expression becomes E[X^2] - (E[XY])^2 / E[Y^2]; just plug in the lambda value and simplify, and you get it, and we already know this is greater than or equal to 0. Now, I said at least one of the second moments is not 0; I first considered the case E[Y^2] != 0, defined lambda as above, and plugged it in. If you want to consider the case E[X^2] != 0, just do the opposite: take E[(Y - lambda X)^2] instead and replace E[Y^2] by E[X^2]; you can proceed similarly. Now if you do the cross multiplication here and simplify, you get exactly what we are claiming as the Schwarz inequality. Now for the equality condition: it is easy to see that if X = cY for some c, then if you go back and plug X = cY in, both the right-hand side and the left-hand side match, so in that case the bound holds with equality. Conversely, assume the relation holds with equality; then you have to come up with the value of c that works, and in this case you just take c to be the value of lambda above. If the equality holds, it is satisfied with this value of lambda; you can just check that.
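Both the inequality and its equality case X = cY can be sanity-checked on small distributions; the pmfs and the constant c below are assumptions chosen for illustration:

```python
def second_moments(pmf):
    """Return E[XY], E[X^2], E[Y^2] for a finite joint pmf {(x, y): p}."""
    E_XY = sum(x * y * p for (x, y), p in pmf.items())
    E_X2 = sum(x * x * p for (x, y), p in pmf.items())
    E_Y2 = sum(y * y * p for (x, y), p in pmf.items())
    return E_XY, E_X2, E_Y2

# Generic case: X is not a scaled version of Y, so the inequality is strict.
E_XY, E_X2, E_Y2 = second_moments({(1, 2): 0.3, (-1, 1): 0.5, (2, 0): 0.2})
assert E_XY ** 2 < E_X2 * E_Y2

# Equality case: X = cY with probability 1 (here c = 3).
c = 3.0
pmf_eq = {(c * y, y): p for y, p in {0: 0.4, 1: 0.35, 2: 0.25}.items()}
E_XY, E_X2, E_Y2 = second_moments(pmf_eq)
assert abs(E_XY ** 2 - E_X2 * E_Y2) < 1e-9
```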
So with this we have the relation that the absolute value of the correlation is upper bounded by this quantity. Now define two new random variables: X' = X - E[X] and Y' = Y - E[Y]. If X and Y have finite second moments, so do X' and Y'. Now just plug these in: replace X by X' and Y by Y' on both sides. What is the quantity on the left-hand side? It is the expectation of the product of the centered X and Y, which, according to our definition, is the covariance. And what are the quantities on the right? E[(X - E[X])^2] is nothing but Var(X), and then we have Var(Y). So we get Cov(X, Y)^2 <= Var(X) Var(Y). Now, with this relation, what can we straight away say about rho(X, Y)? We know that |rho(X, Y)| is going to be less than or equal to 1. Now let us try to understand what this correlation coefficient means. As I said, we have two random variables here, and we have all these properties; what we want to understand is how they behave together, whether there is anything we can infer when we look at them jointly.
So as of now, for the correlation coefficient as we have defined it, we have shown that |rho(X, Y)| <= 1, which means rho(X, Y) lies between -1 and 1. Now suppose I find out that this correlation coefficient happens to be positive: it is going to be less than 1, but it is also more than 0. What does that mean? Let us take a simple setting. I am going to take two events A and B coming from my sigma-field F. Now I define a random variable X which is 1 if A occurs and 0 otherwise. I can define a random variable like this, right: whenever event A happens I take the outcome to be 1, and if anything other than that happens I simply say 0. And I define another random variable Y which is 1 if B occurs and 0 otherwise.
Now I want to understand: given that B has happened, does it have any bearing on the happening of A? Does it make A more likely, or less likely? What kind of information does it reveal? Let us try to understand this by evaluating the covariance here. For Cov(X, Y) I have written the formula E[(X - E[X])(Y - E[Y])], but if you just cross multiply the terms within the brackets and expand, it simplifies to Cov(X, Y) = E[XY] - E[X] E[Y]; just do the multiplication and expand. Now I am going to apply this formula to the two specific random variables defined here. What are the possible values the product XY can take? It can only take 1 or 0, and it takes 1 exactly when both X = 1 and Y = 1 occur. So what is the expected value of XY? P(X = 1, Y = 1) is the only case where the value is 1; in all other cases it is 0. That is why E[XY] is nothing but this probability times 1, and I am skipping the factor of 1. What about E[X]? It is simply P(X = 1), and likewise E[Y] = P(Y = 1): the expectation is the value of the random variable weighted by the corresponding probability, the value 1 with this probability, and all other terms are 0. Now assume the covariance is positive. Since Cov(X, Y) = E[XY] - E[X] E[Y], the covariance is positive if and only if P(X = 1, Y = 1) > P(X = 1) P(Y = 1); equivalently, P(X = 1, Y = 1) / P(X = 1) is greater than the probability
that Y = 1. Now if you apply the definition of conditional probability, what is this ratio? It is P(Y = 1 | X = 1). Now go back and plug in the meaning: X = 1 means A occurs, Y = 1 means B occurs. So what this is saying is: given that A has occurred, the probability that B occurs is more than just the probability of the event B itself occurring. That means if you know that A has occurred, B occurring seems more likely than if you simply asked whether B has occurred. So what the covariance is telling us is this: if the covariance is positive, then conditioned on one event having happened, the other event happening is more likely than under the unconditional probability. And if the covariance is negative, it is the opposite: conditioned on the event A having occurred, the probability that B occurs is now less than the unconditional probability of B happening. So in a way, if the covariance is positive, one event happening has increased the likelihood of the other event also happening; that is the meaning of covariance here. Now, if X and Y are independent, the covariance is going to be 0. Let us come to the case Cov(X, Y) = 0. Cov(X, Y) = 0 means E[XY] = E[X] E[Y], and E[XY] is equal to E[X] E[Y] when X and Y are independent. So if X and Y are independent, you already understand that the happening of one reveals no information about the other; only when they are other than independent
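The indicator-variable calculation can be sketched with assumed event probabilities; the numbers below are hypothetical:

```python
# Assumed probabilities of two events A and B and of their intersection.
P_A, P_B, P_AB = 0.5, 0.4, 0.3

# For X = 1 if A occurs (else 0) and Y = 1 if B occurs (else 0):
# E[X] = P(A), E[Y] = P(B), E[XY] = P(A and B).
cov = P_AB - P_A * P_B          # Cov(X, Y) = E[XY] - E[X]E[Y]
P_B_given_A = P_AB / P_A        # conditional probability P(B | A)

# Positive covariance <=> conditioning on A makes B more likely.
assert (cov > 0) == (P_B_given_A > P_B)
```

With these particular numbers the covariance is positive (0.3 > 0.5 * 0.4), and correspondingly P(B | A) = 0.6 exceeds the unconditional P(B) = 0.4.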
might one give information about the happening or not happening of the other. So, as you already see, independence implies zero covariance. When the covariance of X and Y is 0, we say X and Y are uncorrelated. So independence implies uncorrelated, but in general the other direction is not true: if two random variables are uncorrelated, it does not mean that they are independent. Think about examples where that happens; I will just leave it as an exercise for you. We have already said that when the covariance is 0 we call them uncorrelated, that independence implies uncorrelated, and that when X = Y the covariance is the variance of the random variable X. Now I am going to quickly list some properties you can verify yourself. Just as, when we defined expectation, we derived various properties of expectation (expectation is linear; if you scale a random variable, the expectation also just scales, and so on), we can similarly write properties of covariance. If you have Cov(X + Y, U + V), you are looking at two random variables which are themselves expressed as sums of two other random variables. This can be expressed in terms of the covariances of the pairs: Cov(X + Y, U + V) = Cov(X, U) + Cov(X, V) + Cov(Y, U) + Cov(Y, V). Similarly, consider Cov(aX + b, cY + d): here again I am looking at the covariance of two random variables, but each is an affine function of another random variable. What will the covariance be? It is simply Cov(aX + b, cY + d) = a c Cov(X, Y), so it does not matter what the values of b and d are;
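The affine property Cov(aX + b, cY + d) = a c Cov(X, Y) can be verified on any small joint pmf; the pmf and the constants a, b, c, d below are arbitrary choices:

```python
# Hypothetical joint pmf and arbitrary affine coefficients.
pmf = {(0, 1): 0.25, (1, 3): 0.35, (2, 0): 0.40}
a, b, c, d = 2.0, 5.0, -3.0, 7.0

def cov(pmf, f, g):
    """Cov(f(X, Y), g(X, Y)) = E[fg] - E[f]E[g] under the given pmf."""
    E_f = sum(f(x, y) * p for (x, y), p in pmf.items())
    E_g = sum(g(x, y) * p for (x, y), p in pmf.items())
    E_fg = sum(f(x, y) * g(x, y) * p for (x, y), p in pmf.items())
    return E_fg - E_f * E_g

lhs = cov(pmf, lambda x, y: a * x + b, lambda x, y: c * y + d)
rhs = a * c * cov(pmf, lambda x, y: x, lambda x, y: y)
assert abs(lhs - rhs) < 1e-9   # intercepts b, d drop out; only a*c survives
```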
the intercept values in these affine functions do not matter. All that matters is the factor by which you scale; the constant offset is not going to affect your covariance. Now, often you will end up with sums of random variables. For example, think of 5 courses, where the score you get in each course is a random variable; call them X_1, X_2, X_3, X_4, X_5, and you want to know the total of the marks you are going to get across these 5 courses. In that case you define a random variable S_m = X_1 + X_2 + ... + X_m, and for this you want to calculate, say, the variance of the sum across your 5 course scores. How are you going to compute this? Is it true that Var(X_1 + X_2) equals Var(X_1) + Var(X_2)? We know that E[X_1 + X_2] = E[X_1] + E[X_2], but we never said variance is also linear: expectation is a linear operator, but variance is not. So how do we calculate this variance? By definition, Var(S_m) is nothing but Cov(S_m, S_m). Does it help if I write it like this? Yes, because then I can go and exploit the additivity property of covariance, and then I only have to worry about a pair of random variables at any time. If this is the case, you can just go and expand all these terms, and what you end up with is Var(S_m) = sum over i of Var(X_i) + sum over i != j of Cov(X_i, X_j); I am just simplifying after plugging S_m, defined as the sum of the random variables, into the covariance. Here I stated the property only for a sum of 2 random variables, but you can extend it if you have more than 2. Suppose you have 3: how are you going to apply this formula to get it to work for a sum of 3 random
variables? You can initially treat two of them as one random variable, so that you have only a sum of 2 random variables, apply the formula, and then expand within that group; each time you group so that you are dealing with only a sum of 2 random variables, and then expand the group. By doing this you will end up with the formula above; please verify this. Now suppose I say that the scores you are going to get in these five courses are independent, so the score in one course does not affect the score in another. How does this formula simplify then? If all of them are independent, the cross terms vanish. But is it necessary in this case that I tell you the scores are fully independent, or is it sufficient if I say that they are pairwise independent? We know that pairwise independence is a weaker notion than independence across a set of random variables. So what do I need here, full independence or just pairwise independence? Pairwise independence will already make the cross terms 0, and in that case the variance is simply the sum of the variances of each of your random variables. And if I say further that all of them are identically distributed, meaning they have the same distribution and are just multiple copies, then you can write it as m times the variance of X_1, because these are all going to be the same value.
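For independent, identically distributed summands the cross terms drop out and Var(S_m) = m Var(X_1), which can be checked with a quick Monte Carlo sketch; the "course score" distribution below is a made-up stand-in (uniform on 0..100, whose variance is (101^2 - 1)/12 = 850):

```python
import random

random.seed(0)
m, n = 5, 100_000   # 5 i.i.d. "course scores", 100k simulated students

# S_m = X_1 + ... + X_m with each X_i uniform on {0, ..., 100}.
samples = [sum(random.randint(0, 100) for _ in range(m)) for _ in range(n)]

mean = sum(samples) / n
var_Sm = sum((s - mean) ** 2 for s in samples) / n

# Var(S_m) should be close to m * Var(X_1) = 5 * 850 = 4250.
assert abs(var_Sm - m * 850) / (m * 850) < 0.05
```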