In this lecture I shall be talking about covariances and some of their properties, but before that, in order to introduce covariances, I will start with some basics of statistics. Suppose X is a random variable taking values x1, x2, x3, and so on, possibly infinitely many, with probabilities p1, p2, p3, ..., where naturally each pi > 0 and the probabilities sum to 1, that is, Σ_{i=1}^∞ pi = 1. Then the mean of the random variable X, denoted E[X], is

E[X] = Σ_{i=1}^∞ xi pi.

I suppose all of you know this. Then the variance of X is the expected value of the squared deviation from the mean:

Var(X) = E[(X − E[X])²] = Σ_{i=1}^∞ pi (xi − E[X])².

Please look at the structure here: a value of X is xi, you subtract the mean and take the whole square, and expectation means you multiply by the corresponding probability pi and sum over i from 1 to infinity. This is for discrete random variables. For continuous random variables, X is a random variable with a probability density function, say f. Then the mean is

E[X] = ∫_{−∞}^{∞} x f(x) dx.

Suppose X is a positive random variable; then from −∞ to 0 the density is anyway 0, and from 0 to ∞ it can be positive. So even if it is a positive random variable, or even if it does not take values in some part of the interval (−∞, ∞), one can always write the limits as −∞ to ∞, because wherever the random variable does not take values the corresponding density is anyway 0. Then again the variance has the same form:

Var(X) = ∫_{−∞}^{∞} (x − E[X])² f(x) dx.

Now there are some nice properties. What are the properties?
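The discrete mean and variance formulas above can be checked with a short computation. This is my own minimal sketch, not part of the lecture; the fair-die distribution is just an illustrative example.

```python
# Sketch: mean and variance of a discrete random variable, computed
# directly from the definitions E[X] = sum_i x_i p_i and
# Var(X) = sum_i p_i (x_i - E[X])**2 given in the lecture.

def mean(xs, ps):
    """E[X] = sum_i x_i * p_i"""
    return sum(x * p for x, p in zip(xs, ps))

def variance(xs, ps):
    """Var(X) = sum_i p_i * (x_i - E[X])**2"""
    m = mean(xs, ps)
    return sum(p * (x - m) ** 2 for x, p in zip(xs, ps))

# Example: a fair six-sided die.
xs = [1, 2, 3, 4, 5, 6]
ps = [1 / 6] * 6

print(mean(xs, ps))      # 3.5
print(variance(xs, ps))  # 35/12, about 2.9167
```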
First, whether you have a discrete random variable or a continuous one, the properties I am going to write are always true. The first property is E[aX + b] = a E[X] + b. This means we have a new random variable Y, which I am defining as a times the random variable X plus b, where a and b are constants; the mean of this new random variable is a times the mean of the old random variable X, plus b. Next, Var(aX + b) = a² Var(X). That means b does not have any impact: whatever b you take, it does not matter, the variance will be a² Var(X). Now let us come to covariance. For covariance you need two random variables X and Y, and you also need their joint probability density function f(x, y). Note that the small x's and small y's are the values taken by the variables, while the variables themselves are represented by the capital letters X and Y; also note that E[X] is a constant value, and E[Y] is a constant value. Then the covariance of X and Y is

Cov(X, Y) = E[(X − E[X])(Y − E[Y])] = ∫∫ (x − E[X])(y − E[Y]) f(x, y) dx dy,

where the integration is over the whole plane. You should write the density function here because you are taking the expectation with respect to both x and y. You can have a similar definition for discrete random variables, which I am not writing now. Well, what will Cov(X, X) be? It will be E[(X − E[X])²], which is nothing but Var(X). And there are some
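The two properties E[aX + b] = aE[X] + b and Var(aX + b) = a²Var(X) can be verified exactly on a small discrete distribution. This is my own numerical check, not from the lecture; the particular values and probabilities are arbitrary.

```python
# Exact check of E[aX+b] = a*E[X] + b and Var(aX+b) = a^2 * Var(X)
# on a small discrete distribution (no sampling, so equality is exact
# up to floating-point rounding).

xs = [0, 1, 2, 3]
ps = [0.1, 0.2, 0.3, 0.4]

def E(values, probs):
    return sum(v * p for v, p in zip(values, probs))

a, b = 3.0, -5.0
mean_x = E(xs, ps)
var_x = E([(x - mean_x) ** 2 for x in xs], ps)

ys = [a * x + b for x in xs]    # Y = aX + b takes value a*x_i + b with prob p_i
mean_y = E(ys, ps)
var_y = E([(y - mean_y) ** 2 for y in ys], ps)

print(mean_y, a * mean_x + b)   # equal: E[aX+b] = a*E[X] + b
print(var_y, a * a * var_x)     # equal: Var(aX+b) = a^2 * Var(X); b drops out
```

Note how b shifts the mean but cancels in the variance, exactly as the property states.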
more formulas. What are the formulas? Say Cov(X + Y, Z) = Cov(X, Z) + Cov(Y, Z). Is this true? Why? Just apply the definition: E[((X + Y) − E[X + Y])(Z − E[Z])]. Now the expectation of X + Y is E[X] + E[Y], so this is E[(X − E[X] + Y − E[Y])(Z − E[Z])], which is nothing but E[(X − E[X])(Z − E[Z])] plus E[(Y − E[Y])(Z − E[Z])] — you see, X with Z and Y with Z. So Cov(X + Y, Z) = Cov(X, Z) + Cov(Y, Z). What is the covariance of X with a constant a? It is 0. Again apply the definition: the mean of the constant a is a itself, so a − a = 0, the product is 0, and the expectation is 0; hence Cov(X, a) = 0. Next, Cov(aX, Y) = a Cov(X, Y): again apply the definition, aX − E[aX] = a(X − E[X]), so the a comes out of (x − E[X])(y − E[Y]) and you get a times Cov(X, Y). Combining these, Cov(aX + b, cY + d) = ac Cov(X, Y). Now suppose we have three variables, let us just say x1, x2, x3, and suppose x2 = 2x1 − 1. Now consider this matrix:

Cov(x1, x1)  Cov(x1, x2)  Cov(x1, x3)
Cov(x2, x1)  Cov(x2, x2)  Cov(x2, x3)
Cov(x3, x1)  Cov(x3, x2)  Cov(x3, x3)

Cov(x1, x1) is Var(x1). For Cov(x1, x2): x2 is 2x1 − 1, so this will be two times Var(x1). Cov(x1, x3) let me just leave written as it is. Now, the covariance of x2 with x1, that is
two times Var(x1). And Cov(x2, x2): this is Var(2x1 − 1), which is four times Var(x1). And Cov(x2, x3): x2 is 2x1 − 1, so this will be 2 Cov(x1, x3). And anyway there is a third row, which I am not writing out in full: it is Cov(x1, x3), then 2 Cov(x1, x3), then Var(x3); it is not really important, I am only interested in the first two rows. Now what can you say about the first two rows? They are linearly dependent — the second row is two times the first. So what will happen to the determinant? The determinant value will be 0. So because of just this one relationship, whatever your covariance matrix may be, if two variables are linearly related, the rank of the matrix is going to decrease: it will not have full rank, as you can very clearly see, and the determinant will become 0. So if two variables have a linear relationship, it is going to affect the whole of the covariance matrix. Have you understood it? The two corresponding rows are linearly dependent, so the determinant is going to become 0.
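This can be seen numerically. The sketch below is my own illustration of the example above, with x2 = 2x1 − 1 built from simulated data; the covariance matrix it produces is singular, just as the argument predicts.

```python
# Illustration: when x2 = 2*x1 - 1, the 3x3 sample covariance matrix
# has two proportional rows, hence rank 2 and determinant (numerically) 0.
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=1000)
x2 = 2 * x1 - 1                       # exact linear relationship
x3 = rng.normal(size=1000)

C = np.cov(np.vstack([x1, x2, x3]))   # rows of the stacked array = variables

print(np.round(C, 3))
print(np.linalg.det(C))               # ~0, up to floating-point error
print(np.linalg.matrix_rank(C))       # 2, not full rank
```

The second row of C is exactly twice the first, since Cov(2x1 − 1, ·) = 2 Cov(x1, ·) and the constant −1 contributes nothing.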
So you are forced to do feature selection in this case, to make the determinant non-zero. Have you understood what I wanted to say? If you get a covariance matrix with determinant 0, somehow you should find out which variable, or which linear combination, is causing it, and remove it, so that the determinant becomes non-zero. Somehow you have to find the largest minor with non-zero determinant. I hope you are understanding the terminology I am using — you understand the word minor. You are supposed to find the largest non-singular minor, and that will actually tell you the number of independent variables, or how much rank you can have at most. You should remove the rest: somehow you should find those variables and just remove them, because those variables are going to act as noise. Any question, please? (Question: given this covariance matrix, can we say which two features are related?) Look at this: Var(x3) is somehow not taking any part in the dependency, so x3 is not the one to take out. Look at the 2 × 2 submatrices. Take the top-left block, [Var(x1), 2 Var(x1); 2 Var(x1), 4 Var(x1)]: multiply the first row by 2 and you get the second, so its determinant is 0. You would not get this sort of thing in the blocks involving x3, such as [Var(x1), Cov(x1, x3); Cov(x1, x3), Var(x3)] or [4 Var(x1), 2 Cov(x1, x3); 2 Cov(x1, x3), Var(x3)]: these in general have non-zero determinant. So you have a zero determinant exactly in the block for x1 and x2, because that row, if you multiply it by 2, you are going to get
this. So since these two rows are dependent, some variable should be removed from that pair. On the other hand, the blocks involving x3 have non-zero determinant, so from there you keep the variable, and from the dependent pair you remove one. Naturally the variable that you are going to keep is x3; the variable that you will be removing is one of x1 and x2. (Question: what about non-linearly related variables?) No, no — see, all of these things are linear. Have I written anything for the non-linear case? Have I made any statement about what may be happening if it is non-linear? No. About non-linear relationships I do not want to make any statement here; we are only looking at the linear case. The dependency or independence here is also linear: these vectors are said to be linearly independent by the definition that we read, and linearly dependent if they are not linearly independent. So I am not looking at the non-linear part of it, and if the relationship is non-linear, I cannot make any statement using these tools. That is a real, serious problem which people are aware of, but they have not been able to find solutions acceptable to everyone. I am only talking about the linear case. Now, any doubt? I can say: find the largest minor, that is, the largest square submatrix with non-zero determinant. Saying it is easy, but then you have to look at all those combinations, which may be difficult. But I hope you have understood the principle: if your variance-covariance matrix has determinant zero, somehow you should find the variables which are causing this zero and you should remove them. But then how
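The inspection of 2 × 2 blocks described above can be automated. The following is my own sketch for the running example (x2 = 2x1 − 1): it computes the determinant of every 2 × 2 principal minor of the covariance matrix, and the pair whose minor is numerically singular is the linearly related pair.

```python
# Check every 2x2 principal minor of the covariance matrix; a (near-)zero
# determinant identifies the pair of linearly dependent variables.
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
x1 = rng.normal(size=1000)
x2 = 2 * x1 - 1                       # the dependent pair is (x1, x2)
x3 = rng.normal(size=1000)
C = np.cov(np.vstack([x1, x2, x3]))

dets = {}
for i, j in combinations(range(3), 2):
    minor = C[np.ix_([i, j], [i, j])]
    dets[(i, j)] = np.linalg.det(minor)
    print(f"variables (x{i+1}, x{j+1}): det = {dets[(i, j)]:.6f}")

# Only the (x1, x2) minor is ~0, so keep x3 and drop one of x1, x2.
```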
are you going to implement it? You would like to implement it in such a way that it is feasible — you do not want to do an exhaustive search over all minors, which will make your life miserable. So one should try to find methods for this. I think there are some things already existing in the literature: the moment your covariance matrix has determinant zero, there are papers you can find about how to go about removing those features so that you make the determinant non-zero. So I think I will stop here.
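One feasible alternative to exhaustively searching all minors is a greedy pass over the variables: keep a variable only if adding it increases the rank of the principal submatrix built so far. This is my own suggestion of a simple linear-algebra approach, not a method stated in the lecture, and the example data again assume x2 = 2x1 − 1.

```python
# Greedy selection of a maximal linearly independent set of variables:
# n rank checks on growing principal submatrices, instead of an
# exhaustive search over all minors.
import numpy as np

def independent_features(C, tol=1e-8):
    """Indices j such that the principal submatrix of C on the kept
    indices stays full-rank; redundant (linearly dependent) variables
    are skipped."""
    keep = []
    for j in range(C.shape[0]):
        trial = keep + [j]
        sub = C[np.ix_(trial, trial)]
        if np.linalg.matrix_rank(sub, tol=tol) == len(trial):
            keep.append(j)
    return keep

rng = np.random.default_rng(2)
x1 = rng.normal(size=1000)
x2 = 2 * x1 - 1                       # redundant: linear in x1
x3 = rng.normal(size=1000)
C = np.cov(np.vstack([x1, x2, x3]))

print(independent_features(C))        # [0, 2]: x2 is dropped, x1 and x3 kept
```

The size of the returned set equals the rank of C, i.e. the size of the largest non-singular minor that the lecture asks for.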