Welcome back. Today we will discuss variance and covariance. The variance of a random variable X is defined as the expected squared deviation around its expected value. We assume that E[X] exists and is finite. Then the variance of X is defined as Var(X) = E[(X − E[X])²], denoted σ_X²; σ_X is called the standard deviation. Standard deviation is used more in statistics, and variance more in probability. Note that once E[X] is finite, it is just a number, so X − E[X] is X minus a constant, and (X − E[X])² is a random variable whose expectation we take. Since it is a non-negative random variable, its expectation is always well defined. So as long as E[X] is finite, the variance is always well defined; it could be a real number or +∞, but there is no danger of an ∞ − ∞ situation, precisely because the quantity inside the expectation is non-negative. Intuitively, the variance measures how much the random variable deviates from its expected value. A large variance means the random variable takes values spread considerably around the expected value, and a small variance means, roughly and loosely speaking, that it takes values close to the expected value with high probability. Clearly, the variance is non-negative; that is immediate from the definition.
Now, can the variance be 0? Yes: if X is almost surely constant, say X = c with probability 1, then E[X] = c as well, and the variance is the expectation of 0², so it is 0. In fact the converse is also true: if the variance of a random variable is 0, then the random variable is constant with probability 1. That is, Var(X) = 0 if and only if X is almost surely constant. The "if" part is the easy direction we just saw: with probability 1 the difference X − E[X] is 0, so the variance is 0. For the "only if" part, suppose E[(X − E[X])²] = 0. By definition this expectation is the integral of (X − E[X])² with respect to the probability measure. Now invoke the property of integrals which says that if a non-negative function has integral 0, then the function is 0 μ-almost everywhere; here the measure μ is simply P. So (X − E[X])² = 0 on a set of probability 1, which means X = E[X] with probability 1. So that settles it: the variance is always non-negative, and it is 0 if and only if the random variable is almost surely constant.
Another expression for the variance, which is often easier to compute, can be obtained by simply expanding out the square; this is something you may be familiar with. Write σ_X² = E[(X − E[X])²] = E[X² + (E[X])² − 2X E[X]], just using (a − b)² = a² + b² − 2ab. Now invoke linearity of expectation. Note that (E[X])² is just a constant, not even a random variable, so it comes out of the expectation as is, and in the cross term the constant E[X] also comes out of the expectation, which is again a property of expectation. This gives σ_X² = E[X²] + (E[X])² − 2 E[X]·E[X] = E[X²] − (E[X])². This is often easier to evaluate: for a given distribution, say an exponential, applying the definition directly is a little tedious, whereas here you compute E[X] and E[X²] separately and subtract the square of the first from the second. Since the variance is always non-negative, it follows that for every random variable E[X²] ≥ (E[X])²: if you square the random variable first and take expectations, you get something bigger than or equal to taking the expectation and then squaring. And in fact, if E[X²] = (E[X])², then you can conclude that X is constant almost surely.
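As a sanity check, here is a small sketch, not part of the lecture, verifying that the two variance formulas agree for a simple discrete distribution; the particular values and probabilities are arbitrary choices for illustration:

```python
# Illustrative sketch: E[(X - E[X])^2] equals E[X^2] - (E[X])^2.
# The distribution below is hypothetical: X takes 1, 2, 3 with these probabilities.
values = [1.0, 2.0, 3.0]
probs = [0.2, 0.5, 0.3]

mean = sum(v * p for v, p in zip(values, probs))                    # E[X]
var_def = sum((v - mean) ** 2 * p for v, p in zip(values, probs))   # definition
second_moment = sum(v ** 2 * p for v, p in zip(values, probs))      # E[X^2]
var_short = second_moment - mean ** 2                               # shortcut formula

print(mean, var_def, var_short)  # 2.1, 0.49, 0.49 (up to rounding)
```

The shortcut formula is exactly the algebraic identity derived above, so the two numbers agree up to floating-point rounding.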
In fact, you do Jensen's inequality in the homework that is coming up. If you have a convex function g (roughly speaking, convex functions curve upward; there is a proper definition in your homework), then E[g(X)] ≥ g(E[X]). In the present case the function is simply the square function, which is convex, so E[X²] ≥ (E[X])² is a special case. Jensen's inequality is a very important inequality; you will prove it in your homework. Any questions so far? Fairly straightforward stuff that we probably know already, so let us look at some examples. Bernoulli: if X = 1 with probability p and 0 with probability 1 − p, then E[X²] = 1²·p = p, and E[X] = p as well, so Var(X) = p − p² = p(1 − p). Poisson: there is an example in your notes where X is Poisson, P(X = k) = e^{−λ} λ^k / k! for k ≥ 0. First, E[X] = Σ_{k=0}^∞ k e^{−λ} λ^k / k!. The k = 0 term is 0, and for each k ≥ 1 the factor k cancels into k! to give (k − 1)!; pull one λ out and the remaining sum is the Poisson probabilities again, which sum to 1. So E[X] = λ.
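These two examples can be checked numerically; the sketch below is not from the lecture, and the parameter values p and λ are arbitrary. The Poisson mean is computed by a truncated sum (the tail beyond k = 100 is negligible for a small λ):

```python
import math

# Check E[X] = lam for Poisson(lam) via a truncated sum, and
# Var(X) = p(1 - p) for Bernoulli(p). Parameters are arbitrary.
lam = 2.5
mean = sum(k * math.exp(-lam) * lam ** k / math.factorial(k) for k in range(100))

p = 0.3
e_x2 = 1 ** 2 * p            # E[X^2] for Bernoulli is just p
var_bernoulli = e_x2 - p ** 2  # = p(1 - p)

print(mean)            # close to 2.5
print(var_bernoulli)   # 0.21
```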
For E[X²], which is sometimes referred to as the second moment: E[X²] = Σ_{k=0}^∞ k² e^{−λ} λ^k / k!. You can do some manipulation here. The k = 0 term is obviously 0, and for k ≥ 1 write k² = (k² − k) + k = k(k − 1) + k. The k(k − 1) part cancels into k! to give (k − 2)!, so that part of the sum is Σ_{k=2}^∞ λ² e^{−λ} λ^{k−2} / (k − 2)! = λ², and the remaining part is Σ_{k=1}^∞ λ e^{−λ} λ^{k−1} / (k − 1)! = λ. So E[X²] = λ² + λ, and σ_X² = λ² + λ − λ² = λ. For the Poisson random variable, both the expected value and the variance are equal to λ. Let us do a couple of continuous random variables as examples. If X is uniform on (a, b), you can show that E[X] = (a + b)/2 and E[X²] = (b³ − a³)/(3(b − a)), which simplifies to (a² + ab + b²)/3. Then you compute σ_X² = (a² + ab + b²)/3 − ((a + b)/2)², where the second term is (a² + 2ab + b²)/4.
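The Poisson second moment λ² + λ can also be checked by a truncated sum; again this is an illustrative sketch, not part of the lecture, with an arbitrary λ:

```python
import math

# Check E[X^2] = lam^2 + lam for Poisson(lam) via a truncated sum.
lam = 2.5
second = sum(k * k * math.exp(-lam) * lam ** k / math.factorial(k)
             for k in range(100))
print(second)  # close to lam^2 + lam = 8.75
```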
Put these over the common denominator 12: 4(a² + ab + b²) − 3(a² + 2ab + b²) = a² − 2ab + b² = (b − a)². So, rearranging, this finally comes to σ_X² = (b − a)²/12: the variance of a uniform random variable is the width of the interval squared, over 12. For the exponential, we already derived that E[X] = 1/μ, and you can show (we did this at some other time) that E[X²] = 2/μ². So the variance is 2/μ² − (1/μ)² = 1/μ². In all of these you see the same intuition. If your uniform random variable is spread over a wide interval, the variance is bigger. Similarly for the exponential: if you plot the pdf against x, it starts at the value μ at x = 0 and decays exponentially. If μ is very small, the curve is very flat and spread out, and the variance is big; if μ is large, the distribution is very peaky around 0 and the variance is smaller. That intuition keeps coming through. Then, finally, the case of the Gaussian. Remember we defined the N(μ, σ²) density: f(x) = (1/(σ√(2π))) e^{−(x − μ)²/(2σ²)} for all x in R. Here the parameter μ is actually the expected value and σ² is the variance. You can actually do the integrals; they are messy, not easy to compute, but doable. If you compute ∫ x f(x) dx you get μ, and if you compute the variance you get σ². So the Gaussian distribution is parameterized by its mean μ and variance σ², which is why we called the parameter σ² in the first place. I plotted this earlier; it is a bell-shaped distribution centered at μ.
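A quick Monte Carlo sketch, not from the lecture, can make these two variance formulas concrete; the parameter values a, b, μ and the sample size are arbitrary, and the sample variances will only be near the exact values, not equal:

```python
import random

# Sample variance of Uniform(a, b) should be near (b - a)^2 / 12, and
# of an exponential with rate mu near 1 / mu^2. Parameters are arbitrary.
random.seed(0)
a, b, mu, n = 1.0, 4.0, 2.0, 200_000

u = [random.uniform(a, b) for _ in range(n)]
e = [random.expovariate(mu) for _ in range(n)]

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

print(variance(u), (b - a) ** 2 / 12)   # both near 0.75
print(variance(e), 1 / mu ** 2)         # both near 0.25
```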
It is a symmetric distribution, symmetric around the expected value μ, and it goes down like e^{−(x − μ)²/(2σ²)}. If σ is small, the value at x = μ is quite high, because of the 1/σ factor in front, and the density also falls off very quickly, since dividing by a small 2σ² makes the exponent large in magnitude. So when σ is very small you have a very highly peaked distribution that falls off very quickly, and on the other hand when σ is very large you get a Gaussian that is not very tall but is very broad and spread out. Again you see that a small variance means the distribution is concentrated around the expected value. I will not do the integral here; it is slightly messy, requiring integration by parts and a standard integral. Any questions? Here is one: could the variance of a random variable be infinite while the expected value is finite? Note first that if the expected value is infinite, the variance is not even defined; we only talk about variance when E[X] is finite, otherwise the formula would not make sense. But when E[X] is finite, the variance is always defined: the quantity inside the expectation is a non-negative random variable, so the variance is either a real number or +∞. So my question is: is there a case where the expected value is finite but the variance is infinite? Can somebody give me an example?
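Although the lecture skips the integral, the claim that μ and σ² are the mean and variance of the N(μ, σ²) density can be checked numerically. The sketch below, not part of the lecture, uses a crude midpoint Riemann sum with arbitrary parameter values:

```python
import math

# Numerically integrate the N(mu, sigma^2) density to check total mass 1,
# mean mu, and variance sigma^2. Parameters are arbitrary.
mu, sigma = 1.5, 0.7

def f(x):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

n = 100_000
lo, hi = mu - 10 * sigma, mu + 10 * sigma   # tail beyond 10 sigma is negligible
h = (hi - lo) / n
xs = [lo + (i + 0.5) * h for i in range(n)]  # midpoint rule

mass = sum(f(x) for x in xs) * h
mean = sum(x * f(x) for x in xs) * h
var = sum((x - mean) ** 2 * f(x) for x in xs) * h
print(mass, mean, var)  # near 1, 1.5, 0.49
```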
First, you know an example where the expected value is +∞: the one-sided Cauchy. The Cauchy itself does not even have a mean, its expected value is undefined, but the one-sided Cauchy has an infinite expected value. Of course, if the mean is infinite, you cannot speak of the variance at all; I am asking the opposite: mean finite, variance infinite. Someone suggests the Erlang, but no: the Erlang density looks in essence like x^{n−1} e^{−x}, and because of the exponential decay all its moments are finite. Think about why the one-sided Cauchy has an infinite expected value: the Cauchy density goes down like 1/x², so when you hit it with an x you get a divergent integral. So, to keep the expected value finite while making the variance infinite, consider a density that goes down like 1/x³, or even a discrete random variable with probabilities like 1/k³ for k ≥ 1, normalized appropriately of course. Then the expected value is finite, but E[X²] is infinite. So you can cook up an example with finite expected value but infinite variance. Any questions? Next, covariance. The covariance is a number that measures how two random variables X and Y jointly vary, in some sense. The variance is defined for a single random variable X and captures how much variation around the expected value the random variable has; the covariance is defined for two random variables X and Y and captures their joint variation. Definition: the covariance of random variables X and Y is Cov(X, Y) = E[(X − E[X])(Y − E[Y])].
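The 1/k³ example can be made concrete with truncated sums; this sketch is not from the lecture, and the truncation point N is arbitrary. The mean converges (the series behaves like Σ 1/k²), while the partial sums for E[X²] behave like the harmonic series and grow without bound:

```python
# Discrete distribution with p(k) proportional to 1/k^3, k >= 1.
# E[X] converges; the partial sums for E[X^2] grow like the harmonic series.
N = 100_000
z = sum(1 / k ** 3 for k in range(1, N + 1))         # normalizing constant
mean = sum(k / k ** 3 for k in range(1, N + 1)) / z  # ~ sum of 1/k^2, converges
second_partial = sum(k * k / k ** 3 for k in range(1, N + 1)) / z  # ~ harmonic

print(mean)            # about 1.368, essentially converged
print(second_partial)  # about 10 here, and keeps growing as N increases
```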
If you simplify, you get Cov(X, Y) = E[XY] − E[X] E[Y]. Note that this definition assumes everything involved is meaningful: E[XY] must make sense, and you should not have a situation where one term is infinity and another is also infinity; in all such cases the covariance is not defined. So the covariance is defined subject to the expression itself being well defined. The product form is more useful for computing the covariance, whereas the original expression gives you better intuition about what the covariance is actually doing. Say X and Y are random variables, on the same probability space of course. Suppose that whenever X is bigger than its expected value, Y has the tendency to be smaller than its expected value; then the product (X − E[X])(Y − E[Y]) tends to be negative, and the covariance is negative. If X and Y tend to be on the same side of their expectations with greater probability, either both greater than or both smaller than their expected values, then the covariance is positive. So in that sense it captures the expected joint variation: if whenever X takes a value bigger than its mean, Y is also likely to take a value bigger than its mean, you have a positive covariance, and similarly in the negative case. That is covariance. Any questions? Definition: X and Y are said to be uncorrelated if Cov(X, Y) = 0, i.e. E[XY] = E[X] E[Y].
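The sign intuition can be illustrated with a quick Monte Carlo sketch; this is not part of the lecture, and the particular constructions (Y moving with or against a standard Gaussian X, plus independent noise) are arbitrary:

```python
import random

# Sample covariance: positive when Y tends to move with X,
# negative when it moves against X. Constructions are arbitrary.
random.seed(1)
n = 100_000
x = [random.gauss(0, 1) for _ in range(n)]
noise = [random.gauss(0, 1) for _ in range(n)]
y_with = [xi + ni for xi, ni in zip(x, noise)]      # same side of the mean as X
y_against = [-xi + ni for xi, ni in zip(x, noise)]  # opposite side

def cov(u, v):
    mu_u = sum(u) / len(u)
    mu_v = sum(v) / len(v)
    return sum((a - mu_u) * (b - mu_v) for a, b in zip(u, v)) / len(u)

print(cov(x, y_with))     # near +1
print(cov(x, y_against))  # near -1
```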
So this quantity could turn out to be 0, and in such a case X and Y are said to be uncorrelated; it is a new term you are learning. Theorem (an important one): if X and Y are independent random variables, they are uncorrelated. So independent random variables are always uncorrelated: if you take two independent random variables X and Y, then E[XY] = E[X] E[Y]. I will prove this theorem; it is very important, and the proof is somewhat non-trivial. One thing I want to remark before that: the converse is not true. If two random variables are uncorrelated, it does not mean that they are independent. Independence implies uncorrelatedness, but not the other way around. Which is the stronger property, independence or uncorrelatedness? Independence is stronger. So, remark: the converse is not true. Can somebody tell me why? How would you prove that two random variables being uncorrelated does not imply that they are independent? That is what is given to you: they are uncorrelated, and you have to show this does not imply independence. Someone suggests constant random variables, but that will not work: if both random variables are constant, the sigma-algebras they generate are simply {Ω, ∅}, and those are independent, a very trivial case, so two constant random variables are independent and this is not a counterexample. And taking Y = X will not work either, because then the covariance equals the variance, which is nonzero unless X is constant.
So you have to cook one up; it is not entirely trivial, but there are some easy examples. There are many examples, but consider X uniform on (−1, 1). We can show that X and X² are dependent but uncorrelated, and producing one example is enough. So take Y = X². Nobody will believe that X and X² are independent, and you can show it explicitly: if you want, you can compute the joint distribution of X and Y and very easily show that they are dependent. However, they are uncorrelated: Cov(X, Y) = E[XY] − E[X] E[Y] = E[X³] − E[X] E[X²]. Now E[X] = 0, so the second term is definitely 0, and since X is symmetric on (−1, 1), X³ also has expectation 0: after all, E[X³] is simply ∫_{−1}^{1} x³ · (1/2) dx, which is clearly 0. So the first term is 0 as well. So X and X² in this case are uncorrelated but dependent random variables. I do not think I have the time to complete the proof of the theorem, but let me at least start it. I have shown that the converse is not true; now the proof of the theorem itself.
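The counterexample can be checked numerically; this Monte Carlo sketch is not from the lecture, and the sample size and the particular dependence check (conditioning on |X| being large) are arbitrary illustrations:

```python
import random

# Counterexample sketch: X uniform on (-1, 1), Y = X^2.
# The sample covariance is near 0 (uncorrelated), yet Y is a function of X,
# so the two are clearly dependent.
random.seed(2)
n = 200_000
x = [random.uniform(-1, 1) for _ in range(n)]
y = [xi * xi for xi in x]

mx = sum(x) / n
my = sum(y) / n
cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
print(cov)  # near 0

# Dependence: |X| > 0.9 forces Y > 0.81, but unconditionally
# P(Y > 0.81) = P(|X| > 0.9) = 0.1, far from 1.
p_y_large = sum(1 for b in y if b > 0.81) / n
print(p_y_large)  # near 0.1
```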
I have to prove that if X and Y are independent, they are uncorrelated. How do you prove it? Someone suggests using a pdf, but who says X and Y have a pdf? E[XY] is ∫ XY dP. This is a theorem you would have proved separately for discrete random variables, proved separately for jointly continuous random variables, and then blindly accepted as generally true. Now you are going to prove the general theorem, which is why I said it is a non-trivial theorem. If X and Y had a joint density, you would just evaluate the two sides separately and see that the equality holds. But X and Y may not have a joint density, and they may not have a pmf; they may not be discrete, one of them may be singular, another may be some mixture. Even in those cases the result holds. So how will you prove it? By now you should know: how do you prove anything in integration? Start with simple functions. So suppose X and Y are independent simple random variables, with representations X = Σ_{i=1}^{n} a_i 1_{A_i} and Y = Σ_{j=1}^{m} b_j 1_{B_j}; let us say these are canonical representations, for the sake of fixing ideas. Can you prove it in this case? The product XY is also simple: XY = Σ_{i=1}^{n} Σ_{j=1}^{m} a_i b_j 1_{A_i ∩ B_j}. That is the random variable XY itself, and since it is a simple function, E[XY] = Σ_{i=1}^{n} Σ_{j=1}^{m} a_i b_j P(A_i ∩ B_j). Now comes the key step. What is A_i? A_i is the event that X = a_i, and B_j is the event that Y = b_j.
Now, A_i and B_j are independent events. Why? Because X and Y are independent random variables. So the probability products out: by independence, P(A_i ∩ B_j) = P(A_i) P(B_j), and E[XY] = Σ_i Σ_j a_i b_j P(A_i) P(B_j). Now the i-sum and the j-sum separate: a_i P(A_i) only involves i, and b_j P(B_j) only involves j, so the double sum products out easily into (Σ_i a_i P(A_i)) (Σ_j b_j P(B_j)), which is exactly E[X] times E[Y]. So we have shown it for simple random variables. We will complete the proof next class: approximate X and Y from below by simple functions, which proves it for non-negative random variables, and then finally split into X⁺, X⁻, Y⁺, Y⁻ to get the general result. Tomorrow we are meeting at noon; there is a change in the timetable, is there not? There is an exam tomorrow, so the class is at 12 o'clock, here.
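The simple-function step of the proof is just a finite double sum, so it can be written out directly; this sketch is not from the lecture, and the two distributions below are arbitrary. The key line is the one where P(A_i ∩ B_j) is replaced by P(A_i) P(B_j), exactly the independence step above:

```python
from itertools import product

# Simple-function step: X and Y are independent simple random variables,
# given as (value, probability) lists. Under independence,
# P(A_i ∩ B_j) = P(A_i) P(B_j), so E[XY] factors into E[X] E[Y].
# The distributions are arbitrary.
x_dist = [(0.0, 0.5), (1.0, 0.3), (2.0, 0.2)]
y_dist = [(-1.0, 0.4), (3.0, 0.6)]

e_x = sum(a * p for a, p in x_dist)
e_y = sum(b * q for b, q in y_dist)
# Double sum over the atoms A_i ∩ B_j, with probabilities producted out:
e_xy = sum(a * b * p * q for (a, p), (b, q) in product(x_dist, y_dist))

print(e_x, e_y, e_xy)  # 0.7, 1.4, and 0.98 = 0.7 * 1.4
```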