Hi, I'm Zor. Welcome to a new Unizor lecture. Today I would like to talk about dependence between random variables, and in particular about how to measure the degree of that dependence. We already know about independent variables; now we will talk about measuring their dependence when it exists. The main purpose is statistical, but I will address that in the statistics course. Right now I am talking only about dependence and the measurement of this dependence from the standpoint of probability theory. This lecture is part of the advanced math course for teenagers and high school students, which I present on unizor.com. I suggest you watch this lecture on that website, because it has notes for each lecture, and registered students can engage in the whole educational process with enrollment and exams.

All right, back to dependence and independence. We are talking about dependence or independence between random variables, so let's introduce a couple of random variables. How do we do that? Well, first of all, to introduce any kind of random variable we have to provide, if you remember, a probability space, which contains certain elementary events and a probability measure associated with each elementary event. And now we need a second probability space, which has other elementary events and the corresponding probabilities associated with each event. A random variable is a function that takes a numerical value on each elementary event. The first one I call xi, and the second one eta. The value of xi on elementary event e_i is x_i, and the probability of that is p_i. Similarly, the random variable eta takes the value y_j on elementary event f_j, and the probability of that is q_j. (That is the Latin letter y, not a Greek letter.) Okay, now let's set the probability spaces aside.
We don't really need them anymore for the definition. So we have two random variables, xi and eta. Xi takes m values — not necessarily different, it doesn't really matter — which I call x_1 through x_m, and the probability of each is p_i; eta takes values y_j, where j runs from 1 to n, with corresponding probabilities q_j. Fine, we have our random variables; the probability spaces don't participate in any further explanation. They were just a reminder of what a random variable is.

Now let's concentrate on independence. When are two random variables called independent? Here is the definition: if the conditional probability that xi takes some value x_i, under the condition that eta has already taken some value y_j, equals the unconditional probability that xi takes x_i — that is, P(xi = x_i | eta = y_j) = P(xi = x_i) — for every index i from 1 to m and every index j from 1 to n, then xi and eta are independent. That actually means there is no dependency between them: no matter what value eta took, the probability of xi taking any particular value is exactly the same. That's independence. Now let me remind you of a couple of properties of independence which will be used in this lecture in one form or another. First of all, we have the property I would call symmetry: if xi is independent of eta — and that was the definition — then eta is independent of xi. This was proven in the corresponding lecture on independence.
So basically, symmetry means that the probability of eta taking value y_j, under the condition that xi has already taken some value x_i, is exactly the same as the unconditional probability of eta taking value y_j. The second property, which also follows, is the multiplication rule: the probability of xi taking value x_i and eta simultaneously taking value y_j equals the product of the two probabilities, P(xi = x_i) times P(eta = y_j). This has also been proven before; I'm not going to prove it right now, but I want to mention it. This holds only for independent variables, obviously. For instance, if you have two dice, the probability of getting, let's say, a two on the first and a five on the second is the product of the probability of the first showing two and the second showing five: there are thirty-six equally likely combinations, so the probability of the pair is one thirty-sixth, and indeed one sixth times one sixth is one thirty-sixth. The third property, which has been proven in one of the previous lectures, is that the mathematical expectation of the product of two independent variables — very important, independent variables — equals the product of their mathematical expectations. By the way, the corresponding theorem about the sum of two random variables does not depend on whether they are dependent or independent; but for the product, the expectation of the product equals the product of expectations only for independent variables. The theorem was proven only for the independent case.
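The dice example above can be checked by direct enumeration. Here is a minimal sketch in Python; the fair-dice setup matches the lecture, while the variable names are of course my own:

```python
from fractions import Fraction
from itertools import product

# Two fair dice as independent random variables: each face has probability 1/6.
faces = range(1, 7)
p = Fraction(1, 6)

# Probability of (first die = 2, second die = 5): product of individual probabilities.
p_2_and_5 = p * p  # 1/36

# Expectation of each die, and of their product over all 36 combinations.
E_first = sum(p * x for x in faces)
E_second = sum(p * y for y in faces)
E_product = sum(p * p * x * y for x, y in product(faces, faces))

print(p_2_and_5)                        # 1/36
print(E_product == E_first * E_second)  # True: E(product) = product of expectations
```

Using `Fraction` keeps every probability exact, so the multiplication rule and the product-of-expectations property come out as exact equalities rather than floating-point approximations.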
Well, it doesn't mean it's impossible for dependent variables to have the same property — it can happen — but it is not something that is always true. For independent variables it is always true, and obviously, if it is not true, the variables are not independent. So these are the properties we will use in the rest of this lecture; I just wanted to point them out.

Okay, next I would like to talk about dependent variables. What is very important is that the last property — the expectation of the product of two independent random variables equals the product of their expectations — is actually the basis for measuring the dependency between them. Because if there is a difference between the expectation of the product and the product of expectations, that difference actually signifies a certain degree of dependency, and the greater this difference is, the greater the dependency we assume between these random variables. To make this a little more concrete, I will introduce something called covariance. The covariance of two random variables is the averaged product of their deviations from their average values:

Cov(xi, eta) = E[(xi − E(xi)) · (eta − E(eta))].

So first of all, we centralize our random variables. The expectation of a centralized variable is zero, obviously: E(xi) is a constant, so E(xi − E(xi)) = E(xi) − E(xi) = 0. Then we multiply the centralized variables and take the expectation of the new random variable we have formed. Let me explain why this is a good measure of dependency. Let's open all the parentheses. What happens here?
We will have the expectation of xi·eta, minus xi times E(eta) — which is a constant, by the way — minus eta times E(xi), plus E(xi)·E(eta). The first of these terms is the product of our random variables; the next two are each a random variable times a constant, which happens to be the expectation of the other variable; and the last one is just a constant. Now, the expectation of a sum always equals the sum of expectations, regardless of dependency — we remember that. So we get E(xi·eta), minus the expectation of xi·E(eta); since E(eta) is a constant, I can take it out of the expectation — remember, E(c·xi) = c·E(xi), a property of mathematical expectation — leaving E(eta)·E(xi). Same thing with the next term: E(xi) comes out, leaving E(xi)·E(eta). And the expectation of the last constant is that constant itself. Two of these identical terms cancel the third, so only one product remains, with a minus sign:

Cov(xi, eta) = E(xi·eta) − E(xi)·E(eta).

Now, what is this? Remember, for independent random variables E(xi·eta) equals E(xi)·E(eta), which means the whole thing equals zero. This is a very significant fact: it tells us that the covariance of two independent variables is zero, which is a good sign that this is a good measure. This form looks simpler than the original definition, but they are absolutely equivalent — whatever I wrote before as the definition and whatever I have just derived are completely equivalent. So we will use this form of the definition.
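The equivalence of the two forms of covariance can be verified on any small finite joint distribution. The sketch below uses an illustrative joint table of my own — the specific values and probabilities are an assumption, not from the lecture:

```python
from fractions import Fraction

# A small joint distribution for (xi, eta) as a dict {(x, y): probability}.
# Illustrative values; any finite joint distribution works the same way.
joint = {
    (0, 0): Fraction(1, 2),
    (1, 1): Fraction(1, 4),
    (1, 2): Fraction(1, 4),
}

E_xi = sum(p * x for (x, y), p in joint.items())
E_eta = sum(p * y for (x, y), p in joint.items())

# Definition: the averaged product of deviations from the means.
cov_def = sum(p * (x - E_xi) * (y - E_eta) for (x, y), p in joint.items())

# Equivalent form derived in the lecture: E(xi*eta) - E(xi)*E(eta).
E_product = sum(p * x * y for (x, y), p in joint.items())
cov_short = E_product - E_xi * E_eta

print(cov_def == cov_short)  # True: the two forms always agree
```

With exact fractions the two expressions agree term by term, mirroring the algebraic derivation above.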
And let's examine how this particular definition of covariance behaves in certain dependent cases. Okay. What is the ultimate dependency between two random variables? It's when the random variable eta is exactly the same as xi: defined on exactly the same probability space, taking exactly the same value for each elementary event with exactly the same probability. They are obviously dependent: as soon as xi takes some value x_i, eta takes exactly the same value x_i, and the probability of this event is exactly the same. So let's see how the covariance works in this case. What is Cov(xi, xi)? It is E(xi²) − E(xi)² — and that is the variance of xi. Right? Remember, the variance of xi is the average squared deviation of xi from its expectation:

Var(xi) = E[(xi − E(xi))²] = E[xi² − 2·xi·E(xi) + E(xi)²] = E(xi²) − 2·E(xi)² + E(xi)² = E(xi²) − E(xi)²,

where E(xi) is a constant, so it comes out of the expectation in the middle term. And that is exactly our covariance. So we have actually established that the variance is the covariance of any random variable with itself.

Okay. What is a slightly different example of a very rigid correlation between xi and eta? What if eta equals some constant a multiplied by the value of xi? It is exactly the same type of rigid relationship as before, because the value of xi completely defines the value of eta; you just multiply it by a — that's the only difference. So let's see what the covariance will be in this case. Cov(xi, a·xi) equals — we will use our formula — E(a·xi²) minus E(xi) times E(a·xi).
That is, E(a·xi²) − a·E(xi)². Which is equal to what? The constant a comes out of both expectations, and what remains is a·[E(xi²) − E(xi)²], which is again the variance. So in this case Cov(xi, a·xi) = a·Var(xi). Now, this is very interesting, because a can be positive or negative. If a is positive, let's say equal to 1, I get exactly the same as before: the variance. If a is negative, let's say −1, I get minus the variance. Now, what does a minus mean in the case of covariance? Well, if a is −1, then whenever xi goes up, a·xi goes down, right? So the covariance still has the absolute value of the variance of xi, but its sign is plus or minus depending on the sign of a. Whenever xi and a·xi go together, the covariance equals the variance; whenever they go in opposite directions, the covariance equals minus the variance. That seems reasonable, right? When one goes up as the other goes down, it is reasonable to expect the measure of their dependency to be negative: they move in different directions, but very synchronously, and that means their covariance is negative. It's a natural thing.

So what do we have right now? The covariance of two independent variables is zero, which is good. The covariance of a random variable with itself is some positive constant, and its covariance with minus itself is a negative constant of exactly the same absolute value. That's reasonable. Now, what would you expect if the relationship is not that rigid — there is a relationship, but not as rigid as in these two cases? Then the covariance should be somewhat smaller in absolute value, but still positive if the variables go up synchronously and negative if they go in opposite directions, right? That seems reasonable. So let's take one particular example, which I call half dependency. Here is what it is.
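A quick numeric check of the two facts just derived — Cov(xi, xi) = Var(xi) and Cov(xi, a·xi) = a·Var(xi) — on an illustrative finite distribution of my own choosing:

```python
from fractions import Fraction

# xi takes these values with these probabilities (illustrative choice).
dist = {-1: Fraction(1, 4), 0: Fraction(1, 4), 2: Fraction(1, 2)}

def expectation(f):
    # E[f(xi)] over the finite distribution above.
    return sum(p * f(x) for x, p in dist.items())

E_xi = expectation(lambda x: x)
var_xi = expectation(lambda x: x * x) - E_xi**2

def cov_with_a_xi(a):
    # Cov(xi, a*xi) = E(xi * a*xi) - E(xi) * E(a*xi)
    return expectation(lambda x: a * x * x) - E_xi * expectation(lambda x: a * x)

print(cov_with_a_xi(1) == var_xi)      # covariance of xi with itself is its variance
print(cov_with_a_xi(-1) == -var_xi)    # opposite direction flips the sign
print(cov_with_a_xi(3) == 3 * var_xi)  # in general, Cov(xi, a*xi) = a * Var(xi)
```

All three prints are True, matching the sign discussion above: a positive a keeps the covariance positive, a negative a makes it negative with the same absolute value.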
Assume we have two random variables, xi and xi-prime, which are independent — that's very important — and identically distributed. What I would like to do is establish the covariance between xi and the variable (xi + xi-prime)/2, which is partially xi: half of this variable, if you wish, is xi, and half is borrowed from a completely independent variable distributed identically to xi. Well, the covariance between xi and xi-prime is obviously zero, as we know, because they are independent, as I said. But what is the covariance between xi and this new variable, which took half of its value from xi itself — so there is a dependency — and half from a variable independent of xi? Let's examine this particular case: Cov(xi, (xi + xi-prime)/2). First we have the expectation of their product, which is E[(xi² + xi·xi-prime)/2], minus the product of their expectations. Now, the one-half factor comes out, and the expectation of a sum is the sum of expectations, so we have (1/2)·[E(xi²) + E(xi·xi-prime)]. The product xi·xi-prime is a product of two independent variables — that was our assumption at the very beginning — and the expectation of the product of two independent variables equals the product of their expectations, so E(xi·xi-prime) = E(xi)·E(xi-prime). For the subtracted part, one half comes out again, and the expectation of the sum is the sum of expectations, so E(xi)·E[(xi + xi-prime)/2] becomes (1/2)·[E(xi)² + E(xi)·E(xi-prime)]. Now let's think about it: what is the expectation of xi-prime? Well, I said that xi and xi-prime are independent and identically distributed.
If they are identically distributed, their expectations are exactly the same, so E(xi)·E(xi-prime) is exactly the same as E(xi)². So inside the square brackets I have three equal terms E(xi)², one with a plus and two with a minus; they reduce, and what remains is:

Cov(xi, (xi + xi-prime)/2) = (1/2)·[E(xi²) − E(xi)²] = (1/2)·Var(xi).

What's in the square brackets is the expectation of the square minus the square of the expectation, which is again the variance. Now look at this interesting phenomenon. My variables xi and (xi + xi-prime)/2 are, as I said, half dependent, because the second variable took half of its value from xi and half from some other independent variable. That is less dependency than in the previous case, when eta took all of its values from xi and they were absolutely in sync — and the covariance is reduced by half. Isn't that wonderful? Half of the dependency results in half of the covariance. It all makes sense. Obviously, if I took the opposite sign — that is, −(xi + xi-prime)/2 — the result would again be half of the variance, but with a minus sign, because the dependency goes in the opposite direction. All I'm saying is that covariance has a very meaningful application: it really measures how much one random variable depends on another.

One more step and we will finish. You see, this measure depends on the variance. I think it would be very interesting to have a measure concentrated, let's say, between minus one and one, with zero being the measure for independent variables, one for cases like eta = xi, and minus one for cases like eta = −xi.
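The half-dependency result can be confirmed by exact enumeration over the independent pair. The two-valued distribution below is my own illustrative choice; any identically distributed independent pair gives the same relationship:

```python
from fractions import Fraction
from itertools import product

# xi and xi_prime: independent, identically distributed (illustrative values).
dist = {0: Fraction(1, 2), 1: Fraction(1, 2)}

E_xi = sum(p * x for x, p in dist.items())
var_xi = sum(p * x * x for x, p in dist.items()) - E_xi**2

# Enumerate the independent pair: joint probability is the product of marginals.
pairs = [((x, xp), p * q) for (x, p), (xp, q) in product(dist.items(), dist.items())]

# eta = (xi + xi_prime) / 2: "half" of eta comes from xi itself.
E_eta = sum(p * Fraction(x + xp, 2) for (x, xp), p in pairs)
E_prod = sum(p * x * Fraction(x + xp, 2) for (x, xp), p in pairs)
cov = E_prod - E_xi * E_eta

print(cov == var_xi / 2)  # True: half the dependency, half the covariance
```

The enumeration builds the joint distribution from the product of the marginals, which is exactly what independence of xi and xi-prime means.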
So it would be interesting to have a measure that, for something like half dependency, falls somewhere between minus one and one depending on direction: positive when one variable goes up as the other goes relatively up, more or less, and negative when one goes up as the other goes down. We are looking for a measure like this. How can I change my measure of dependency from covariance to something that better corresponds to my desire — a measure between minus one and one, with one and minus one being the maximum dependency, positive or negative, and zero being basically the measure for independent variables? Well, it's very simple, actually. We introduce a new coefficient, which we will call the coefficient of correlation, equal to the covariance divided by the square root of the product of the variances:

R(xi, eta) = Cov(xi, eta) / sqrt(Var(xi) · Var(eta)).

The term for this is "normalized": we reduce the covariance using a scaling factor — the denominator is the scaling factor — so the coefficient of correlation does not depend on the variances, but only on the dependency itself. It is a good measure of dependency, completely unrelated to how far our variables are spread around their average values. Now let me apply this to all the cases I have just considered. Case number one: the variables are independent. What will their correlation be? Well, if they are independent, the covariance is zero, so the whole thing is zero. We have achieved our goal: independent variables have zero correlation. That's good. Now let's examine the dependent case eta = a·xi, the very rigid dependency with some coefficient a. What happens here? Well, we know that the covariance equals a·Var(xi), and the denominator is the square root of Var(xi) times Var(a·xi).
Now, what is the variance of a·xi? Well, remember that whenever you have the variance of a constant multiplied by a random variable, you can take the constant out, but squared, because the variance is an average of a squared deviation: Var(a·xi) = a²·Var(xi). So the correlation equals a·Var(xi) divided by the square root of a²·Var(xi)². Variance is non-negative, so I can always say that the square root of Var(xi)² is Var(xi). And what is the square root of a²? Well, if anybody tells me it's a, that's wrong, because it is the absolute value of a. This is the arithmetic square root, which is always non-negative, while a can be positive or negative — that's why we have to put the absolute value. So what's the result? For random variables with variance not equal to zero, we can cancel Var(xi), and what remains is a divided by the absolute value of a:

R(xi, a·xi) = a / |a|.

I would like to dwell on this for a couple of seconds, considering, obviously, that a is not zero. For positive a, the absolute value is exactly the same as a, so I will have a divided by a, which is one. For negative a, the absolute value is −a, so we will have a divided by −a, which equals minus one. So for this type of dependency my correlation coefficient is either one or minus one, depending on the sign of a, which is absolutely reasonable. If a is positive, the variables go up or down together — doesn't really matter which. If a is negative, they always move in opposite directions, and that's why the coefficient of correlation is minus one.
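A sketch confirming that the correlation of xi with a·xi is the sign of a. The distribution is again an illustrative choice of mine; the result does not depend on it, as long as the variance is nonzero:

```python
import math

# An illustrative finite distribution for xi: (value, probability) pairs.
values = [(-1, 0.25), (0, 0.25), (2, 0.5)]

def E(f):
    # E[f(xi)] over the finite distribution above.
    return sum(p * f(x) for x, p in values)

def corr_with_a_xi(a):
    # R(xi, a*xi) = Cov(xi, a*xi) / sqrt(Var(xi) * Var(a*xi))
    var_xi = E(lambda x: x * x) - E(lambda x: x) ** 2
    cov = a * var_xi           # derived in the lecture: Cov(xi, a*xi) = a * Var(xi)
    var_a_xi = a * a * var_xi  # Var(a*xi) = a^2 * Var(xi)
    return cov / math.sqrt(var_xi * var_a_xi)

print(corr_with_a_xi(5.0))   # 1.0: positive a, perfectly in sync
print(corr_with_a_xi(-2.0))  # -1.0: negative a, perfectly opposite
```

The magnitude of a cancels out entirely; only its sign survives the normalization, which is exactly the a/|a| result above.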
So it's a good measure for this very, very rigid dependency. And my last example — the half dependency, as I call it — is a little more interesting. What would you expect the correlation to be? Well, if you think it's one half, that is incorrect, but actually very close. The correlation between xi and (xi + xi-prime)/2, where xi-prime is independent of xi and identically distributed, is very simple to compute. On the top, I have one half of Var(xi) — we just derived that. On the bottom, I have the square root of Var(xi) times the variance of (xi + xi-prime)/2. Now, what is the variance of this variable? Well, remember, if you have a constant factor, you can take it out of the variance, but squared. Then you have the variance of a sum of two independent variables, and for independent variables the variance of the sum equals the sum of the variances. It is always important that they are independent — this theorem is only for independent variables, just like the expectation of a product equals the product of expectations only for independent variables. Since xi and xi-prime are identically distributed, each has exactly the same variance, so the sum of their variances is 2·Var(xi), and

Var((xi + xi-prime)/2) = (1/4) · 2 · Var(xi) = (1/2) · Var(xi).

Variance is positive, so Var(xi) cancels, and I am left with one half divided by the square root of one half, which equals the square root of two over two. So again, it is a good measure in this case: it is positive, and if we had a minus sign it would be negative, but in any case it is between minus one and one. The square root of two is about 1.41, so the correlation is about 0.707 — not exactly one half.
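The half-dependency correlation of sqrt(2)/2 can be checked the same way by exact enumeration. The coin-like distribution below is illustrative; any nondegenerate identically distributed pair yields the same value:

```python
import math

# xi and xi_prime: independent, identically distributed (illustrative values).
dist = [(0, 0.5), (1, 0.5)]

# Enumerate the independent pair; joint probability is the product of marginals.
pairs = [((x, xp), p * q) for x, p in dist for xp, q in dist]

def E(f):
    # E[f(xi, xi_prime)] over the joint distribution of the independent pair.
    return sum(p * f(x, xp) for (x, xp), p in pairs)

def eta(x, xp):
    return (x + xp) / 2  # "half dependent" on xi

cov = E(lambda x, xp: x * eta(x, xp)) - E(lambda x, xp: x) * E(eta)
var_xi = E(lambda x, xp: x * x) - E(lambda x, xp: x) ** 2
var_eta = E(lambda x, xp: eta(x, xp) ** 2) - E(eta) ** 2

corr = cov / math.sqrt(var_xi * var_eta)
print(abs(corr - math.sqrt(2) / 2) < 1e-12)  # True: about 0.707, not 0.5
```

Note that the covariance here is half the variance of xi, but the normalization in the denominator is also smaller, which is why the correlation lands at sqrt(2)/2 rather than 1/2.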
It's more than one half, but in any case it is a meaningful measure of the dependency between these two variables. So my purpose was to introduce this particular coefficient — it's called the coefficient of correlation. Covariance, by the way, was just a way to introduce correlation; correlation was my main goal, and I have introduced it as a measure of dependency between random variables. And as a side note: this is not used very much in the theory of probabilities per se. It is an apparatus used in statistics, where we have something called a sample correlation. We use it when we don't really know whether two random variables are or are not dependent on each other. To check whether there is such a dependency, we use the correlation between them, which we can statistically evaluate. If the computed correlation is very close to zero, the chances are these variables are relatively independent. If there is a strong correlation, close to one or close to minus one, that indicates dependency. For instance, we might need the correlation between a drug applied to treat some illness and the final result — whether the person gets better or not. That's the most important factor: the correlation between application of the drug and recovery. If there is such a correlation, and it is strong, close to one, that means the drug is working. Okay, that's it for today. Thank you very much and good luck.