Hi, welcome to a new Unizor education lecture. I would like to solve a problem, though it's more of a theoretical problem related to statistics, in particular to the statistical evaluation of the covariance and correlation between random variables. The beginning of this story I actually shared with you in my lecture on the theory of probabilities, also related to correlation; the starting point was statistical, and it led me to some probabilistic aspects.

I was thinking about a very simple fact: if two random variables are independent, then their covariance and correlation are equal to zero. The covariance of two random variables is defined as the average of their product minus the product of their averages, and we know that for random variables which are independent of each other, the mathematical expectation of their product is equal to the product of their expectations; that is why the covariance is zero. So for independent random variables, the covariance, and therefore the correlation, which is the covariance divided by the product of their standard deviations, is zero.

I was wondering whether the reverse is true. Actually, I know the reverse is not true; I just wanted to construct an example that proves it. In other words, if the covariance is equal to zero, the random variables might still not be independent. So I need an example of two random variables which have zero covariance but are dependent on each other. Just one such example proves that zero covariance does not imply independence. I first tried to build such an example in the very simple case of two random variables, each taking two distinct values with certain probabilities, and I couldn't. Every time I tried, I ended up with variables that were either independent, or one of them was a constant.
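To make the forward direction concrete, here is a small Python check (my own illustration, not part of the original lecture) with a hypothetical independent pair: when the joint distribution is the product of the marginals, E(S·T) factors as E(S)·E(T), so the covariance comes out exactly zero.

```python
# Hypothetical independent pair: S in {1, 2} with probabilities (0.5, 0.5),
# T in {3, 4, 5} with probabilities (0.25, 0.5, 0.25).
# Independence means the joint probability is the product of the marginals.
ps = {1: 0.5, 2: 0.5}
pt = {3: 0.25, 4: 0.5, 5: 0.25}

e_s = sum(s * p for s, p in ps.items())                      # E(S)
e_t = sum(t * p for t, p in pt.items())                      # E(T)
e_st = sum(s * t * ps[s] * pt[t] for s in ps for t in pt)    # E(S*T)

print(e_st - e_s * e_t)  # -> 0.0, the covariance of an independent pair
```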
I then converted this observation into a theorem, which I proved as part of my probability lecture on random variables; I believe it is Problem 7 in the random variables chapter of the probability course. So, with two random variables taking only two values each, we cannot construct an example of random variables which are uncorrelated but still dependent on each other.

So I decided to go a little further: let one variable take two values and the other take three. With that, I did come up with the example I'm about to present. The problem, then, is: construct an example of two random variables which are dependent on each other but whose covariance and correlation are equal to zero. I do suggest you try it yourself first; as a hint, in my construction one variable takes two values and the other takes three, and I then found values and probabilities such that, on one hand, the variables are dependent and, on the other hand, they have zero covariance. The problem is presented on unizor.com as Problem 2 in the chapter on statistical correlation. Here I am about to present my version of such an example.

So here is my example. I have two random variables, which I call S and T. S takes the values A and B; T takes the values C, D, and X (I'll explain shortly why I use the letter X for the third one). I ran 100 experiments, and as a result, S was equal to A in 50 cases and equal to B in the other 50 cases out of 100.
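The two-values-each theorem rests on a handy identity: for a 2×2 joint table with probabilities p11, p12, p21, p22, the covariance equals (a2−a1)·(c2−c1)·(p11·p22 − p12·p21), and the determinant p11·p22 − p12·p21 vanishes exactly when the table factors into its marginals, i.e. when the variables are independent. Here is a quick numerical sketch of that identity in Python (my own check; the variable names are mine, not from the lecture):

```python
import random

def check_2x2_identity(trials=1000):
    """For S in {a1, a2} and T in {c1, c2} with joint probabilities p[i][j],
    verify Cov(S, T) = (a2 - a1) * (c2 - c1) * (p11*p22 - p12*p21).
    So Cov = 0 forces the determinant to vanish, which for a 2x2 table
    is exactly independence (unless one variable is constant)."""
    for _ in range(trials):
        a1, a2, c1, c2 = (random.uniform(-5, 5) for _ in range(4))
        w = [random.random() for _ in range(4)]
        total = sum(w)
        p11, p12, p21, p22 = (x / total for x in w)  # a random joint table
        e_st = a1*c1*p11 + a1*c2*p12 + a2*c1*p21 + a2*c2*p22
        e_s = a1*(p11 + p12) + a2*(p21 + p22)
        e_t = c1*(p11 + p21) + c2*(p12 + p22)
        cov = e_st - e_s * e_t
        det = p11*p22 - p12*p21
        assert abs(cov - (a2 - a1)*(c2 - c1)*det) < 1e-9
    return True

check_2x2_identity()
```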
Now, among the 50 cases when S equals A, T was always equal to D; the counts for C and X were zero. When S took the value B, the cases were split between C and X, 25 each, with zero cases for D. The full table of joint counts out of 100 experiments is:

            T = C   T = D   T = X
    S = A     0      50       0
    S = B    25       0      25

First of all, is this possible? Yes. Are these random variables dependent? Absolutely. Just think about it: whenever S is equal to A, T is always equal to D; that's a clear dependency. And whenever S is equal to B, T never equals D and splits its values between C and X. So there is definitely a dependency.

Now, how can I prove the dependency in a rigorous mathematical sense? Very easily. Recall that if two events are independent, the probability of their simultaneous occurrence is equal to the product of their probabilities, or, which is exactly the same thing, the conditional probability of one event given that the other occurred is equal to its unconditional probability. That is what independence means. Let's just check it. Take, for instance, the pair of events S = A and T = C. The probability of their simultaneous occurrence is zero, as the table shows. But what is the product of the individual probabilities? T equals C in 25 cases, T equals D in 50 cases, and T equals X in 25 cases out of 100, so P(T = C) = 25/100 = 1/4, while P(S = A) = 50/100 = 1/2. Obviously the product 1/2 · 1/4 = 1/8 does not equal zero, so P(S = A and T = C) ≠ P(S = A) · P(T = C). The same thing happens in several other cells of the table. The multiplication rule fails, which means we do not have independence.

Okay, so that's number one. Now, what about the covariance? Let's just calculate it; it's very easy. We are going to calculate Cov(S, T) = E(S·T) − E(S)·E(T), using averages, since we have the statistical distribution of these two random variables.
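The independence check above can be sketched in a few lines of Python (my own illustration): turn the 100-experiment table into probabilities and test the multiplication rule at the cell (S = A, T = C), which was never observed.

```python
# Joint counts from the 100 experiments in the lecture's table.
counts = {('A', 'C'): 0,  ('A', 'D'): 50, ('A', 'X'): 0,
          ('B', 'C'): 25, ('B', 'D'): 0,  ('B', 'X'): 25}
total = sum(counts.values())
p_joint = {k: v / total for k, v in counts.items()}

# Marginal probabilities of S and T, obtained by summing rows and columns.
p_s = {s: sum(v for (si, t), v in p_joint.items() if si == s) for s in ('A', 'B')}
p_t = {t: sum(v for (s, ti), v in p_joint.items() if ti == t) for t in ('C', 'D', 'X')}

# Independence would require P(S=A, T=C) == P(S=A) * P(T=C):
print(p_joint[('A', 'C')], p_s['A'] * p_t['C'])  # -> 0.0 0.125
```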
We have six possible pairs of values in the joint distribution, so the average of the product is:

    E(S·T) = (A·C·0 + A·D·50 + A·X·0 + B·C·25 + B·D·0 + B·X·25) / 100

That's the average of the products of values, which is the best statistical estimate of the mathematical expectation of S·T that we can get. The terms with zero counts drop out, leaving:

    E(S·T) = A·D/2 + B·C/4 + B·X/4

Now, what is the expectation of the random variable S? It takes two values, A and B, each occurring in 50 out of 100 cases, so:

    E(S) = A·50/100 + B·50/100 = (A + B)/2

And what is the expectation of the random variable T? It has three different values, with frequencies 25, 50, and 25 out of 100, so, summing up:

    E(T) = C/4 + D/2 + X/4

Now, if I multiply these two expectations, here is what I have:

    E(S)·E(T) = A·C/8 + A·D/4 + A·X/8 + B·C/8 + B·D/4 + B·X/8

Here is what I would like to do. A, B, C, and D are some constants, and I will find X such that E(S·T) is equal to E(S)·E(T). If I can do this, the covariance, which is the difference between them, will be equal to zero. So I can just set up the equation E(S·T) = E(S)·E(T), treat A, B, C, and D as constants, and solve for X. I'll do even better than that: I'll plug in concrete values for A, B, C, and D, so I don't have to carry letters around. I will use real numbers which I just came up with out of the blue.
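Since the equation is linear in X, a computer can solve it exactly. Here is a sketch using Python's fractions module; the function name solve_x is mine, and the coefficients are hard-wired to the joint probabilities of this example (1/2, 1/4, 1/4), so this is not a general-purpose solver.

```python
from fractions import Fraction as F

def solve_x(a, b, c, d):
    """Solve E(S*T) = E(S)*E(T) for the unknown third value x of T,
    given the joint table of this lecture: P(A,D)=1/2, P(B,C)=1/4, P(B,X)=1/4."""
    a, b, c, d = map(F, (a, b, c, d))
    # E(S*T) = a*d/2 + b*c/4 + (b/4)*x   -> constant part:
    lhs0 = a*d/2 + b*c/4
    # E(S)*E(T) = (a+b)/2 * (c/4 + d/2) + ((a+b)/8)*x  -> constant part:
    rhs0 = (a + b)/2 * (c/4 + d/2)
    # Collect the x terms on one side: (b/4 - (a+b)/8) * x = rhs0 - lhs0
    coeff = b/4 - (a + b)/8
    return (rhs0 - lhs0) / coeff

print(solve_x(2, 4, 8, 16))
```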
And, you know, it's still a linear equation, so I will be able to solve it. So which values of A, B, C, and D will I take? Values that are easy to calculate with, since you see all these divisions by 4 and 8. I will take A = 2, B = 4, C = 8, and D = 16. I came up with these numbers for one purpose only: so that the divisions leave nothing in the denominator; everything just cancels out.

Let me see what kind of equation I will have. On the left: A·D/2 = 2·16/2 = 16; B·C/4 = 4·8/4 = 8; and B·X/4 = 4·X/4 = X. So the left side is 24 + X.

Now, how about the other expression? A·C/8 = 16/8 = 2; A·D/4 = 32/4 = 8; A·X/8 = X/4; B·C/8 = 32/8 = 4; B·D/4 = 64/4 = 16; and B·X/8 = X/2. So the right side is 2 + 8 + 4 + 16 + X/4 + X/2 = 30 + 3X/4.

And they are equal to each other. As you see, this is just a plain linear equation which I can solve: 24 + X = 30 + 3X/4, which means X/4 = 6, and so X = 24.

So there is a solution, which means that with these numbers I have created two random variables: one taking the values 2 and 4 with probabilities 50% and 50%, and another taking the values 8, 16, and 24 with probabilities 25%, 50%, and 25%. Based on the joint probabilities above (which I obtained as statistical frequencies; it doesn't really matter), they are definitely dependent on each other, and yet their covariance is equal to zero. So the moral of this story is that zero covariance does not imply independence of random variables.
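We can verify the finished example numerically. A minimal sketch (my own, in Python), computing the covariance straight from the joint distribution:

```python
# The finished example: S in {2, 4}, T in {8, 16, 24}, with joint
# probabilities P(2,16) = 1/2, P(4,8) = 1/4, P(4,24) = 1/4.
joint = {(2, 16): 0.5, (4, 8): 0.25, (4, 24): 0.25}

e_st = sum(s * t * p for (s, t), p in joint.items())  # E(S*T)
e_s = sum(s * p for (s, t), p in joint.items())       # E(S)
e_t = sum(t * p for (s, t), p in joint.items())       # E(T)
cov = e_st - e_s * e_t

print(cov)  # -> 0.0
# Yet the variables are dependent: P(S=2, T=8) = 0,
# while P(S=2) * P(T=8) = 0.5 * 0.25 = 0.125.
```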
There are certain things you should be very, very careful about when using covariance or a correlation coefficient to make a judgment about the relationship between random variables. If the correlation comes out close to zero in statistical calculations, it does not mean that the two variables are independent; they might be dependent, as in this particular case. At the same time, if the correlation is significantly non-zero, say closer to 1, then you probably should suspect a certain level of dependency. But even in that case, dependency is not the same as causation. There may be some outside reason for the two variables to move together: they may both depend on some third factor which you don't know about and have no control over. That's why I would be very, very careful as far as making recommendations based on the correlation coefficient.

Another very important point: correlation is only a good measure when the two random variables are in a linear, or almost linear, relationship with each other, so that when one goes up, the other goes up roughly proportionally. If that is not the case, if the relationship follows some kind of curve rather than a straight line, then you should not rely on correlation, and you should not make important judgments based on whatever correlation number you get as a result of your statistical calculations.

So basically that's it: just a word of caution whenever you are using these tools. Now, unfortunately, in much published scientific research this word of caution is not heeded by the authors, and all kinds of results get published that are based on correlation coefficients which are really unreliable. That includes some medical research as well.
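To illustrate the linearity caveat, here is a classic textbook example (my addition, not from the lecture): T = S² with S symmetric around zero is a perfect but nonlinear dependence, and its covariance is still exactly zero.

```python
# S takes values -1, 0, 1 with probability 1/3 each; T = S**2 is a
# deterministic function of S, yet Cov(S, T) = 0 because the positive
# and negative contributions to E(S*T) cancel by symmetry.
values = [(-1, 1), (0, 0), (1, 1)]            # pairs (s, s**2)

e_s = sum(s for s, t in values) / 3           # E(S)  = 0
e_t = sum(t for s, t in values) / 3           # E(T)  = 2/3
e_st = sum(s * t for s, t in values) / 3      # E(S*T) = 0
cov = e_st - e_s * e_t

print(cov)  # -> 0.0, despite T being completely determined by S
```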
It is very difficult to obtain truly reliable statistical results, and that's why people sometimes take shortcuts. Alright, so I have cautioned you. I do recommend you read the notes for this particular lecture again. Other than that, that's it, and good luck.