Hi, I'm Zor. Welcome to Unisor Education. Today I would like to talk about the statistical approach to correlation between random variables. This is part of the Unisor Advanced Course of Mathematics, presented on this website. I do suggest you watch this lecture from the website, because it comes with very detailed notes, which can actually be considered an independent textbook. So you have a textbook and a lecture, much like at a real school or university.

Anyway, we will talk about statistical correlation today. First of all, we have to realize that everything in statistics is based on the theory of probabilities. In the theory-of-probabilities part of this course there are a couple of lectures where covariance and then the correlation coefficient were introduced. I suggest you refresh your memory and go through those lectures; they contain some nice examples which give you the theoretical foundation for the statistics I will be talking about today.

Be that as it may, let me briefly remind you of the definitions of covariance and correlation from the theory-of-probabilities standpoint. Consider two random variables ξ and η. First, we introduced the covariance between them: the expectation of their product minus the product of their expectations,

cov(ξ, η) = E(ξη) − E(ξ)E(η).

Immediately you see that if these two variables are independent of each other, then the expectation of the product equals the product of the expectations, E(ξη) = E(ξ)E(η), and the covariance is zero. So in a way, covariance is a measure of independence, if you wish: if the variables are independent, the covariance is zero. The reverse, however, is not always true: if the covariance is zero, it does not necessarily mean the variables are independent.

The second thing introduced in the theory-of-probabilities course was the correlation coefficient, defined as the covariance divided by the square root of the product of the variances:

r(ξ, η) = cov(ξ, η) / √(Var(ξ) · Var(η)).

Now, what's good about this? Why did we rescale the covariance? For one very simple reason. The correlation coefficient is still zero when the variables are independent, because the numerator is zero. Now consider random variables that are ultimately dependent on each other. What do I consider ultimately dependent? That η = aξ for some constant a: they are proportional to each other, or equal to each other in the particular case a = 1. That is the ultimate dependency: one value completely determines the other, with no accidents, no randomness, nothing else. What happens in this case? In this case I can state that r(ξ, η) = ±1, depending on the sign of the constant. If a is positive, meaning η increases when ξ increases and decreases when ξ decreases, the correlation is +1; and if a is negative, meaning they move in opposite directions, one up and the other down, then the coefficient is −1.
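To see these two extreme cases in action, here is a minimal Python sketch, not from the lecture itself, using simulated data; np.corrcoef is NumPy's standard routine for the sample correlation matrix, and the variable names are mine:

```python
import numpy as np

rng = np.random.default_rng(0)
xi = rng.normal(size=100_000)          # random variable xi
eta_indep = rng.normal(size=100_000)   # generated independently of xi
eta_linear = -2.0 * xi                 # ultimate dependency: eta = a*xi, a = -2 < 0

# np.corrcoef returns the 2x2 correlation matrix; the off-diagonal entry is r.
print(np.corrcoef(xi, eta_indep)[0, 1])   # close to 0 (independence)
print(np.corrcoef(xi, eta_linear)[0, 1])  # -1 up to rounding, since a < 0
```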
And it can be proven that for any pair of random variables, the correlation coefficient always lies between −1 and 1, with −1 and 1 at the two ends representing ultimate dependency and 0 representing independence. Okay, so that's the theoretical foundation of correlation.

Now let's go to statistical correlation. What is statistical correlation? Well, I do not have the complete distribution, so I cannot calculate these quantities exactly; instead I have sample values. I have certain observations in which I observed ξ and η. Based on these observations I can estimate this correlation coefficient and then, from the value obtained, make a judgment about whether and how these two random variables depend on each other.

So let's talk about sampling. If I want to estimate E(ξ) or E(η), that's straightforward: with a sample of ξ I can estimate its expectation, and with a sample of η I can estimate its expectation, each separately. But E(ξη) is a completely different story, because here the two variables enter one calculation together, in this case a multiplication. What does that mean? It means I have to know the joint distribution of the two variables: the probability of ξ taking one value while η takes some other value at the same time. Only if I have this joint distribution can I calculate the expectation of their product. Since we are talking about statistics, nobody gives us any distribution beforehand; all we have is sampling. And here is what's very important in this case. Suppose in n experiments ξ took the values x1, …, xn and η took the values y1, …, yn. What's important is to synchronize these two experiments: whenever ξ took the value x1, at the same time, or under the same circumstances, η took the value y1. We need combined observations. We observe once and see that ξ took x1 and η took y1; we observe a second time and get x2 and y2; and on the nth observation we get xn and yn. The observations must be joint for us to estimate E(ξη). This is very important. I cannot run n experiments with ξ separately, get n values, and then, completely unrelated to those, run some experiments with η and get its values. Those are not related to each other, so we cannot combine them into the joint distribution which this calculation requires. We can only evaluate the probability, the frequency, of the two variables simultaneously taking certain values if the n experiments involve both variables at the same time or under the same circumstances.

Now, what can that be in practice? For example, in the financial market it can be the closing price of some stock and, observed at exactly the same moment, an index such as the Nasdaq 100, both taken at the close of the day. So you have, for instance, International Business Machines Corporation, IBM, and the Nasdaq 100: the day has closed, and at the closing bell you have the value of one and the value of the other.
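To illustrate why the pairing matters, here is a small simulated sketch, my own illustration rather than real market data: two series built around a common component stand in for the stock and the index, and reshuffling one of them destroys the estimated correlation even though each series on its own is unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)
# Simulated stand-ins for paired daily closes (e.g., a stock and an index);
# the two series share a common driving component, so they are dependent.
common = rng.normal(size=250)
stock = common + 0.5 * rng.normal(size=250)
index = common + 0.5 * rng.normal(size=250)

r_paired = np.corrcoef(stock, index)[0, 1]

# Breaking the pairing (reordering one series independently of the other)
# destroys the joint structure that the estimate of E(xi*eta) relies on.
r_shuffled = np.corrcoef(stock, rng.permutation(index))[0, 1]

print(f"paired:   r = {r_paired:.2f}")    # substantial positive correlation
print(f"shuffled: r = {r_shuffled:.2f}")  # near zero
```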
Then the next day you do exactly the same thing, and the next, and the next. So if you observe n days in a row and each time record both values at the same moment, then you can actually talk about some statistical calculations. Right, so if we have exactly this type of situation, where we can pair x1 with y1, x2 with y2, and xn with yn, then we can talk about calculating the average of their product, the mean product.

Okay, here it is. We have ξ taking the value xk and η taking the value yk, and I intentionally use the same index k in both cases, because each observation is supposed to give both values, the value of ξ and the value of η, at the same time or under the same circumstances. Now, if this is given to us and we have n observations, then we can definitely start calculating. What is the statistical mean value of ξ? I have to sum all the values from 1 to n and divide by n, right?

E(ξ) ≈ x̄ = (x1 + x2 + … + xn) / n,

and similarly ȳ = (y1 + … + yn) / n for η. Similarly, I can estimate the variance of ξ, which is the average of the squared deviation from the mean. The deviation from the mean is xk − x̄; we square it, sum over the experiments, and divide, usually by n − 1, I hope you remember that, to have the unbiased variance:

Var(ξ) ≈ s_x² = ((x1 − x̄)² + … + (xn − x̄)²) / (n − 1).

With a large n it doesn't really matter, but with a smaller n there is some noticeable difference. So this is the variance estimated from the sample, where x̄ is as above. And similarly for η, obviously: the squared deviations yk − ȳ, summed and divided by n − 1, give s_y². Great.

So all that remains is E(ξη), the mean value of the product. Well, that's very easy. Since ξ·η took the value xk·yk in the kth experiment, and we have n experiments, I have to add these products together and divide by n:

E(ξη) ≈ (x1·y1 + x2·y2 + … + xn·yn) / n.

So we have estimated each component: the mean of the product, from which we subtract the product of the two means, and the two variances whose square root we divide by. The correlation coefficient is thereby calculated:

r ≈ (E(ξη) − x̄·ȳ) / √(s_x² · s_y²).

Okay, fine, so we've done that. Now, why did we do it? We did it because we would like to make some kind of judgment on whether these two random variables are or are not related, dependent on each other. Since we know that r runs from −1 to 1, with values at the edges of this interval representing essentially linear dependency and the value in the middle, zero, essentially representing independence, we can make a judgment simply by positioning the correlation coefficient we have obtained on this interval from −1 to 1. Obviously, if it's very close to zero, you can say that there is probably no dependency; if it's very close to 1 or to −1, you can say that the dependency is probably relatively strong. So what are the guidelines? Whatever I say now is rather subjective, and it definitely depends on the concrete circumstances.
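Here is a direct Python transcription of this recipe, my own sketch of the steps just described. One technical note: because the product mean is divided by n while the variances use n − 1, this estimate differs from the standard Pearson formula by a factor of (n − 1)/n, which is negligible for large n but visible for a sample of five points, as the comparison with np.corrcoef shows.

```python
import numpy as np

def corr_from_sample(x, y):
    """Correlation coefficient computed exactly as in the lecture:
    (mean of products - product of means) / sqrt(var_x * var_y),
    with the variances using the unbiased n-1 denominator."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    assert n == len(y), "observations must be paired: same k-th experiment"
    mean_x, mean_y = x.sum() / n, y.sum() / n
    mean_xy = (x * y).sum() / n                  # estimate of E(xi*eta)
    var_x = ((x - mean_x) ** 2).sum() / (n - 1)  # unbiased variance of xi
    var_y = ((y - mean_y) ** 2).sum() / (n - 1)  # unbiased variance of eta
    return (mean_xy - mean_x * mean_y) / np.sqrt(var_x * var_y)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1]
print(corr_from_sample(x, y))    # ~0.79: shrunk by (n-1)/n = 0.8 at n = 5
# Pearson's formula normalizes numerator and denominator consistently,
# so the n vs. n-1 factors cancel; for large n the two estimates agree.
print(np.corrcoef(x, y)[0, 1])   # ~0.99
```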
But traditionally and generally speaking: if the absolute value of r is less than 0.1, that is, r is between −0.1 and +0.1, we should probably consider that there is no correlation. If |r| is from 0.1 to, let's say, 0.4, I would say there is a weak correlation; from 0.4 to 0.7, a moderate correlation; and above 0.7, probably a strong correlation. And again, the closer the absolute value is to 1, the stronger the correlation, and the closer it is to 0, the weaker the correlation. Now, these are no more than guidelines, rather subjective, my own personal opinion, and definitely not a universally accepted classification. Most likely, in different practical cases you will encounter different opinions about this. So that's how we classify correlation.

Now, let's consider that we have established there is some kind of dependency, say a strong correlation, between two random variables. The next question is: can I say that one random variable influenced the other, or caused it to take such and such a value? Cause and effect, causality as we say: how is it related to correlation? Well, that's not such an easy question. Generally speaking, correlation does not mean causality. Here is why. Consider two random variables ξ and η whose correlation coefficient is 0.9, which signifies a rather strong correlation. Is it ξ causing η to take values in sync with ξ, or is it η causing ξ to take these particular values? Or maybe there is some third random variable, which we don't even know about, that caused both ξ and η to take relatively synchronized values. We don't know; it's simply not in the mathematics, let's put it this way. We might know about it from other subjects, like chemistry, physics, geology, medicine, whatever you want, but it's not really in the mathematics. So whenever you have calculated the correlation between two random variables based on their statistical behavior, it's not really up to you to say that one is the cause of the other without some kind of additional consideration.

But here is a consideration which might actually be important. I said that both ξ and η are supposed to be observed under the same circumstances or at the same time. Well, what if it's literally the same circumstance? For instance, we are experimenting with some drug and the effect of that drug on the same person. In this case the same person is what combines, unifies, these two things: the drug and the effect of the drug. So if we always see this kind of dependency between how much drug is given and how much effect it has, then probably there is a cause, which is the drug, and an effect. But that conclusion is not in the mathematics; it comes entirely from a different area, from medicine. Similarly, in the financial industry: suppose you measure how much money the Federal Reserve pumped into the market at, let's say, 2 o'clock in the afternoon, and the closing price of some market index at 4 o'clock in the afternoon. There may actually be some kind of dependency here, and obviously something which happens first might be the cause of something which happens second, and definitely not the other way around.
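If you want this rule of thumb in code, here is a tiny helper encoding these admittedly subjective thresholds; where exactly the boundary values 0.1, 0.4, and 0.7 fall is my own choice, not a standard.

```python
def classify_correlation(r):
    """Rough strength labels following the lecture's (subjective)
    guideline thresholds of 0.1, 0.4 and 0.7 on |r|."""
    a = abs(r)
    if a < 0.1:
        return "no correlation"
    elif a < 0.4:
        return "weak correlation"
    elif a < 0.7:
        return "moderate correlation"
    else:
        return "strong correlation"

for r in (0.05, -0.3, 0.55, -0.92):
    print(r, "->", classify_correlation(r))
```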
So, at least in some cases you might be able to make some judgment about cause and effect. But again, let me repeat that generally speaking, the mathematics alone, just the value of the correlation, gives you nothing beyond that number: there is some dependency, strong or weak or moderate, or no dependency at all. It does not provide information about causality. So correlation and causality are different things, and causality has to be derived from outside the mathematics. And that's basically the end of this lecture. I will probably show some examples of correlation between different random variables in some future lectures. Thanks very much and good luck.