Hi, I'm Zor. Welcome to Unizor Education. I'd like to solve a couple of statistical problems related to correlation. They are very simple problems, and you really should try to solve them yourself. The description of the problems, like everything else, is on Unizor.com. All you need is a plain calculator. I will solve them myself right now. I have pre-calculated a few numbers, and I will borrow them from here. But you should really do it yourself first. It's really very simple. So this is just guidance for future correlation problems which you might consider solving. All right. So I have two problems, and both are very easy. Let's say you have two random variables, and you would like to find out whether there is some dependency between them. Specifically, one variable is the salt concentration in the water, and the other is the boiling temperature. So let's say you suspect that the boiling temperature somehow depends on the amount of salt. And, by the way, it does: salty water boils at a higher temperature than plain water. All right. So I have a purely artificial example of the numbers which I have obtained, and I put them in a table. Let's say that, in some units, you put one, two or four units of salt into the water, and you record the boiling temperature. With one unit of salt, 40 times you observed a boiling temperature of 101 degrees Celsius, and no other temperature occurred. With two units of salt, you observed 102 degrees in 35 cases, and no other temperature. And with four units of salt, you observed 104 degrees in 25 cases, and no other temperature. So as you see, this is a very, very simple case where this random variable, the boiling temperature, depends strictly on the amount of salt. In fact, you can even write the formula: the temperature is the amount of salt plus 100. But suppose you don't know whether this is right or wrong.
You would like to investigate, based on this information only, whether there is a correlation between these two variables, and you need the correlation coefficient. So how can I calculate it? Well, let's go back to our course of probability theory and recall that the correlation coefficient between two random variables, ξ and η, is equal to their covariance divided by the square root of the product of their variances, ρ(ξ,η) = Cov(ξ,η) / √(Var(ξ)·Var(η)), where the covariance is equal to the expectation of their product minus the product of their expectations, Cov(ξ,η) = E(ξ·η) - E(ξ)·E(η). I would like you to be comfortable with these formulas. If you don't remember them exactly, go back to the probability lectures, where they are all properly derived. So all we need to do is calculate these quantities. What do I need? For the covariance, I need the mathematical expectation of the product and the mathematical expectation of each variable. So let's see. First, the mathematical expectation of S, the amount of salt. 40 times the amount of salt, which is a random variable, took the value 1; 35 times it took the value 2; and 25 times it took the value 4. The total number of experiments is 40 + 35 + 25 = 100. So this expectation is equal to (1·40 + 2·35 + 4·25)/100. That's the expectation, right? If I have values 1, 2 and 4 with these frequencies, that's my expected value, the mean. And it is equal to 2.1. Now, in exactly the same fashion, I know that the boiling temperature took the value 101 in 40 cases, 102 in 35 cases and 104 in 25 cases, again in 100 experiments. So the expectation of the temperature is equal to (101·40 + 102·35 + 104·25)/100, which is equal to 102.1. Now, the third thing is the expectation of the product. The first term is 1 times 101 times 40, right?
Yes, 1 times 101 times 40. The pairs of 1 unit of salt with 102 degrees and with 104 degrees occurred zero times, so I'll skip them. For 2 units of salt, again, the only temperature that occurred is 102, so the next term is 2 times 102 times 35. And finally, 4 times 104 times 25. All of this is divided, again, by 100, and the result is 215.8. So now we can calculate the covariance between S and T. It's this minus the product of the two expectations, 215.8 - 2.1·102.1, and it's equal to 1.39. OK, so that's my covariance. Now let's talk about variances. How can I calculate the variance of S? Well, you remember that the variance is the average squared deviation from the mean value. The mean value is 2.1. So I need the squared deviations, and then I have to average them, right? So it's (1 - 2.1)², which happens 40 times, plus (2 - 2.1)² times 35, plus (4 - 2.1)² times 25, all divided by 100. And the answer is, would you believe it, 1.39. Well, actually, to be exact, I should divide by 99. If you remember, whenever you estimate a variance statistically, you should divide by n - 1, where n is the number of experiments, for your estimate to be unbiased. However, in this particular case, I have decided to divide by 100, the same divisor I used for the covariance. Number one, because the difference is not that big. And number two, because I get a nice round number, which is exactly equal to the covariance. The purpose of this you will see in a second. So the variance of S is equal to 1.39. Now, what is the variance of T? Well, you should really guess that it is exactly the same, because all the values of T are exactly 100 greater than the values of S, and the expectation is also 100 greater than in the case of S. That means the deviations are exactly the same, and so is the variance. So it is 1.39 as well. And now, if you substitute this into the main formula, the numerator is 1.39 and the denominator is the square root of 1.39 times 1.39. So as a result, the correlation coefficient equals 1. This obviously could be predicted, because T is exactly S plus 100.
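For anyone who wants to check the arithmetic of the first problem, here is a minimal Python sketch that treats the table as weighted data. It is only an illustration, not part of the lecture: the helper names `mean`, `variance`, `covariance` and `correlation` are hypothetical, and, as in the lecture, every average divides by the total number of experiments, 100, rather than by 99.

```python
from math import sqrt

# Problem 1 (artificial data from the lecture):
# (units of salt S, boiling temperature T in degrees Celsius, number of times observed)
table = [(1, 101, 40), (2, 102, 35), (4, 104, 25)]

s = [row[0] for row in table]  # values of S
t = [row[1] for row in table]  # values of T
n = [row[2] for row in table]  # frequencies; they add up to 100 experiments


def mean(values, weights):
    """Weighted average: sum of value * frequency, divided by the total count."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def variance(values, weights):
    """Average squared deviation from the mean (divides by 100, not by 99)."""
    m = mean(values, weights)
    return mean([(v - m) ** 2 for v in values], weights)


def covariance(xs, ys, weights):
    """Cov = E(X*Y) - E(X)*E(Y)."""
    products = [x * y for x, y in zip(xs, ys)]
    return mean(products, weights) - mean(xs, weights) * mean(ys, weights)


def correlation(xs, ys, weights):
    """rho = Cov(X, Y) / sqrt(Var(X) * Var(Y))."""
    return covariance(xs, ys, weights) / sqrt(variance(xs, weights) * variance(ys, weights))


print(round(mean(s, n), 2), round(mean(t, n), 2))          # 2.1 102.1
print(round(mean([a * b for a, b in zip(s, t)], n), 2))    # 215.8
print(round(covariance(s, t, n), 2))                       # 1.39
print(round(variance(s, n), 2), round(variance(t, n), 2))  # 1.39 1.39
print(round(correlation(s, t, n), 6))                      # 1.0
print(all(b == a + 100 for a, b in zip(s, t)))             # True: T = S + 100 exactly
```

Dividing by 99 instead of 100 in the covariance and in both variances would multiply each of them by 100/99, so the correlation coefficient itself would not change.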
They are linearly dependent on each other, and we talked about this: whenever two random variables are linearly dependent, T = a·S + b with a not equal to zero, their correlation coefficient is 1 or -1, depending on the sign of the coefficient a in front of S. If it's positive, the correlation is 1; if it's negative, it's -1. It's a rigid dependency of one on the other, and this is reflected in the correlation coefficient being equal to 1. So that's the end of the first problem. As you see, this is a problem with a rigid, exact connection between these two random variables, and as a result the correlation is equal to 1. Now let's talk about a different case, where we actually put in more salt: 11, 12 and 14 units. And here are the observations. I use exactly the same frequencies, just for simplicity. With 11 units of salt, in 40 cases the temperature was exactly 111 degrees Celsius, and nothing else. With 12 units of salt, in 35 cases the one and only temperature was 112. And with 14 units of salt, in 25 cases I got exactly the same temperature as before, 112. Why does this happen? Well, you know that there is a concept of saturation. Water dissolves salt only up to a certain point, and then the salt just stops dissolving; the water doesn't take any more salt. So let's assume that 12 units of salt was exactly the saturation point. With 14 units, even though you added salt, the extra salt did not dissolve, so the concentration of salt in the water remained the same, and the boiling point is exactly the same as with 12 units. That is the maximum you can reach by making the water saltier and saltier, because it doesn't take more salt, right? So what about this case? Let's go through exactly the same calculations as before. So what do we have in this case?
I'll just give you the numbers rather than explaining how I derived them, because I derived them exactly the same way as in the first case. The expectation of S is equal to 12.1, which is natural: I just added 10 to each amount of salt. In the previous problem it was 2.1; now it's 12.1. That's fine. The average temperature is different; it is 111.6. I calculated it exactly as before: (111·40 + 112·35 + 112·25)/100. The expectation of the product S·T is 1350.8, also different from the previous case. And the covariance, which was 1.39 in the previous problem, is now equal to 0.44, a completely different number, obviously, because these numbers are different. Now, speaking about variances, the variance of S is the same as before, 1.39. And it should be, because all I did was add 10 to every value, which also increased the average by exactly 10. So the deviations from the average are exactly the same as before, which is why I get 1.39. But the variance of T is not 1.39 as before; instead, it is 0.24. Why is it so much smaller? Well, because instead of 114, which I would expect, I have 112, which is closer to the average, 111.6, right? The deviations were greater in the previous case, because I had 101, 102 and 104. Here I have a much narrower range of values, basically just 111 and 112, right? And a narrower range means smaller deviations. Now all I have to do is substitute this into the formula: 0.44 divided by the square root of 1.39 times 0.24, which is equal to 0.76. So as you see, in this particular case I still have some correlation, not equal to 0, because when S grows, T tends to grow too. But obviously it's not a correlation of 1 as before, because there is no linear dependency anymore.
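The same kind of check for the second problem can be sketched in Python as well. This is again only an illustration: the helper `correlation_from_table` is hypothetical, and the sketch assumes the same frequencies (40, 35, 25) and the same divide-by-100 convention as the first problem.

```python
from math import sqrt

# Problem 2: 11, 12 or 14 units of salt; the boiling point stops at 112,
# because in the lecture's scenario 12 units is the saturation point.
# Each row: (units of salt S, boiling temperature T, number of times observed)
table = [(11, 111, 40), (12, 112, 35), (14, 112, 25)]


def mean(values, weights):
    """Weighted average over the frequency table (divides by the total count, 100)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def correlation_from_table(rows):
    """Return (Cov, Var(S), Var(T), rho), computed the same way as in the first problem."""
    s, t, n = zip(*rows)
    ms, mt = mean(s, n), mean(t, n)
    cov = mean([a * b for a, b in zip(s, t)], n) - ms * mt  # E(S*T) - E(S)*E(T)
    var_s = mean([(a - ms) ** 2 for a in s], n)              # average squared deviation of S
    var_t = mean([(b - mt) ** 2 for b in t], n)              # average squared deviation of T
    return cov, var_s, var_t, cov / sqrt(var_s * var_t)


cov, var_s, var_t, rho = correlation_from_table(table)
print(round(cov, 2), round(var_s, 2), round(var_t, 2))  # 0.44 1.39 0.24
print(round(rho, 2))                                    # 0.76
```

With these numbers the covariance works out to 0.44, and 0.44 / √(1.39 · 0.24) ≈ 0.44 / 0.578 ≈ 0.76.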
Now, let's put everything on a graph, with the amount of salt (1, 2, 3, 4) on one axis and the temperature (101, 102, 103, 104) on the other. In the first case I had these three points, and they lie on a straight line. In the second case, with 11, 12 and 14 units of salt, the temperatures are 111 and 112: for 11 I have this point, for 12 this one, and for 14 exactly the same temperature as for 12. So this is not a straight line. I think in the real physics of this process the boiling temperature really does increase with the amount of salt, and the real graph is a curve that asymptotically approaches some maximum. In our case, we have three points that roughly resemble such a curve. So now we have a different correlation coefficient, which reflects this kind of dependency. Well, that's all. It's just a couple of small problems. Again, if you haven't done it yet, I recommend doing all the calculations yourself, and you can check the numbers you get against the website, Unizor.com. It's a very good and useful exercise. Well, that's it for today. Thank you very much, and good luck.