Hi, I'm Zor. Welcome to Unisor Education. We are continuing this advanced mathematics course for teenagers presented on Unisor.com. This lecture is part of it and is dedicated to the theory of probabilities. In particular, I would like to address a couple of very simple problems in the theory of probabilities, and I do suggest you go to the website and try to solve these problems yourself first. This lecture covers certain very simple properties of random variables; it is problem set number three on random variables. Again, it's very, very simple, but what's important is that these simple problems have certain implications for statistics, and that's why I would like to address them as a lecture, even though they are almost too simple to deserve one. So, the first problem: we have a discrete random variable ξ which takes the values x1, x2, etc., xn with corresponding probabilities p1, p2, etc., pn. Let me just repeat what probability means for a discrete random variable that takes only a finite number of values. If I conduct a certain number of experiments with this random variable, then the number of experiments in which it takes the value x1 will be approximately p1 times the total number of experiments. As the number of experiments goes to infinity, this statistical frequency of the value x1 gets closer and closer to p1, and correspondingly for the other values. I'm repeating this so you don't forget what probability really means in this particular case: it's the intuitive, frequency-based understanding of probability, not the formal definition with measure theory and such. Alright, so we have this random variable.
Now, ξ has its own expectation, or mean value, which is basically the weighted average of its values with the probabilities as weights: E(ξ) = x1·p1 + x2·p2 + ... + xn·pn. I did prove before, using the statistical, frequency-based definition of probability, that the average of the results of many experiments with the random variable ξ gets closer and closer to this value as the number of experiments goes to infinity. Alright, now my question is: what if, instead of this random variable, I consider another random variable η = A·ξ, where A is a constant? Intuitively it's obvious that if my random variable has a certain mean value, then the random variable obtained from it by multiplying by a constant, which takes the values A·x1, A·x2, etc., A·xn with the same probabilities (I'm not changing the nature of the variable, just multiplying by a constant), should have an expectation equal to the same constant times the expectation of the original variable: E(A·ξ) = A·E(ξ). Intuitively it is obvious, but let's prove it very simply. Since the new variable η takes these values with these probabilities, then according to the definition of expectation, E(η) is the weighted average of these values with the probabilities as weights: A·x1·p1 + A·x2·p2 + ... + A·xn·pn. Obviously we can factor out the multiplier A, and what remains in parentheses, x1·p1 + ... + xn·pn, is exactly the expectation of ξ. So E(η) = A·E(ξ). As you see, the proof is absolutely trivial. However, it's always good to have it, and this is a basic property of expectation. I did prove before that, for instance, the expectation of the sum of two random variables is the sum of their expectations. Now we also know that we can factor a constant multiplier out of the expectation, so expectation really behaves exactly like a linear function.
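The linearity just proved is easy to check numerically. Here is a minimal sketch in Python; the values, probabilities, and the constant A are made up purely for illustration, they are not from the lecture:

```python
# Numeric check of E(A*xi) = A * E(xi) for a small discrete random variable.
# The values, probabilities, and constant below are illustrative.

def expectation(values, probs):
    """Weighted average of the values with the probabilities as weights."""
    return sum(x * p for x, p in zip(values, probs))

values = [1.0, 2.0, 5.0]
probs = [0.2, 0.5, 0.3]   # probabilities must sum to 1
A = 3.0

lhs = expectation([A * x for x in values], probs)  # E(A*xi)
rhs = A * expectation(values, probs)               # A * E(xi)
print(lhs, rhs)  # the two numbers agree
```

Changing A, the values, or the probabilities leaves the two sides equal, which is exactly the factoring-out step in the proof.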
So basically expectation is a linear function of a random variable, which means that any linear combination of random variables has as its expectation the corresponding linear combination of their expectations. Okay, that's number one. Problem number two is analogous, but now we will talk about variance. Again, we have the variable ξ which takes x1, etc., xn with probabilities p1, etc., pn. Now I am interested in the variance of A·ξ, and my theorem is that Var(A·ξ) = A²·Var(ξ). Again, let's first talk about the intuitive understanding of this. Variance is, as you remember, the weighted average of the squares of the deviations from the mean value, from the expectation, and it's that square that produces the A² here. But let's do it mathematically. Let's use a symbol for the expectation: μ = E(ξ) = x1·p1 + ... + xn·pn. What matters is that the variance of ξ is, again, the weighted average of the squared deviations from the mean: Var(ξ) = (x1 − μ)²·p1 + (x2 − μ)²·p2 + ... + (xn − μ)²·pn. That's what the variance of the variable ξ is. Now what about A·ξ? Well, we know that its expectation is A·μ, that is, A times the expectation of ξ; we just proved it a second ago, right? The variable A·ξ takes the values A·x1, A·x2, etc., A·xn, so its variance is, again, the weighted average of squared deviations: Var(A·ξ) = (A·x1 − A·μ)²·p1 + (A·x2 − A·μ)²·p2 + ... + (A·xn − A·μ)²·pn. Now obviously you factor A out of every parenthesis, and since everything is squared, this equals A²·[(x1 − μ)²·p1 + ... + (xn − μ)²·pn]. And the expression in brackets is the variance, right? So that's exactly what we have just proven: the variance of A·ξ is A² times the variance of ξ.
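The variance property can be checked the same way. A small sketch with made-up numbers, where the constant is deliberately negative to show that the square makes the sign irrelevant:

```python
# Numeric check of Var(A*xi) = A^2 * Var(xi) for a small discrete random
# variable. A negative constant is chosen on purpose: the square in the
# variance makes the sign of A irrelevant. Numbers are illustrative.

def expectation(values, probs):
    return sum(x * p for x, p in zip(values, probs))

def variance(values, probs):
    """Weighted average of squared deviations from the mean."""
    mu = expectation(values, probs)
    return sum((x - mu) ** 2 * p for x, p in zip(values, probs))

values = [1.0, 2.0, 5.0]
probs = [0.2, 0.5, 0.3]
A = -2.0

lhs = variance([A * x for x in values], probs)  # Var(A*xi)
rhs = A ** 2 * variance(values, probs)          # A^2 * Var(xi)
print(lhs, rhs)  # the two numbers agree
```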
Again, the proof is elementary. There is absolutely nothing ingenious about this type of proof. However, I would like to make the point that the expectation and variance of a constant multiplied by our random variable are expressed in terms of the expectation and variance of the old one: the expectation is multiplied by the same constant, and the variance is multiplied by the constant squared. Now, how about the standard deviation? That's the third problem, right? Well, this is simple. I'll just use the definition of the standard deviation as the square root of the variance. Now, we know the variance of A·ξ. Okay, now watch, I will make a mistake: the square root of A²·Var(ξ) is A times the square root of the variance, which is A times the standard deviation, right? Wrong. Here is what's right: the square root of A² is not A, it's the absolute value of A, because A can be negative. Whenever we were doing the variance manipulations, everything was always squared, so it didn't matter whether A was positive or negative. Here we do have to worry about it. So this is the correct formula: the standard deviation of A·ξ is the absolute value of A times the standard deviation of ξ. I hope you will be very careful; with the absolute value it is correct, without it, incorrect. Okay. So, basically these are the three properties of a constant multiplied by a random variable: its expectation is multiplied by the same constant, its variance is multiplied by the constant squared, and its standard deviation is multiplied by the absolute value of the constant. Now, having done that, I would like to address the following. What is this for? This is basically something all people are doing when they are thinking about statistics: they are making, for instance, a measurement.
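The point about the absolute value is exactly the kind of thing a quick numeric check catches. A sketch with a deliberately negative constant (again, the numbers themselves are made up):

```python
# Check that the std deviation of A*xi is |A| * std(xi), not A * std(xi).
# A is negative on purpose; a standard deviation can never be negative.
import math

def expectation(values, probs):
    return sum(x * p for x, p in zip(values, probs))

def variance(values, probs):
    mu = expectation(values, probs)
    return sum((x - mu) ** 2 * p for x, p in zip(values, probs))

def std_dev(values, probs):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(variance(values, probs))

values = [1.0, 2.0, 5.0]
probs = [0.2, 0.5, 0.3]
A = -2.0   # negative on purpose

sd_scaled = std_dev([A * x for x in values], probs)
wrong = A * std_dev(values, probs)        # negative, cannot be a std dev
right = abs(A) * std_dev(values, probs)   # |A| * sigma, matches sd_scaled
```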
Let's say you are measuring a table with a ruler, right? You measure it a few times, or different people with the same ruler and the same table measure it a few times, and you might get slightly different results: some millimeters here, some millimeters there. That's the error of your measurement. It depends on the ruler, on how precisely the centimeters and millimeters and inches and quarters of an inch are actually marked, etc. So there are some errors. Now, what does that mean? It means that every measurement, say measurement number i, gives some value ξi which is the real value, the length of the table, plus some error, and different measurements have different values of this error. For instance, if the real table length is one meter, one measurement might give one meter and one centimeter, that is 1.01 meters, or 99 centimeters, or 99.5 centimeters, or 100.5 centimeters, etc. So there is some kind of error. Now, how can people eliminate the error, or at least reduce it? What they do is measure it twice, ten times, a hundred times, and then average the results. The thinking is that if during one experiment they made one error, another experiment will make another error, and all the errors combined will maybe nullify each other, so that on average we will get the correct reading of the length, right? By the way, I do the same thing when I'm measuring my blood pressure. I know that these apparatuses are not really precise, so I measure a few times on different arms and then just average the results. That, I think, more or less nullifies, not exactly nullifies, but reduces the error made during every particular measurement. Now, does this make sense? Is it really right to do it this way? Do we really reduce the error?
And here is what I would like to address right now. Let's just think about it. First of all, we are assuming that the measurements ξ1, ..., ξn are independent and identically distributed random variables. If these are measurements of the same table with the same ruler by the same person, then it's a good assumption that we are more or less independently making the measurements, and the result of each measurement really is identically distributed. It's a reasonable assumption, okay? All right, now, what can we say about the average η = (ξ1 + ... + ξn)/n? Well, the expectation of this average is equal to the expectation of the sum divided by n, right? We know that if there is a constant multiplied by something, then this constant can just be taken outside the expectation. And this expectation of the sum (I didn't prove it for n, I proved it for two random variables, but obviously it can be extended by induction to a sum of any number) is the sum of the expectations, which is n times the expectation of any one of them, because they are identically distributed; divided by n, that gives μ. Here I assume that the expectation of ξ is μ, and the variance of ξ is equal to σ². Let me write it a little more accurately: σ² for the variance, μ for the expectation. You understand why the variance carries a square: variance is a weighted average of squares of deviations from μ. Okay, fine. So, first of all, that's what we have, and that's good. It means that by averaging n experiments, we get a random variable which has exactly the same expectation, the same mean value, as the original one. By averaging n different identically distributed and independent random variables, we are not deviating from the mean value of each one of them. The result has exactly the same mean value, and that's to be expected; you're just averaging the same distribution, basically. Now, how about the variance?
Okay, variance. First of all, you have this constant 1/n, which you take out of the variance as a square, (1/n)², as you remember, right? Inside, you have the variance of the sum. Now, recall that I actually demanded independence and identical distribution of all these variables, and, if you remember one of the previous lectures, the variance of a sum of independent random variables is equal to the sum of their variances. So what I have right now is Var(η) = (1/n²) · n · Var(ξ1), which is σ² divided by n. This is extremely important. You see what happens? The variance of the average of n random variables is n times smaller than the original variance, and that's exactly what we're looking for. The variance is a good measure of the error we are making by measuring; actually, the standard deviation is even better. The standard deviation of η is the square root of its variance, so it is σ divided by the square root of n, right? The standard deviation is a good measure of the precision with which you are measuring certain things. So if the original measurement had standard deviation σ, which is, roughly speaking, its average deviation from the mean value, then the average has a standard deviation √n times smaller. That's exactly why people average together many different measurements of the same thing: to get a more precise evaluation of the mean value, because the mean value is basically the base from which you are deviating by making certain errors. The errors combined partly nullify each other, and the error of the average of n different measurements is √n times smaller than the error of each individual measurement. That's why people measure twice and ten times and whatever number of times; there is even an old Russian proverb that you have to measure seven times and cut once.
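A quick simulation illustrates both results at once: the average of n measurements keeps the mean μ but shrinks the standard deviation to σ/√n. The true length, the per-measurement error σ, and the normal error model below are all assumptions made purely for illustration:

```python
# Simulate averaging n noisy measurements of a table. The average should
# keep the mean (100 cm) while its std deviation drops to sigma / sqrt(n).
import random
import statistics

random.seed(0)

true_length = 100.0   # hypothetical table length in cm (assumed)
sigma = 0.5           # assumed std deviation of a single measurement, cm
n = 25                # measurements averaged per trial

def averaged_measurement():
    """Average of n independent noisy readings of the same length."""
    return statistics.fmean(random.gauss(true_length, sigma) for _ in range(n))

trials = [averaged_measurement() for _ in range(20000)]

mean_of_average = statistics.fmean(trials)  # stays near mu = 100
sd_of_average = statistics.stdev(trials)    # near sigma / sqrt(n) = 0.1
print(mean_of_average, sd_of_average)
```

With n = 25 the standard deviation of the average comes out close to 0.5/5 = 0.1 cm, five times smaller than the error of a single measurement, as the derivation predicts.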
Yes, if you want to cut precisely, you have to measure a few times, and then, although the proverb doesn't say it, you basically have to average the results of your measurements. To make the result more precise, you just measure a few times and average. The more times you measure the same thing and average the results, the more precise the result will be: it will be closer to the mean value, which is supposed to be the real length of the table, or the real blood pressure, or whatever other measurement you are making. That was my very important point: this averaging reduces the standard deviation and the variance of the resulting random variable. Whenever we do this, the result is also a random variable with the same expectation but with a much smaller deviation from this expectation, and that's why we expect the average to be much closer to the true value than any original individual measurement. Okay, that was it. I have actually put some comments on the website Unisor.com for this lecture, and I suggest you read them. There are notes for every lecture; they serve as a textbook, basically. That's it for today. Thank you very much and good luck.