Hi, I'm Zor. Welcome to Nezor Education. What I'm going to discuss today is something you should already know from the theory of probabilities course, which is supposed to be completed before we move on to statistics. Still, we are within the foundations of statistics theme, and I believe it's quite appropriate to talk about averages as a main tool of statistics. This lecture is presented on Unizor.com as part of the course of advanced mathematics for high school students. I suggest you view it on that website rather than on YouTube or anywhere else, because the site contains additional functionality, including notes for every lecture, which can basically serve as a textbook. So go directly to the website, then to Statistics, then to the Foundations of Statistics part; the topic is statistical averages.

As I was saying, statistical averages are a very important tool of statistics, and in this very short presentation I would like to remind you again why this is the case. I say "again" because this is actually a repetition of knowledge you have from the theory of probabilities.

So, what are we talking about? Say we have some random variable ξ whose distribution we don't know. The purpose of statistics is, based on observed values of this random variable, to make some judgment about its distribution. Now, what are the main characteristics of a distribution? Traditionally, people are very much concerned with the mean value, the mathematical expectation of the random variable ξ, and with its variance. The variance is basically a measure of how widely the values of the random variable are spread around the mean value.

Well, it's not always literally true that the values cluster near the mean. For instance, take a Bernoulli random variable that takes only the values 0 and 1 with some probabilities p and 1−p; to make it even simpler, let both probabilities be 1/2. Then the mean value is 1/2, but this Bernoulli variable never actually takes the value 1/2. So it is not always correct to say that the values of a random variable are concentrated around its mean and that the variance describes how tightly they are concentrated; in this case it's simply not true. But such cases are rare. In the cases that occur much more often in real statistics, the situation is really as I was saying: there is some mean value, the values of the random variable are concentrated around it, and the variance is the degree of this concentration. If the variance is very big, the values are spread widely; if the variance is very small, they are concentrated close to the mean.

This is very important for the following reason. Suppose the variance is very wide, and you have one single observed value of the random variable. Does it tell you anything about the mean value, around which the values are supposed to be concentrated? With a very big variance, not really: you cannot say that the one value you obtained by observing the random variable signifies anything about its mean. However, if the variance is small, so the values really are concentrated around the mean, then it does make sense to take a particular observed value as some kind of reasonable estimate of the mean value.
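Here is a minimal numerical sketch of that last point, in Python with NumPy; the normal distributions and the particular parameters are my own illustrative assumptions, not part of the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = 10.0  # the true mean, which in real statistics we would not know

# One single observation from a low-variance and from a high-variance distribution
x_tight = rng.normal(mu, 0.1)   # sigma = 0.1: one value already says a lot about mu
x_wide = rng.normal(mu, 50.0)   # sigma = 50:  one value says almost nothing about mu

print(f"one observation, sigma = 0.1: {x_tight:7.2f}  (necessarily close to mu = {mu})")
print(f"one observation, sigma = 50:  {x_wide:7.2f}  (could be very far from mu)")
```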
So again, the values you observe are supposed to signify something about the mean value, and the variance of the observed values, let's call it the sample variance, signifies how close your estimate actually is. With this introduction, let's talk about a very concrete example.

You have a random variable whose distribution you do not know, but you are interested in its mean value and variance, or at least in approximations of these values. How can that be arranged? Obviously, you conduct experiments: you run n experiments and obtain n values of this random variable. If you just look at these values and do nothing else, they might give you a rough, not very precise, idea of the mean value and the variance of this particular random variable. To do it more mathematically, so to speak, you need some calculations. I'm going to introduce these calculations, which are very simple, and then we will spend some time discussing why they are so important.

We all know that the best way to approximate the mean value of this random variable ξ, given n of its observed values, is the arithmetic average of these values; let's call it x̄ (x with a bar on top). The usual claim is: the average of these values is such-and-such, and this, relatively precisely (whatever the measure of this precision is), should be very close to the mean value of ξ. The question is: why? That's what I'm going to discuss right now.

Instead of considering n experiments with the one random variable ξ, let's assume these experiments are independent of each other, which is a reasonable assumption, and that the conditions of the experiment are not changing. Then x1, x2, ..., xn are actually values of certain random variables ξ1, ξ2, ..., ξn, which are independent of each other and, because the conditions of the experiments are exactly the same, have exactly the same distribution as ξ. The same distribution means all the same characteristics of the distribution: in particular, the same mean value, the same variance, et cetera. So each of these random variables is an exact copy of ξ, and they are independent of each other. What do we have? x1 is one particular value of ξ1, x2 is a particular value of ξ2, and xn is a particular value of ξn.

Now, let's make another random variable, ξ̄ (xi with a bar on top), the average of all these n random variables: ξ̄ = (ξ1 + ξ2 + ... + ξn)/n. Then I can say that x̄ is one single value of this random variable ξ̄. Why is it better to consider this one value of ξ̄ rather than an individual value of the original variable? Here is why. Let's examine this random variable ξ̄, the average of the ξ's; let me just call it η, so I don't have to carry all these bars around. First of all, let's talk about its mathematical expectation, E(η). Going back to the theory of probabilities, we know that the mathematical expectation of a sum of random variables equals the sum of their mathematical expectations.
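Before carrying this computation through, here is a quick numerical preview, in Python with NumPy, of the fact we are about to derive; the exponential distribution is just an arbitrary illustrative choice whose true mean we happen to know:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 30            # number of observations in one experiment
trials = 100_000  # number of repeated experiments

# Each row is one experiment: n i.i.d. values of xi ~ Exponential(mean 2.0)
samples = rng.exponential(scale=2.0, size=(trials, n))

# Each row's arithmetic average is one observed value of eta = xi-bar
eta_values = samples.mean(axis=1)

print(f"average of eta over many trials: {eta_values.mean():.4f}  (true mean of xi: 2.0)")
```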
Now let's carry the computation through. The factor 1/n can obviously be extracted, so

E(η) = (1/n)·(E(ξ1) + ... + E(ξn)).

As I was saying before, ξ1, ξ2, ..., ξn are exact copies of ξ: they are independent random variables, but their distribution is exactly the same, which means their mean values are exactly the same as the mean value of ξ. So this equals (1/n)·n·E(ξ), which is simply E(ξ). What's important is that the mathematical expectation of the average of these n independent, identically distributed random variables ξ1, ..., ξn is exactly the same as the expectation of our original random variable. Well, this is good.

Now, the question is: is the single value we obtained by observing this random variable close to the mean value? This is described by the variance, and here is the most important difference between ξ and η. Let's say σ² is the variance of ξ. What is the variance of η? Again, we know from the theory of probabilities that the variance of a sum of independent random variables equals the sum of their variances. Here is where the property of independence actually plays its role: for the mathematical expectation it was not important, but for the variance it is. And since the variance is basically the average of the squared deviation from the mean, the factor 1/n comes out squared:

Var(η) = (1/n²)·(Var(ξ1) + ... + Var(ξn)).

The ξ's are all independent and identically distributed as ξ, which means each variance is exactly σ², so the whole thing equals n·σ²/n², that is, σ²/n.

This is the most important property of averaging: the variance of the average of n independent, identically distributed random variables is n times smaller than the variance of the original random variable. Now, what does this mean? It means that a value of η is much closer to its mean value, which is the same as the mean value of ξ, let's call it μ, than any individual value of ξ is. So x̄ is a better evaluation, a better estimate, of the mean μ than any single observation. And that's the reason we use averaging: if you have n observations and you average them out, the resulting average is a much closer estimate of the mean value than any of the observations considered separately. That's why the average plays such an important role. What is also important: as n increases, the variance σ²/n decreases, which means the values of η are concentrated tighter and tighter around the mean value. That's reason number one.

The second very important reason, which is much less trivial, is the so-called central limit theorem, which I talked about in the theory of probabilities course. It basically says that the sum of independent random variables in some way resembles a normally distributed one, and the more components you have in this sum, the closer the distribution of the sum is to a normal distribution. So I can actually consider the variable η to be almost normally distributed. And for a normally distributed random variable, you remember the bell curve, right? Its center is the mathematical expectation.
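Both facts, the σ²/n variance and the approach to normality, are easy to check numerically. A minimal sketch in Python with NumPy, using a deliberately non-normal uniform distribution for ξ (my own illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(2)
trials = 200_000
sigma2 = 1.0 / 12.0  # variance of Uniform(0, 1)

for n in (1, 10, 100):
    # Averages of n i.i.d. uniform values: observed values of eta for this n
    eta = rng.uniform(0.0, 1.0, size=(trials, n)).mean(axis=1)
    print(f"n = {n:3d}:  Var(eta) = {eta.var():.5f}   sigma^2/n = {sigma2 / n:.5f}")

# A histogram of eta for n = 100 would already look like a bell curve (central
# limit theorem), even though the underlying uniform distribution is flat.
```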
Back to the bell curve: the square root of the variance of η, its standard deviation, is in this case σ/√n. Mark the points μ − 2σ/√n and μ + 2σ/√n: on each side of μ we step away by twice the standard deviation. We know that the probability of landing in this interval, around the mean value μ, is approximately 95%. So now I can say the following: knowing this gap approximately (I cannot know it exactly, only approximately), and also knowing μ approximately, I can state that the values of this random variable are concentrated within our approximation of μ, minus or plus twice our approximation of the standard deviation, with a confidence level of about 95%. Now we are actually getting something very tangible: we can calculate this interval, and the greater n is, the tighter, as we know, the value of the average sits near the mathematical expectation of ξ, so the better the evaluation of μ.

Also, using the values x1, ..., xn, I can always estimate σ² using what is called the sample variance. The true variance is the mathematical expectation of the squared deviation of the random variable from its mean. Now, I don't have the random variable itself, I have n of its values; I don't have the mean, I have an estimate of the mean. But I can still make the analogous calculation:

s² = [(x1 − x̄)² + (x2 − x̄)² + ... + (xn − x̄)²] / n.

This is the average squared deviation of the observed values from their average. Of course, these are not all the values my random variable takes, and x̄ is not exactly its mean value, so this is not exactly the variance; however, it is a good estimate, and in some other lecture I will provide some calculations of how good this estimate is. It is an estimate you can use instead of σ², just as x̄ can be used instead of μ. So you have basically all the components, and that's what allows you to evaluate what the distribution of ξ is; well, maybe not the entire distribution, but at least its mean and variance, and where we can expect the values of this random variable to be.

All right, so basically that's it. I just wanted to lay some statistical foundation for using averages before going into any more detailed statistical research. My point, again, is that averages play an extremely important role in statistics, and I was trying to give you the foundation for why that is. Everything is concentrated in this n in the denominator: the more data you have, the more precisely you can evaluate the mean value of your random variable (the short sketch below pulls all these pieces together). All right, that's it for today. Thank you very much and good luck.
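Here is that sketch, a minimal one in Python with NumPy; the data are simulated from a normal distribution purely for illustration, and the divide-by-n sample variance follows the lecture's formula:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(7.0, 3.0, size=50)  # pretend these are our n observed values of xi

n = len(x)
x_bar = x.sum() / n                # estimate of the mean mu
s2 = ((x - x_bar) ** 2).sum() / n  # sample variance, as defined in the lecture
s = s2 ** 0.5                      # estimate of the standard deviation sigma

# Approximate 95% interval for mu: x-bar plus or minus 2 * s / sqrt(n)
half_width = 2.0 * s / n ** 0.5
print(f"estimated mean:        {x_bar:.3f}")
print(f"approx. 95% interval:  ({x_bar - half_width:.3f}, {x_bar + half_width:.3f})")
# Increasing n shrinks the interval like 1/sqrt(n): more data, better estimate of mu.
```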