Hi, I'm Zor. Welcome to Unisor Education. Today we will talk about statistical distributions. This is part of the Advanced Mathematics course for teenagers and high school students. It is presented on Unisor.com, and that is where I suggest you watch this lecture, because for every lecture the site has detailed notes, which can be used just like a textbook, plus functionality like enrolling, taking exams, and so on. So I do recommend you go to unisor.com to watch this lecture.

Now, statistical distribution. I spent some time in the previous lecture explaining what a statistical distribution actually is. In a couple of words: you have some random variable, and you do not really know the distribution of probabilities of this random variable. Sometimes you know the values but you don't know the probabilities; sometimes you don't even know which values the variable can theoretically take. Your task is, based on whatever experimental results you have, whatever empirical data you have at hand, to estimate the distribution of probabilities of this random variable. That gives you some picture of its future behavior, because that is what probabilities are basically about.

I have subdivided this relatively big task into smaller ones, and today we will consider one particular task, which I call task A: I have one particular random variable whose distribution I would like to estimate, and I know in advance what values it can theoretically take. For instance, if we are talking about rolling a die, we know the result is one, two, three, four, five, or six — those are the values this random variable can take. So we know the values of the random variable, but we do not know the probabilities: maybe it is an ideal die, in which case each probability equals one sixth, or maybe it is not exactly ideal, and I would like to statistically prove or disprove that it is an ideal die. That's an example.

So this is our problem: estimate the probabilities of the known values which our random variable ξ can take, using past experience. Now, what is the past experience? Very simply, we arrange experiments with this particular random variable. Let's say we have arranged n different experiments, so we have n results. We know the different values the variable can take, so let's assume that out of these n results, ν₁ times the variable took the value x₁, ν₂ times it took x₂, and so on, and νₖ times it took the value xₖ. Obviously the equality ν₁ + ν₂ + … + νₖ = n holds: we have divided the n results into k groups according to which value occurred.

Now, the question is how we can use these numbers to estimate the probabilities unknown to us: p₁, p₂, …, pₖ. The first thing that comes to mind is to use the empirical frequency: the variable took the value x₁ exactly ν₁ out of n times, so the empirical frequency of the value x₁ is ν₁/n. Most likely that is the best estimate of the probability p₁. Because what is a probability? In at least the most commonly accepted definition, the probability is the limit this frequency tends to as the number of experiments goes to infinity. That is the assumed understanding of what probability is.
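To make the setup concrete, here is a minimal Python sketch of this situation. It is my own illustration, not part of the lecture: the slightly biased "true" die probabilities and the use of numpy are assumptions made just for the demonstration. It simulates n = 100 rolls and computes the counts ν₁, …, ν₆ and the empirical frequencies νᵢ/n.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true" die, slightly biased toward 1 -- unknown to the experimenter.
values = np.arange(1, 7)
true_p = [0.20, 0.16, 0.16, 0.16, 0.16, 0.16]

n = 100                                      # number of experiments (rolls)
rolls = rng.choice(values, size=n, p=true_p)

# nu[i] = number of times value i+1 occurred; empirical frequency = nu / n.
nu = np.array([(rolls == v).sum() for v in values])
freq = nu / n

print("counts nu_1..nu_6:", nu, " sum =", nu.sum())  # the counts sum to n
print("empirical frequencies:", freq)
```

The counts always sum to n, which is exactly the equality ν₁ + … + νₖ = n above.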
That definition is not really mathematical; it is more a logical definition of probability, I would say. Still, ν₁/n seems to be a good estimate. So our task now is, number one, to check whether this is an unbiased estimate. Keep in mind that ν₁ is itself a random variable — ξ is random, so ν₁, ν₂, and so on are all random. The question is whether the expectation of ν₁/n equals p₁, and it will; that's fine. A much more difficult problem is to evaluate the margin of error. It is intuitively clear that as n increases, the precision of this estimate should also increase, meaning ν₁/n should get closer and closer to p₁. How close? That is exactly what the margin of error talks about. So we will have to evaluate the margin of error.

All right, so we can methodize our problem. I will talk mostly about p₁, because everything else works exactly the same way. What can we actually do about estimating this probability? Here is what I suggest. Let's take another random variable β, which equals 1 if ξ = x₁ and 0 if ξ ≠ x₁. This is a Bernoulli random variable, and we have already spent some time analyzing Bernoulli random variables; if you don't remember, I suggest you go back to the previous lectures — one of them is about the Bernoulli distribution and the statistical analysis of Bernoulli random variables. So I am introducing this Bernoulli random variable β. Now, the probability that β = 1 is exactly the probability that ξ = x₁, which is p₁ — the p₁ unknown to us.

Now let's think about our empirical frequency. If we conduct our experiment n times, the random variable ξ takes certain values, and at the same time the random variable β takes certain values — ones or zeros. How many times in our n experiments did β take the value 1? The same number of times that ξ equaled x₁, which is ν₁. So in our n experiments β took the value 1 exactly ν₁ times, and the remaining n − ν₁ times it took the value 0.

Each experiment we can consider as a separate random variable: β₁, β₂, …, βₙ, independent of one another and identically distributed exactly like β. That is how we usually model statistics: n experiments result in n independent, identically distributed random variables. And then exactly ν₁ of them happened to be 1, and the rest are 0. So their sum divided by n — their average value — equals ν₁/n, which is exactly what we want to use as an estimate of our probability p₁. So instead of talking about ν₁/n as a random variable approximating the constant p₁, we can talk about (β₁ + β₂ + … + βₙ)/n as an approximation of p₁. Now this is much more familiar territory, because this is something we have already considered before, in the lecture about random variables with Bernoulli distribution. We know a lot about this.
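As a quick illustration of this correspondence — again a sketch on top of the simulated rolls above, with the same made-up biased die — the indicators βⱼ can be built directly from the rolls, and their average is exactly ν₁/n:

```python
import numpy as np

rng = np.random.default_rng(0)
rolls = rng.choice(np.arange(1, 7), size=100,
                   p=[0.20, 0.16, 0.16, 0.16, 0.16, 0.16])

# beta_j = 1 if the j-th roll gave x1 = 1, and 0 otherwise (Bernoulli indicators).
beta = (rolls == 1).astype(int)

nu1 = beta.sum()                      # number of ones among the n indicators
print(nu1 / 100, beta.mean())         # identical: nu_1/n is the mean of the beta_j
```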
So what do we know? First of all, the expectation of our empirical frequency: E[ν₁/n] = E[(β₁ + … + βₙ)/n]. The βⱼ are all identically distributed and independent, so the expectation of their sum is the sum of their expectations, and each of those equals the expectation of β. Remember, β equals 1 if ξ = x₁ and 0 if ξ ≠ x₁, and the probability of the first outcome is p₁ while the probability of the second is 1 − p₁, so E[β] = 1·p₁ + 0·(1 − p₁) = p₁. Therefore E[ν₁/n] = n·p₁/n = p₁. What does that mean? It means our estimate is unbiased, which is very good.

Another thing: what is the variance? Var(ν₁/n) = (1/n²)·Var(β₁ + … + βₙ), and since each βⱼ has exactly the same variance as β and there are n of them, this equals Var(β)/n. From the Bernoulli lecture we know that Var(β) = p₁(1 − p₁), so Var(ν₁/n) = p₁(1 − p₁)/n. Right from here we see that the precision improves as n goes to infinity: p₁ and 1 − p₁ are constants — unknown constants, but constants — and n is increasing. So in any case, our estimate of p₁ by ν₁/n, the empirical frequency of the value x₁, is, number one, unbiased, and, number two, its precision gets better and better as the number of experiments increases. The only thing left is to evaluate how good this precision actually is — we have to evaluate the margin of error.

The margin of error can be evaluated in a crude way. You probably remember, again from the Bernoulli lecture, the graph of p(1 − p): it is a parabola opening downward, reaching its maximum in the middle, at p = 1/2, where its value is 1/4. So I can always say that Var(ν₁/n) = p₁(1 − p₁)/n ≤ 1/(4n), from which the standard deviation of ν₁/n, the square root of the variance, is at most 1/(2√n). And knowing the standard deviation, I can always derive the margin of error. Why? Because ν₁/n is an average of independent random variables, which means we can use the central limit theorem and say it is approximately normally distributed. And for normal variables we know, for instance, that with a confidence level of 95% the error does not exceed 2σ. So the margin of error, with 95% certainty, is within 2σ ≤ 1/√n. Which is good.

Well, let's consider how good it is. Say you have a die and you roll it 100 times, so n = 100, which means the margin of error 2σ is at most 1/√100 = 1/10. Suppose that out of 100 rolls the number 1 came up 15 times; then 15/100 is the empirical frequency of the number 1 on the die, and your precision is one tenth — that is, 10/100. So the probability is somewhere between 15/100 minus 10/100, which is 5/100, and 25/100. Is that a good estimate?
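Here is the crude calculation as a short sketch; the numbers n = 100 and ν₁ = 15 are the lecture's example, and the rest follows the 2σ rule just described:

```python
import math

n, nu1 = 100, 15          # 100 rolls, value 1 occurred 15 times
p_hat = nu1 / n           # empirical frequency, the estimate of p1

# Crude bound: Var(nu1/n) = p1(1-p1)/n <= 1/(4n), so sigma <= 1/(2*sqrt(n))
# and the ~95% margin of error (the 2-sigma rule) is at most 1/sqrt(n).
margin = 1 / math.sqrt(n)

print(f"p1 ≈ {p_hat:.2f} ± {margin:.2f}  "
      f"(interval {p_hat - margin:.2f} .. {p_hat + margin:.2f})")
# p1 ≈ 0.15 ± 0.10  (interval 0.05 .. 0.25)
```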
Well, absolutely not. If we are saying that the probability of rolling a one is somewhere between 1/20 and 1/4, that is too wide an interval — we expect something near one sixth, because that is what an ideal die means. So this is an estimate of the margin of error, but it is too crude; we would like a somewhat better one. What is the better approach? You remember we were talking about sample variance. So let's use the sample variance and maybe we will get a better estimate. Let me try to go through the calculation of the sample variance. By the way, all the calculations are in the notes for this lecture, so if I make a mistake, I'm sorry — in the notes it seems to be correct.

Let's start in generality. You have some random variable η, you have n experiments η₁, …, ηₙ, and you have the average value η̄ = (η₁ + … + ηₙ)/n. The sample variance is

s² = [(η₁ − η̄)² + (η₂ − η̄)² + … + (ηₙ − η̄)²]/(n − 1),

the squared deviations of each result from the empirical average, summed and divided by n − 1. Remember this formula? The ηⱼ are random variables — the values we received from the n experiments — and η̄ is the estimate of their expectation, the arithmetic average. You divide the sum of squares by n − 1 rather than n to make the estimate unbiased. That is what we are talking about.

Okay, now let's apply this to our random variable β. We have β₁, …, βₙ, the results of the experiments. What is their average? (β₁ + … + βₙ)/n; these are ones and zeros depending on whether our initial variable took the value x₁, so the sum is the number of times out of n it took the value x₁, and the average is ν₁/n — we know this already. So what we have to do is sum (βⱼ − ν₁/n)² for j from 1 to n and divide by n − 1. That is our sample variance of β.

Let's just calculate. Out of these n values, β took the value 1 exactly ν₁ times and the value 0 exactly n − ν₁ times, so

s² = [ν₁(1 − ν₁/n)² + (n − ν₁)(0 − ν₁/n)²]/(n − 1).

Let's open the parentheses: the first term gives ν₁ − 2ν₁²/n + ν₁³/n², and the second gives ν₁²/n − ν₁³/n². The ν₁³/n² terms cancel, and −2ν₁²/n combines with ν₁²/n to give −ν₁²/n, so s² = (ν₁ − ν₁²/n)/(n − 1). Let me write it a little more neatly over the common denominator n(n − 1): the numerator becomes nν₁ − ν₁² = ν₁(n − ν₁), so

s² = ν₁(n − ν₁)/(n(n − 1)).

That is our sample estimate of the variance of β.
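As a sanity check — my own sketch, not from the lecture notes — we can verify that this closed form agrees with the standard sample-variance computation on a 0/1 sample containing ν₁ ones:

```python
import numpy as np

n, nu1 = 100, 15

# The n Bernoulli results: nu1 ones and n - nu1 zeros.
beta = np.array([1] * nu1 + [0] * (n - nu1))

s2_numpy  = beta.var(ddof=1)                  # textbook sample variance (divides by n-1)
s2_closed = nu1 * (n - nu1) / (n * (n - 1))   # closed form derived above

print(s2_numpy, s2_closed)   # both ≈ 0.1288 -- the formulas agree
```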
Now, what we are actually interested in is the variance of ν₁/n, our estimate of the probability p₁. If we know the variance of β, then — the variance of a sum of independent variables is the sum of the variances, and the n comes out of the square — the variance of ν₁/n is this s² multiplied by n and divided by n², that is, s²/n = ν₁(n − ν₁)/(n²(n − 1)). And that is the variance of our estimate. From the variance we can always pass to the standard deviation, which is the square root; one factor of n comes out, giving σ ≈ (1/n)·√(ν₁(n − ν₁)/(n − 1)). That is my estimate of σ, and again, by the 2σ rule, 2σ is the margin of error with 95% certainty.

Just out of curiosity, let's calculate what this gives in our die example, where the number 1 occurred 15 times out of 100. I get σ ≈ (1/100)·√(15·85/99). What is that? 15·85 = 1275, and 1275/99 ≈ 12.9, whose square root is about 3.6, so σ ≈ 3.6/100 ≈ 0.036, and the margin of error 2σ ≈ 0.072, roughly seven hundredths. Remember, the crude margin of error was one tenth; now it is about 0.07, which is better. With this value of 2σ I can say the probability is 15/100 plus or minus about 7/100, that is, approximately from 8/100 to 22/100 — a tighter interval than the crude 5/100 to 25/100, and one that comfortably contains 1/6. So this precision is better than the precision we got from the crude bound 1/(2√n), and I do recommend going through a bit more calculation to get an even better evaluation of the margin of error.

But here is what we have actually accomplished. Based on empirical information — the results of n experiments with our random variable ξ, which can take one of k different values — based on the statistics we accumulated, ν₁, ν₂, and so on, we estimate each probability by its empirical frequency; and not only do we estimate the probability, we also estimate the margin of error using this σ and the 2σ rule. Using the numbers we have — the total number of experiments n and the number of times ν₁ the variable took the value x₁ — we evaluate the precision of ν₁/n as an estimate of the probability. And obviously, whatever I said about ν₁ can be repeated for ν₂ through νₖ.

So that is the methodology. Number one, accumulate statistics. Number two, compute the empirical probabilities by simply dividing the number of times each value occurs by the total number of experiments. Number three, using a formula like the one above, evaluate the margin of error — the range in which the probability you estimated actually lies. I will put the whole recipe into one short sketch right after the sign-off. Well, that's it. Thank you very much. I do suggest you go through the lecture again together with the notes; it is very, very helpful. Other than that, good luck.
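As promised, here is the whole methodology in one sketch — a minimal Python illustration. Only n = 100 and ν₁ = 15 come from the lecture's example; the counts for faces 2 through 6 are made up so that they sum to 100.

```python
import math

def estimate_probs(counts):
    """Given counts nu_1..nu_k from n experiments, return for each value
    the empirical frequency nu_i/n and the ~95% margin of error 2*sigma,
    with sigma = (1/n) * sqrt(nu_i * (n - nu_i) / (n - 1))."""
    n = sum(counts)
    result = []
    for nu in counts:
        p_hat = nu / n
        sigma = math.sqrt(nu * (n - nu) / (n - 1)) / n
        result.append((p_hat, 2 * sigma))
    return result

# The lecture's die example: 100 rolls, value 1 occurred 15 times;
# the other counts are hypothetical.
counts = [15, 18, 17, 16, 17, 17]
for face, (p_hat, moe) in enumerate(estimate_probs(counts), start=1):
    print(f"face {face}: p ≈ {p_hat:.2f} ± {moe:.3f}")
```

For face 1 this prints p ≈ 0.15 ± 0.072, the refined interval computed above.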