Hi, I'm Zor. Welcome to Unizor Education. This lecture is a continuation of the theory of probabilities, and we will devote some time to talking about normal distributions. This lecture is part of an advanced mathematics course for teenagers presented on the Unizor.com website, and that's where I suggest you watch this lecture from, because the website contains notes which are basically like a textbook for you. So it's very beneficial to work with both the video presentation and the text. Now back to probability theory. The last lecture was about the normal distribution. And I told you that I have a little problem describing the details of the normal distribution, because on one hand it is arguably the most important probability distribution in the theory of probabilities. On the other hand, it's a continuous distribution, and the mathematical apparatus behind it is calculus, relatively advanced calculus, which I don't want to touch right now in this course; it belongs to higher education, to universities, etc. So I'm trying to present the normal distribution of probabilities in a qualitative light, basically through examples and some more detailed explanations, without rigorous mathematical derivation of certain formulas. So the last lecture was a definition of what the normal distribution is about. And again, I told you this is probably the most important distribution, and that's very interesting, because of the central limit theorem. The central limit theorem is very, very important in the theory of probabilities; it basically states that if you mix together many different random variables, their sum behaves as a normally distributed random variable. More precisely, we are talking about averaging a certain number of random variables. There are certain conditions under which this theorem is actually true. 
A very simple and sufficient condition for the central limit theorem is that we are averaging n identically distributed independent variables. Then the new random variable behaves more or less approximately (and there are certain mathematical criteria, obviously, of this approximation) like a normal random variable, that is, a random variable with the normal distribution of probabilities. So this is the sufficient condition: all of them are identically distributed and independent variables. And the theorem states that their average gets as close to a normally distributed random variable as you like as n increases to infinity. Well, this is a statement, and I can't really prove it because, as I was saying, the mathematical apparatus goes beyond the scope of this particular course. However, I would like to illustrate it. So what I'm going to do right now is illustrate this central limit theorem in one particular case. The case I would like to consider is when all our random variables are very, very simple, maybe the simplest random variables: results of a Bernoulli experiment with the probabilities of success and failure, obviously, the same. So this is our random variable: ξi is equal to 0 with probability one half, and it's equal to 1 with probability one half. Now, in practical terms, let's say you're flipping a coin; the probability of heads and the probability of tails are each one half. You associate 0 with tails and 1 with heads. So you're adding together all your results, which means you're adding one for each head. Then you average: you divide by the number of experiments you have conducted, the number of flips you made. And this is your new random variable η. 
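The averaging just described is easy to try on a computer. Here is a small sketch of mine (not part of the lecture, and the sample sizes are my own choice): it averages 100 fair-coin flips many times and checks that the averages cluster around one half with the variance pq/n = 0.25/100 that the theory predicts.

```python
import random

def average_of_flips(n):
    """Average of n fair-coin flips: 0 for tails, 1 for heads."""
    return sum(random.randint(0, 1) for _ in range(n)) / n

random.seed(0)  # fixed seed so the experiment is repeatable
samples = [average_of_flips(100) for _ in range(10000)]

# Empirical mean and variance of the averages; the CLT says the averages
# concentrate around 1/2 with variance p*q/n = 0.25/100 = 0.0025.
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
print(mean, var)
```

A histogram of `samples` would already show the familiar bell shape, which is exactly the qualitative point of the lecture.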
And then you are basically experimenting either once or twice or a hundred times in a row, and the more times you experiment... Let's say you experiment with a million coins in one shot. You flip the million coins and record the result. Then you flip them another time and record the result. You add up all the heads, and that's the number you get. So for a million coins, the number will be in the range from zero, when all million coins land on tails, to a million, when all of them are heads. Now, both extremes have very, very small probabilities. But something like half a million for this particular sum can occur much more frequently, because it doesn't require a particular result in every experiment. For instance, the first half of the coins can be tails and the second half heads, or vice versa; or the odd-numbered coins can be tails and the even-numbered ones heads. In all these cases the result will be half a million, because you will have exactly half a million heads and half a million tails. And there are many other variations, obviously. And the more combinations there are that produce that half a million, the greater the probability of that half a million will be. So I can say that the distribution of probabilities is some kind of a graph which has values from zero to a million with different probabilities. And that's what I'm going to draw right now for the first few cases. And I will show that this graph more and more resembles the bell curve, which is the characteristic property of a normal random variable. That's my purpose for today. Right, let's do it. Now, what about the graphical representation? The graphical representation will be very simple: 0, 1, 2, 3, 4, 5, 6, etc. So my variable η is the sum of the Bernoulli variables ξ1 + ξ2 + ... + ξn divided by n. Let's consider n equal to 1. So what are the values of my η? 
Well, in this case, η is ξ1 divided by 1, so it's just ξ1. The values are 0 and 1, and the probabilities are one half and one half, right? So the graphical representation will be like this. For the value of 0, the probability will be one half. For the value of 1, the probability will also be one half. And there are no other values this particular variable can take; it's only either 0 or 1. And I have agreed before to graphically represent it as this type of combination of rectangles, which have one unit as a base and the probability as a height. All right, so that's simple. Great. Let's move on. n is equal to 2. So my η is equal to ξ1 plus ξ2 divided by 2, where ξ1 and ξ2 are each either 0 or 1 with probability one half. So what values can we have here? Well, if both of them are equal to 0, then the value will be 0, right? These are two independent variables, and I'm asking for the probability of one being equal to 0, which is one half, and the other being equal to 0, which is also one half, simultaneously. It's two different coins simultaneously falling on tails, right? So it's one half times one half, because we need the event which is a combination of two elementary events which are independent of each other, so the probabilities multiply. One half times one half is one quarter. Then the next value is 1. One can occur either when the first is equal to 1 and the second is equal to 0, or the first is equal to 0 and the second is equal to 1. So we have two different variations: 1 0 or 0 1. The probability of each is one half times one half, one quarter. So together, when the sum is equal to 1, I have two elementary events whose probabilities are summed up, and the probability will be one quarter plus one quarter. 
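The n = 2 case can be verified by brute-force enumeration of the four equally likely outcomes, which is a nice sanity check (a sketch of mine, not from the lecture):

```python
from itertools import product
from collections import Counter

# Enumerate all outcomes of two fair coins (0 = tails, 1 = heads);
# each of the 4 outcomes has probability 1/4.
counts = Counter(a + b for a, b in product([0, 1], repeat=2))
probs = {s: c / 4 for s, c in sorted(counts.items())}
print(probs)  # sums 0 and 2 occur one way each, sum 1 occurs two ways
```

The result is 1/4, 1/2, 1/4 for the sums 0, 1, 2, exactly the heights of the three rectangles just described.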
It will be two quarters, right? So this middle value has probability two quarters. And finally, my value can be equal to 2. I'm talking about the sum; forget about dividing by 2 for now. The value can be equal to 2 when both of them are equal to 1. So it's one half for one and one half for the other, which is again one quarter. That's for 2. Now, the values are actually divided by 2, which means that the whole graph should be squeezed by a factor of 2. But it doesn't really change the shape of the graph. That's why I decided just to go with the sum, ξ1 + ξ2, rather than (ξ1 + ξ2)/2. But I would like to illustrate that the shape is already something like this. Now, it doesn't really resemble the bell curve yet. But you already see that there is something higher in the middle and lower at both ends, right? But let's continue on and increase the number of experiments. Let's go to n equal to 3. Now, before I go any further, let me remind you that it's very easy to calculate the probability that η equals k, where η is equal to ξ1 + ξ2 + ... + ξn, if these are Bernoulli experiments with values of 0 and 1 and certain probabilities p and 1 − p, which we call q. You should remember that to find the probability of the value of η being equal to k, you basically have to understand that there are k ones among them, and the other n − k are zeros, right? But that can be in any kind of a sequence. So, basically, the number of combinations from n by k, C(n, k), gives you the number of concrete arrangements of the k ones among all these n positions. So the probability is C(n, k) times p to the k-th power times q to the power n − k. That's the formula which I already derived when I was talking about Bernoulli experiments. So I can use it in this particular case without a more detailed description of the different combinations, et cetera. 
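The formula just recalled, P(η = k) = C(n, k) · p^k · q^(n−k), translates directly into code. Here is a minimal sketch (the function name `binomial_pmf` is my own; `math.comb` is the standard combinations function in Python 3.8+):

```python
from math import comb

def binomial_pmf(n, k, p=0.5):
    """P(xi_1 + ... + xi_n = k): C(n, k) * p^k * q^(n-k) for n Bernoulli trials."""
    q = 1 - p
    return comb(n, k) * p**k * q**(n - k)

# Reproduces the n = 2 coin case worked out above: 1/4, 2/4, 1/4
print([binomial_pmf(2, k) for k in range(3)])
```

All the rectangle heights in the rest of the lecture come from this one function with p = q = 1/2.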
I will do it one more time for n equal to 3, and then I will use the formula for 4, 5, 6 and so on. And again, on this graph I will just use the integers 1, 2, 3, 4, 5. I will not divide by n, because whenever you divide by n, the whole graph would be squeezed into the segment from 0 to 1, basically, right? So it would look the same, but it would be more difficult to view in detail. All right. So let's consider the next case, n equal to 3, and we will see if our graph becomes a little bit more bell-like. Okay, n is equal to 3. So let's consider ξ1 + ξ2 + ξ3. Again, what different values can my variable take? Well, obviously the maximum is 3, when all three are 1, and the minimum is 0, when all three are 0, right? And anything in between: 0, 1, 2 and 3. So, how can we get 0? Well, we can get 0 in only one way: when this and this and this are all equal to 0. The probability of each one being equal to 0 is one half, but we need all three of them, so we have to multiply one half times one half times one half. So, 1 over 8. Now, when I'm talking about the value 1, it means only one of them is equal to 1 and the two others are zeros, right? It can be either this one or this one or this one. So there are 3 different combinations, and for each combination the probability is one eighth. So it's 3 times one half times one half times one half, which is 3 eighths. It's either 0, 0, 1 or 0, 1, 0 or 1, 0, 0; these 3 combinations deliver the result equal to 1. Next is the value of 2. When can we get 2? When we have two 1s and one 0. The 0 can be either here or here or here, and the other two are 1s, right? So again we have 3 different combinations, and again the probability is 3 eighths. And finally, the value of 3 can occur in only one case, when all three of them are equal to 1. So it's one half times one half times one half; the probability is 1 eighth. Now, graphically, the heights would be 1 eighth, 3 eighths, again 3 eighths, and 1 eighth. Okay. 
I'm not sure this looks much more like a bell curve than the previous case. But you understand that right now we are spreading the whole curve out, which means that when we squeeze it by a factor of 3 to fit from 0 to 1, we will actually get a bell-like curve here, so to speak. Let me add one more: n is equal to 4. Okay. And now I will actually use this formula: C(n, k) times p to the power of k times q to the power n − k. But in our case, p and q are both equal to one half, which means we can add the exponents and we will have one half to the n-th power, right? Because p and q are both one half. So, in our case, n is equal to 4, and I'm looking for the probability that η equals 0. Now, n equal to 4 means that in the denominator I have 2 to the 4th, which is 16. The number of combinations C(4, 0) is obviously 1. So, that's 1/16. Next, the number of combinations from 4 by 1, well, it's 4, right? I hope everybody remembers this formula. So, this is 4/16. Next is 2: combinations from 4 by 2, which is 4 factorial divided by 2 factorial and 2 factorial, that is 24 over 4, so it's 6, right? 6/16. The probability of η equal to 3 is the same as for 1: that's 4/16. And the probability of η equal to 4 is, well, C(4, 4) equal to 1, so it's 1/16. All right. Now, let's draw the graph. So, now I have the numbers from 0 to 4, plus one more mark because each rectangle extends one unit to the right. For 0, I have 1/16. For 1, I have 4/16. So, it's like this. For 2, I have 6/16, something like this. Then 4/16 again and 1/16 again. Okay. We have more rectangles, and together, you see, they resemble the bell curve a little bit better than the previous case. So, every time I increase the number of variables here, my graph resembles in more and more detail this bell curve which I was talking about. Next, n is equal to 5. I'm challenging your patience just to demonstrate how closely it comes to the normal distribution. So, 5. 
So, what do we have? We have values of the sum from 0 to 5, and my denominator will be 2 to the 5th power, which is 32. And I will have 6 values here. So, for the value of 0, I will have 1/32 for the probability. For the value of 1, I have 5/32. For the value of 2, I have 10/32. The probability of 3 is equal to 10/32. The probability of 4 is equal to 5/32. And the probability of 5 is equal to 1/32. So the numerators are 1, 5, 10, 10, 5, 1, all over 32. So, I will have something like this, then like this, then like this, and so on. Okay, I have more rectangles. So, when I average, which means dividing by 5, they will again all be on the same segment from 0 to 1, but a finer distribution, and it will be a closer resemblance to a bell shape. Okay. And the last one, just to try your patience, is n equal to 6, and that will be my last example. n is equal to 6. So, I have values of the sum from 0 to 6, right? All the integer values in this interval are possible. Now, the probability of the sum being equal to 0 is 1 over 2 to the 6th power, which is 1/64. The probability of taking the value 1 is 6/64. The probability of taking the value 2 is the number of combinations from 6 by 2, which is 6 times 5 divided by 2, that is 15, so 15/64. The probability of the value 3, which means we have three 1s and three 0s, is C(6, 3): 6 factorial divided by 3 factorial and divided by 3 factorial, which is 20, right? So 20/64. The value 4 is the same as 2, because it's symmetrical: 15/64. The value 5 is 6/64. And the value 6 is 1/64. All right. As far as the graph is concerned, this is the distribution of probabilities. Now, I haven't used this term before. The distribution of probabilities which I'm describing right now is sometimes called the density of probabilities, but that term is more applicable to continuous random variables. That's why I'm trying to avoid it right now. So, I'm calling it the distribution of probabilities among integer numbers, basically. So, it looks like this. 
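All the tables computed above for n = 4, 5 and 6 can be double-checked at once with the same C(n, k) over 2^n formula (my own check, not part of the lecture):

```python
from math import comb

# Numerators C(n, k) over the common denominator 2^n, for n = 4, 5, 6.
# These are exactly the rows of Pascal's triangle used in the lecture.
for n in (4, 5, 6):
    numerators = [comb(n, k) for k in range(n + 1)]
    print(n, numerators, "over", 2**n)
```

This prints 1, 4, 6, 4, 1 over 16; then 1, 5, 10, 10, 5, 1 over 32; then 1, 6, 15, 20, 15, 6, 1 over 64, matching the values worked out by hand.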
1, 2, 3, 4, 5, 6. Well, I will do it approximately. So, this is 1/64. Then I have 6, then 15, then 20, then 15 again, 6 again, and 1 again, all over 64. So, again, the more random variables we are averaging, the more bell-like the shape becomes. And if we introduce some kind of measure of the difference between the real bell shape and this set of rectangles (the difference being, say, something like the area of these little gaps, after I squeeze the graph to the segment from 0 to 1 by averaging, dividing by n), then this difference between my graphical representation of the distribution of probabilities and the real bell curve will go to 0 as n increases to infinity. And that's basically the content of the central limit theorem: the average of the sum of random variables resembles more and more closely the bell curve, the normal distribution of probabilities. Now, just think about this for a second. I took something which is completely unrelated to the normal distribution. I took Bernoulli random variables; they take just two values, 0 and 1. But then I started adding them together, mixing them together. And as a result, on average, I am getting closer and closer to the normal distribution. Now, I may or may not have illustrated this with some other distributions, like the geometric distribution, for instance; it would be exactly the same thing. And the meaning of the central limit theorem of the theory of probabilities is that no matter what kind of initial distribution of probabilities you have, no matter what kind of random variable you start with, if you sufficiently mix them together, add them together, they will eventually add up to a new, almost normally distributed random variable. And that's why the normal distribution plays such an important role in the theory of probabilities and in statistics. 
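The shrinking difference between the rectangle graph and the true bell curve can be made concrete. One possible measure (my own choice, not the lecture's) is the largest gap between the exact probability P(sum = k) and the normal density with the same mean np and variance npq evaluated at k:

```python
from math import comb, exp, pi, sqrt

def max_gap(n, p=0.5):
    """Largest difference between the exact binomial probability of each sum k
    and the matching normal density value at k."""
    mean, var = n * p, n * p * (1 - p)
    gap = 0.0
    for k in range(n + 1):
        binom = comb(n, k) * p**k * (1 - p)**(n - k)
        normal = exp(-(k - mean) ** 2 / (2 * var)) / sqrt(2 * pi * var)
        gap = max(gap, abs(binom - normal))
    return gap

# The gap shrinks as n grows, as the central limit theorem promises
print([max_gap(n) for n in (6, 20, 100)])
```

Under this measure the gap decreases steadily with n, which is a numerical illustration of the convergence the lecture describes (for a rigorous statement one would use the local limit theorem, which is beyond this course).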
Especially in statistics: when you are conducting certain experiments, on drugs or on recorded errors or something like this, you are basically mixing together many different factors. You don't even know what kind of distribution each factor has. But sufficiently mixed together (maybe from different people, maybe from different experiments, from different timeframes, etc.), you can count on the average of the results being close to normal. This matters because it's very important for scientists to find out the distribution of probabilities of random variables, and if they're mixing these random variables together, they don't have to guess. All they need to know is that the result is almost normal. And any normal distribution is characterized by only two parameters: the mean (the expected value, or expectation) and the variance. We can statistically evaluate these two parameters, and that gives you the normal distribution with these two parameters. And from there, you can find out the probability of your new random variable falling in this particular interval or in that particular interval, whether it's more or less concentrated around your mean, or significantly spread around, etc. That actually concludes my today's lecture, which was supposed to illustrate that mixing together different random variables under certain conditions results in a random variable which more and more resembles a normal random variable as the number of initial random variables we are mixing increases to infinity. I do suggest you read the notes to this lecture on Unizor.com. And basically that's it for today. Thank you very much and good luck.