Hi, I'm Zor. Welcome to Unizor Education. Today's topic is the Normal Distribution of Probabilities. This lecture is part of the larger course of advanced mathematics for teenagers presented on Unizor.com, and that's where I suggest you listen to this lecture, because the website contains not only the video presentation but also notes, which basically serve as a textbook for you. All right. Normal probability distribution. Well, I'm in a bit of a difficult position right now. To talk about the theory of probabilities and not talk about the normal distribution would be silly: it's the most important probability distribution the theory of probabilities has. At the same time, this is a continuous distribution, and I have certain restrictions around continuous distributions, because the real mathematics behind them requires calculus, which I have not addressed yet. And even when I do, the volume of calculus usually taught before university is not sufficient to treat continuous probability distributions properly. So I will do something in between: I will talk about the normal distribution as much as possible, in as plain a language as possible, but there are certain things, like the formula itself, which you will just have to take on faith, without proof. That's how I would like to approach it. I will try to explain certain qualitative characteristics of the normal distribution, and the formula itself I will just mention, without any real proof. Okay. Yes, as I mentioned, the most important distribution in the theory of probabilities is the normal one, and I will substantiate the reason it's called the most important. But for now I would like to treat it as something relatively familiar to you.
I mean, everybody has heard the term "bell curve," something like this. So what is the bell curve, and what kind of probability distribution is reflected in this graph? Well, here it is. First of all, this graph represents the probability of a certain random variable taking values in a certain range. Namely, if you want the probability of your random variable ξ being between values a and b, then, given this curve which represents the probability distribution, the area underneath the curve, bounded on the left by a and on the right by b, is the probability of ξ being between these values. Now, in particular, the probability of this random variable taking any one particular value is equal to zero. Because the probability of ξ being equal to, say, c is the probability of c ≤ ξ ≤ c, which is the area underneath this curve from point c to point c — just one vertical line, right? It has zero width, and that's why the area is equal to zero. So that probability is zero, but the probability of being between two values which are not equal to each other will be something: the area under this curve between them. Basically, that's the sense of this graphical representation of probabilities. Now, the curve is bell-shaped, and while not every bell-shaped curve represents a normal distribution of probabilities, every normal distribution of probabilities does look like a bell. Obviously, the area underneath the entire curve, from minus infinity to plus infinity, should be equal to one, because it represents the probability of ξ having any value at all from minus infinity to plus infinity — the full probability, which is one. Also important, as I mentioned before, is that this random variable is a continuous random variable.
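This area picture is easy to check numerically. Here is a short Python sketch; it uses the standard formula for the normal density, which the lecture only writes down later, so take that formula on faith for now just as the lecture suggests:

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density (the bell curve) of a normal random variable."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))

def prob_between(a, b, mu=0.0, sigma=1.0, steps=100000):
    """Approximate P(a < xi < b) as the area under the curve (midpoint Riemann sum)."""
    width = (b - a) / steps
    return sum(normal_pdf(a + (i + 0.5) * width, mu, sigma) for i in range(steps)) * width

print(prob_between(-1.0, 1.0))   # about 0.6827: the area between -1 and 1
print(prob_between(0.5, 0.5))    # 0.0: a single point has zero width, hence zero area
```

The second call makes the zero-width argument concrete: the interval from c to c contributes no area at all.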
I specified what continuous and discrete probability distributions are in the previous lectures. All right. Now, let's think about this particular graph from a qualitative viewpoint again. Take an interval from a to b with a certain fixed width. If it's close to the center of the bell curve, the area above it is obviously greater than if I put an interval of the same width, from a′ to b′, out here. The width of this interval is exactly the same as the first, but the area is smaller. So if you're dealing with a bell-shaped distribution of probabilities, the more centrally located the interval whose probability you want to evaluate, the greater that probability will be; and for an interval sitting right in the center, symmetric about the middle line of the graph, it is the biggest. Okay. Now, another important quality of this bell shape is its steepness. You see, I can use a different bell shape: let's take some area from the periphery and concentrate it more in the center, so I have something like this. Under this new bell-shaped curve, the entire area is also equal to 1, but it's more concentrated around the middle. So the steeper the bell shape is, the more concentrated the values of the random variable will be around the middle. And the converse is true too: if the curve is less steep, if the area is spread a bit more evenly, something like this, then the random variable will take values far from the middle with greater probability, and the distribution of the values will be a bit more dispersed, so to speak. Okay. Now, why did I say that the normal distribution is the most important distribution in the theory of probabilities?
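Both qualitative claims — centered intervals carry more probability, and steeper bells concentrate it — can be verified exactly, because the normal distribution function is available through the standard-library error function `math.erf`. A small sketch:

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(xi <= x) for a normal variable, via the standard-library error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def prob_interval(a, b, mu=0.0, sigma=1.0):
    """Probability that xi lands between a and b: the area under the bell."""
    return normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)

# Two intervals of the same width 1: one centered, one off to the side.
print(prob_interval(-0.5, 0.5))   # about 0.383
print(prob_interval(2.0, 3.0))    # about 0.021

# Steepness: a smaller sigma concentrates the area near the middle.
print(prob_interval(-1.0, 1.0, sigma=0.5))  # about 0.954
print(prob_interval(-1.0, 1.0, sigma=2.0))  # about 0.383
```

The same interval [−1, 1] captures most of the probability under the steep bell (σ = 0.5) and much less under the flat one (σ = 2).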
Well, here is a very important theorem, which I'm not going to prove to you because the proof is rather difficult. But the fact that this theorem exists, that it can be proven, is really quite amazing. I would compare it with the fundamental theorem of algebra, which says that a polynomial equation of degree n has exactly n complex solutions, counted with multiplicity. That is a kind of strange, I would say, surprising theorem. The theorem I'm going to talk about now is also strange — for me it was very strange when I first got acquainted with it. Say you have some random variables, all independent and all having the same identical distribution of probabilities. Say you are rolling a die again and again and again; every time you roll it, you get a value of the random variable. So, whatever it is, you have a certain number of random variables, and let's assume for simplicity that they are independent and identically distributed. Now you would like to average them. Well, intuitively, I'm sure you feel that this new random variable, the average, should behave in some ways like each one of those, but it should be more concentrated. Say you are measuring, let's say, a temperature. The temperature is supposed to be more or less the same, but each measurement comes with a certain error. If the temperature is supposed to be, say, 100 degrees, one measurement gives 98, another 99, another 102, and so on. But if you average all these measurements, then the more measurements you have, the more narrowly the average will cluster around your 100 degrees. So the average is just another random variable, one that is a bit more narrowly distributed. Now, what's the expectation of this variable?
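The temperature story can be simulated in a few lines. The uniform measurement error of up to 3 degrees below is just an illustrative assumption of mine, not anything from the lecture:

```python
import random
import statistics

random.seed(42)  # fixed seed so the illustration is reproducible

TRUE_TEMP = 100.0

def measure():
    """One noisy measurement: the true value plus a random error."""
    return TRUE_TEMP + random.uniform(-3.0, 3.0)

one_reading = measure()
average_of_many = statistics.mean(measure() for _ in range(10000))

# A single reading can be off by up to 3 degrees, but the average
# of many readings clusters tightly around 100.
print(one_reading)
print(average_of_many)
```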
Well, if these random variables are identically distributed and independent, then the expectation of their sum is equal to the sum of their expectations, which is n times, let's say, a, where a is the expectation of one particular variable; and the variance of each is, let's say, σ². For the sum, the variance will be n times the variance of one particular random variable — we actually proved this theorem. Now let's divide the sum by n and call the result η (eta). The expectation of η: well, obviously, n comes outside as a divisor, so it is n·a divided by n, which is a. So the expectation of the average of n random variables is exactly the same as the expectation of any one of them. That's the middle point. And obviously, going back to our example of measuring: if every particular measurement has an expectation of 100, then if you add them up and divide by n, it will still be 100. Now, speaking about variance: variance, you remember, is a quadratic thing, so whenever a random variable carries a multiplier, the multiplier comes out of the variance squared. Here the multiplier is 1/n, so the variance of η is 1/n² times the variance of the sum, which is 1/n² times n·σ², which is σ²/n. This is also understandable: the average is more narrowly distributed around the expectation, around the mean value. That's why you are averaging, right? You are averaging errors lying on different sides, left and right, of the middle value, so the average should land closer to the middle value — and that's exactly what this formula shows. But what will be the distribution of this average? Okay, we know the expectation, the mean value, of this new variable η; we know its variance, and its standard deviation is obviously σ divided by the square root of n. But what is its distribution of probabilities?
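Both facts — E[η] = a and Var(η) = σ²/n — can be checked by simulation. Here is a sketch using die rolls, for which a = 3.5 and σ² = 35/12:

```python
import random
import statistics

random.seed(1)

n = 100        # how many dice we average each time
trials = 5000  # how many such averages we collect

averages = [statistics.mean(random.randint(1, 6) for _ in range(n))
            for _ in range(trials)]

# Theory: a = E[xi] = 3.5, sigma^2 = Var(xi) = 35/12,
# so E[eta] = 3.5 and Var(eta) = sigma^2 / n = (35/12)/100, about 0.0292.
print(statistics.mean(averages))      # close to 3.5
print(statistics.variance(averages))  # close to 0.0292
```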
And here is the main theorem of the theory of probabilities — it's actually called the central limit theorem. The greater n is, the closer the distribution of this averaged variable is to a normal distribution, to this bell-shaped thing. Why does it happen? Hard to say, but people have noticed it, and in the next lecture I'll probably show a couple of examples. It's quite an interesting property of the theory of probability that it practically doesn't matter what kind of distribution these variables have: whenever you average them, the distribution of the resulting random variable gets closer and closer to normal as n grows to infinity. That's why it's called the central limit theorem. So as n goes to infinity, the distribution of the average of the random variables approaches the normal distribution, the bell-shaped thing. It's an amazing property, quite frankly. The theorem is actually true in broader situations — not only when these are independent and identically distributed random variables; there are weaker sufficient conditions as well — but independence and identical distribution are definitely sufficient for the theorem to hold, and that's probably the most frequently occurring situation in practical life. You have experiments independent of one another, you average the results, and you can count on the fact that, for sufficiently large n, the distribution of probabilities of the average will be relatively normal — relatively close to the bell shape, let's put it this way. Okay, so this is the amazing theorem called the central limit theorem, and what I'm going to do next is try to put a formula on this bell-shaped curve. Unfortunately, I cannot go into the details and prove it; I will just try to approach it from a more or less intuitive standpoint. So we have this bell-shaped thing, and what I would like to do is come up with some formula which describes this particular curve.
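A quick way to see the central limit theorem at work without any proof: a single die roll is uniform, not bell-shaped at all, yet averages of many rolls already obey the normal "about 68.3% within one standard deviation" rule. A small sketch:

```python
import math
import random

random.seed(7)

n = 50
trials = 20000
sigma = math.sqrt(35.0 / 12.0)   # standard deviation of a single die roll
mean_sd = sigma / math.sqrt(n)   # standard deviation of the average, sigma/sqrt(n)

averages = [sum(random.randint(1, 6) for _ in range(n)) / n for _ in range(trials)]

# A normal variable falls within one standard deviation of its mean
# about 68.3% of the time; the averages behave that way even though
# each individual roll is uniform.
within_one_sd = sum(abs(a - 3.5) <= mean_sd for a in averages) / trials
print(within_one_sd)  # roughly 0.68
```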
Now, the formula I'm going to come up with is the formula for this distribution of probabilities, the normal distribution of probabilities. Are there other formulas which result in a bell-shaped curve? Yes, of course, but those are not normal distributions. So I will just give you a formula which gives this particular bell-shaped curve, and then I will simply say: okay, this is the normal distribution — without proof. The formula I start with is f(x) = e^(−x²). Let's draw the graph of this function. Here e is the base of the natural logarithm; it's 2.71…, a transcendental, irrational number. When I was talking about exponents and logarithms, I think I mentioned something about this number e. So let's take this function and try to graph it. Well, at x = 0 the exponent is 0, and e to the power of 0 is equal to 1, so the graph passes through 1 there. Now, as x moves away from 0, to the left or to the right, x increases in absolute value, and there is an x² in the exponent. x² is always positive, so −x² is always negative, and we always have e to some negative power. A negative power, you remember, is 1 over the positive power, so this is 1 over e^(x²), if you wish. As |x| increases, e^(x²) obviously increases, so 1 over e^(x²) decreases; and since e^(x²) goes to positive infinity, 1 over it goes to 0. That's why the graph has this shape. So we have started with this. And again — not probably, definitely — there are other formulas which really resemble this bell shape. For instance, if I use e to the minus x to the fourth power, it would have roughly the same shape; maybe a little different, but generally it would look the same.
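The qualitative claims about e^(−x²) — value 1 at the peak, symmetry about 0, decay as |x| grows — can be confirmed in a few lines:

```python
import math

def f(x):
    """The basic bell-shaped curve e^(-x^2)."""
    return math.exp(-x * x)

print(f(0.0))             # 1.0: e^0 at the peak
print(f(2.0) == f(-2.0))  # True: the curve is symmetric about 0
print(f(1.0) > f(2.0))    # True: it decreases as |x| grows
```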
But those are not normal distributions. For the normal distribution, you start with this kind of formula — and for what follows it is convenient to take the curve e^(−x²/2), which has the same bell shape, just slightly wider. Again, without any kind of proof. Next: is this by itself a normal distribution? The answer is no, because the area underneath, remember, is supposed to be equal to 1, right? And the area underneath this curve is not 1 — it's the square root of 2π. Don't ask why; unfortunately, again, I cannot go into the details and the proof. But since I know from calculus that the area underneath this curve is √(2π), the function (1/√(2π))·e^(−x²/2) already represents something which can be a probability distribution, because the area underneath its curve is equal to 1. So this area underneath is equal to 1, and that's why, if you have a segment from a to b, the area between these two values underneath the curve can represent the probability. Okay, now let's go back to our averaging of certain random variables. I would like to have a formula which encompasses not just this one particular normal distribution of probabilities, but a whole class of distributions — and the class is obviously much wider than this one. And here is why. Suppose these random variables ξ1, ξ2, …, ξn are identically distributed, and they have some expectation, some mean value, not equal to 0 but equal to whatever — say, μ (mu). Then, obviously, their average is a new random variable with the same expectation, the same mean value, and its values would also be concentrated in the form of the normal distribution, the bell curve — but the center should be at μ, right? So I would like to have not just one function, but a family of functions, driven by certain parameters, which can accommodate any such averaging. Now, how can I shift the graph to the right by μ? Well, very simply: I replace x with x − μ. You remember that if I subtract a constant from the argument, the graph is shifted by that constant to the right.
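The √(2π) claim can at least be checked numerically, even if the calculus proof is out of reach here. A sketch integrating e^(−x²/2) with a plain Riemann sum:

```python
import math

def area(f, a, b, steps=200000):
    """Midpoint Riemann sum for the area under f between a and b."""
    w = (b - a) / steps
    return sum(f(a + (i + 0.5) * w) for i in range(steps)) * w

# The tails beyond +-10 are negligible, so [-10, 10] captures
# essentially the whole area under the curve.
total = area(lambda x: math.exp(-x * x / 2.0), -10.0, 10.0)
print(total)  # close to sqrt(2*pi), about 2.5066

normalized = area(lambda x: math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi),
                  -10.0, 10.0)
print(normalized)  # close to 1: dividing by sqrt(2*pi) makes it a density
```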
So if μ is positive, the graph moves to the right; if μ is negative, it moves to the left. Now, this particular function — which is actually not one function but a whole family of functions, with μ as a parameter — describes a normal variable whose average, whose expected value, is μ. We understand that averaging, however close to normal its result is, still does not change the expectation: the expectation should be μ. Therefore our family of normal distributions should be broad enough to accommodate any kind of average value, any kind of mean value, and that's why I have introduced the parameter μ. But that's not everything yet. There is another parameter which is very important. Remember the steepness? You can have another curve like this one, steeper in the middle, and it actually reflects the variance of the components: if the components have smaller variance, then their average will also have smaller variance. So I have to introduce this other parameter. Now, you remember from the lectures dedicated to graphs of functions what happens if you divide the argument of a function by a certain number. Say we have f(x) = e^(−x²), and I introduce another function g(x) = e^(−(x/b)²), dividing the argument by b. Just as subtracting a constant from the argument shifts the whole graph, dividing the argument by a factor stretches or squeezes the graph along the horizontal axis. In this particular case, when b is greater than 1, dividing the argument by b stretches the graph horizontally by the factor b.
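The stretching rule is easy to verify pointwise (b = 2 here is just an arbitrary choice): dividing the argument by b means g takes at x = b·t exactly the value f takes at t, so the whole picture is pulled wider by the factor b.

```python
import math

b = 2.0
f = lambda x: math.exp(-x * x)            # original curve e^(-x^2)
g = lambda x: math.exp(-((x / b) ** 2))   # same curve with the argument divided by b

# g at b*t equals f at t: the graph is stretched horizontally by b.
print(g(2.0) == f(1.0))  # True
print(g(4.0) == f(2.0))  # True
```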
And if I multiply the argument by b, the graph is squeezed. That is exactly what steepness means in this particular case: squeezing or stretching the graph is exactly a change in steepness — the curve becomes, how should I say it, less steep, or vice versa, more steep. Now, with a proper multiplier in front, I can make this stretched or squeezed graph also have an area of 1 underneath, so it will also represent a probability distribution. And this is the other parameter I would like to introduce. Two parameters: one is a horizontal shift, to accommodate any mean of the original random variables I am averaging; the other represents the steepness of the bell around the middle and accounts for the different variances of our random variables. And the final formula, which I am going to give you right now, contains both — σ in the multiplier and 2σ² in the exponent:

φ(x; μ, σ) = (1 / (σ·√(2π))) · e^(−(x − μ)² / (2σ²))

Or, if you wish, we can write the exponent differently, as −((x − μ) / (σ·√2))² — the same thing, right? So this formula represents the bell curve with two parameters. The parameter μ shifts the bell curve along the horizontal axis to whatever the mean value of each component of the sum is; so μ is actually the mean of the random variable η. And the multiplier involving σ characterizes the steepness of the curve around the middle point. Now, what's important — and here I have to use another parameter in the notation, writing φ(x; μ, σ) — is that this represents the probability distribution of a normally distributed random variable which has mean value μ and variance σ², that is, standard deviation σ. And that's why I use this σ·√2 in the exponent and σ·√(2π) in the multiplier: I could obviously use different multipliers, but then it would not be a distribution of probabilities.
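The final formula translates directly into code, and both ways of writing the exponent give the same density. A sketch (the parameter values are arbitrary):

```python
import math

def phi(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def phi_alt(x, mu, sigma):
    """Same density with the exponent written as -((x - mu)/(sigma*sqrt(2)))^2."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    z = (x - mu) / (sigma * math.sqrt(2.0))
    return coeff * math.exp(-(z ** 2))

# The two ways of writing the exponent agree:
print(abs(phi(1.3, 0.5, 2.0) - phi_alt(1.3, 0.5, 2.0)) < 1e-12)  # True

# The peak sits at x = mu, and a smaller sigma gives a taller, steeper bell:
print(phi(0.5, 0.5, 2.0) > phi(1.5, 0.5, 2.0))  # True
print(phi(0.0, 0.0, 0.5) > phi(0.0, 0.0, 2.0))  # True
```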
So we need the area under the curve to be equal to one. This is the final formula — which, to tell you the truth, I never remember. The only thing I do remember is the general e^(−x²) shape; the multipliers, unfortunately, I always have to look up in a textbook. But in any case, it represents the bell curve, which is the graph of a normal probability distribution. That's very important, and as I said, the importance of this normal random variable and its distribution is that it represents the limiting case for averaging other random variables. And since random variables typically represent measurements, or results of some experiment, or whatever else, averaging is a very, very important tool for any kind of scientific research: you measure something many, many times and then average the results. This averaging of the results can be modeled as a normally distributed random variable. Well, that was basically all I wanted to say — a kind of introduction to normal random variables. In the next lecture I will try to show a couple of examples, so you will see that this averaging does indeed resemble the normal distribution, the bell curve, more and more as the number of random variables participating in the averaging increases. All right, thanks very much and good luck.