Okay, in today's lecture we will be looking at the normal distribution, also called the Gaussian distribution. The reference books for this topic are given in the slide: Applied Statistics and Probability for Engineers by Montgomery and Runger and Random Phenomena by Ogunnaike are suitable references for the material we are going to cover today. So why should we study the normal probability distribution? It is very popular, very elegant, and relatively simple, and it has some nice properties. We use it not only because of these desirable features but also because it is the distribution that many real-life quantities tend to follow. In large classes, for instance, the distribution of marks is often approximated by a Gaussian or normal distribution. If you look at the particle sizes coming from a crusher or a grinder, they may cover a very large range of values: the smallest particle may be in the micron range and the largest in the millimetre or centimetre range. Let us denote these sizes by D. Take the natural log of the particle sizes, converting them into ln D. You will be surprised to find that the distribution of the natural logarithm of the particle diameters follows the normal probability distribution. Once you have a normal probability distribution, you can do a lot of things. In the case of the marks distribution, you can find the percentage of students who scored, let us say, between 50 and 60, or the percentage who scored below 20. For the particle sizes, you can calculate the probability that a particle size lies between two values. It is a very useful distribution. It is also going to be useful for our design of experiments and analysis of data, because samples have properties like the mean and variance.
If you look at the mean values of different samples, they often follow the normal probability distribution. Some of the other standard distributions, like the t-distribution, also tend to the normal distribution under certain conditions. This probability density function finds several applications in science and engineering; it is one of the most widely used continuous probability distributions in statistical analysis. When we look at the parameters of the normal probability distribution, it is very interesting to note that the parameters of the distribution are themselves the mean and the standard deviation of the distribution. This is a big advantage. Many other distributions also have two parameters, or in more infrequent situations even three, but those parameters, call them parameter 1 and parameter 2, have to be substituted into mathematical expressions to obtain the mean and variance. In the case of the normal distribution, the two parameters that describe the shape of the distribution are themselves the mean and the standard deviation. Let us look at the probability density function for the normal distribution. The function is denoted by f(x) and is given by f(x) = 1/√(2πσ²) · exp(−(x − μ)²/(2σ²)), where μ is the mean of the distribution and σ² is the variance. The lower limit is −∞ and the upper limit is +∞, so x can take both negative and positive values. Now we will define the normal random variable: a random variable X with the above probability density function is a normal random variable with parameters μ and σ. The mean μ may range from −∞ to +∞, while the standard deviation σ may vary between 0 and ∞. This is an important point.
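As a quick sanity check of this definition, here is a minimal Python sketch of the density. It is not from the lecture slides; the evaluation points are purely illustrative.

```python
import math

def normal_pdf(x, mu, sigma):
    # f(x) = 1/sqrt(2*pi*sigma^2) * exp(-(x - mu)^2 / (2*sigma^2))
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

# The density is symmetric about mu and peaks there at 1/(sigma*sqrt(2*pi)).
print(normal_pdf(0.0, 0.0, 1.0))                                # ≈ 0.3989
print(normal_pdf(1.0, 0.0, 1.0) == normal_pdf(-1.0, 0.0, 1.0))  # True (symmetry)
```

Because (x − μ) enters only through its square, points equidistant from the mean always get the same density, which is the symmetry the lecture keeps referring to.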
Many of us intuitively believe that the mean value of a distribution should be positive. That need not be the case. The mean is an average, and the range of the distribution may be such that the average value is negative. For example, if the lower and upper limits of the distribution are −50 and −5, then the mean would lie somewhere between −50 and −5. So the mean can take negative values. However, the standard deviation is obtained as the positive square root of the variance, so it is always a positive quantity. Coming back to the normal distribution: for a distribution with parameters μ and σ², we use the general notation N(μ, σ²). Like any other probability density function, the normal distribution must satisfy ∫ from −∞ to +∞ of f(x) dx = 1. In other words, if you take the expression for f(x) into the integral and carry out the integration, which fortunately can be done analytically, you will find that the value equals 1 after applying the limits. Moreover, if you compute ∫ from −∞ to +∞ of x·f(x) dx, which is the definition of the mean, after plugging in the expression for f(x) you will be pleasantly surprised to find that the value is μ: the function had parameters μ and σ², yet σ² vanishes in the mathematical manipulations involved. Similarly, when you carry out a different integration, ∫ from −∞ to +∞ of (x − μ)²·f(x) dx, after the necessary mathematical steps, including integration by parts, the μ drops out and you are left with only σ².
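The three integrals above can be checked numerically as well. This is only a sanity-check sketch with an illustrative (deliberately negative) mean; a midpoint rule over ±10σ is plenty, since the tails beyond that contribute essentially nothing.

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

mu, sigma = -2.5, 1.5                                  # a negative mean is perfectly legal
lo, hi, n = mu - 10 * sigma, mu + 10 * sigma, 100_000
dx = (hi - lo) / n
xs = [lo + (i + 0.5) * dx for i in range(n)]           # midpoint rule

area = sum(normal_pdf(x, mu, sigma) for x in xs) * dx                   # -> 1
mean = sum(x * normal_pdf(x, mu, sigma) for x in xs) * dx               # -> mu
var  = sum((x - mu) ** 2 * normal_pdf(x, mu, sigma) for x in xs) * dx   # -> sigma^2

print(area, mean, var)
```

The three sums land on 1, μ, and σ², mirroring the analytical results quoted in the lecture.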
The total area under the curve is equal to 1: when you integrate f(x) between the lower and upper limits, plugging in the equation of the normal distribution for f(x), the area comes out to be 1. The normal distribution is a very flexible one; it can change its shape and location easily when you change the parameters μ and σ. The distribution is symmetric and centred at the mean, so by changing the mean value you can make it move around. When you change μ from, let us say, 0 to 50, the distribution shifts to the right and becomes centred at 50. The spread of the distribution is governed by the parameter σ, the standard deviation. If you look at these three curves, all of them are normal distributions with the same mean μ = 0, but they have different standard deviations. The one with the tallest peak has the smallest standard deviation, 10; the peak of intermediate height has an intermediate standard deviation, 20; and the shortest but widest peak has a standard deviation of 30. So the standard deviation is a measure of the spread: the higher the standard deviation, the greater the spread of the distribution. You can make the normal distribution move about by changing the value of μ; instead of 0 you can give 10 or 50 and it simply shifts sideways. You can also try to imagine what would happen if you continuously decreased the value of σ.
If the value of σ is reduced, the height of the distribution increases. What happens when you reduce σ all the way towards 0? Just think about it. What is that mathematical function called? Here we have cases where the standard deviation is reduced to smaller and smaller values: you start with 3, the pink curve, then reduce it to 2, and then even further to 1. The green curve shows the tallest peak. If you keep reducing σ, the value of f(x) keeps increasing; f(x) is a maximum at the centre, so this peak value keeps growing as the standard deviation decreases. The normal distribution is symmetric about the mean. If the mean is 0 and you integrate from the mean to the upper limit, you get ∫ from 0 to ∞ of f(x) dx = 0.5: covering the curve from 0 to ∞, the area under the curve is 0.5. We know the total area under the curve is 1, so considering half the domain, from 0 to +∞, should give an area of 0.5. In case you have a non-zero mean, integrating from μ to ∞ gives ∫ from μ to ∞ of f(x) dx = 0.5. Now, you can have different normal distributions, each with a different mean and/or variance. I told you that we do not have to do numerical integration or any further calculations to find the various probabilities involving the normal distribution. But since each normal distribution can have a different mean and/or standard deviation, we cannot maintain a separate chart or table of probabilities for every normal distribution. It is therefore important to reduce a given normal distribution to a standard form. The transformation of any given normal distribution to its standard form is pretty easy, and once it has been reduced to standard form, you need only one set of tables or charts to read off the probability values.
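The blow-up of the peak as σ shrinks can be seen directly: the maximum of the density, attained at x = μ, is 1/(σ√(2π)). A small sketch, using the same σ values as the slide plus two smaller ones for emphasis:

```python
import math

def peak_height(sigma):
    # Maximum of the normal density, attained at x = mu: f(mu) = 1/(sigma*sqrt(2*pi))
    return 1.0 / (sigma * math.sqrt(2.0 * math.pi))

for sigma in (3.0, 2.0, 1.0, 0.1, 0.01):
    print(sigma, peak_height(sigma))
# The height grows without bound as sigma -> 0 (a spike, in the limit),
# while the total area under the curve stays fixed at 1.
```

Halving σ doubles the peak height, which is why the green σ = 1 curve towers over the pink σ = 3 curve.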
How do we define the standard form of the normal distribution? The normal distribution in its standard form has a mean of 0 and a variance of 1; since the variance is 1, the standard deviation is also 1. So a standard normal distribution has mean 0 and variance 1. The random variable associated with it is called the standard normal random variable. The cumulative distribution function of a standard normal random variable is denoted Φ(z) = P(Z ≤ z): we are asking for the probability that the standard normal variable Z takes a value smaller than or equal to z, and that is given by the cumulative distribution function Φ(z). How do we make the transformation? You have the original random variable X, which follows a normal distribution with mean μ and standard deviation σ, and you have to convert it into the standard normal form. For that, we compute z = (x − μ)/σ: x is the original random variable, μ the original mean, and σ the original standard deviation; we subtract μ from x and divide the result by σ to get z. After this transformation, we have a new random variable Z which has a mean of 0 and a standard deviation of 1, and it is also normally distributed. This transformation is applicable irrespective of the values of μ and σ. Obviously σ cannot be 0; it must be a number greater than 0. But whether μ is negative or positive is immaterial: all you have to do is carry out the transformation z = (x − μ)/σ. What does it really mean? You want to find the probability P(Z ≤ z), where the small z is a number taken by the random variable capital Z.
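The whole recipe can be sketched in a few lines: Φ is available through the error function in Python's standard library, Φ(z) = (1 + erf(z/√2))/2, and P(X ≤ x) then follows from z = (x − μ)/σ. The marks figures below (mean 55, standard deviation 10) are made-up illustrative values, not numbers from the lecture.

```python
import math

def phi(z):
    # Standard normal CDF: Phi(z) = P(Z <= z) = (1 + erf(z / sqrt(2))) / 2
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_cdf(x, mu, sigma):
    # Standardize first: z = (x - mu) / sigma, then read off Phi(z)
    return phi((x - mu) / sigma)

# Hypothetical marks distribution X ~ N(55, 10^2):
# fraction of students scoring between 50 and 60
p = normal_cdf(60, 55, 10) - normal_cdf(50, 55, 10)
print(round(p, 4))   # ≈ 0.3829
```

Any "probability between two values" question from the start of the lecture reduces to one subtraction of two Φ values, which is exactly why a single standard table suffices.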
But we do not know the value of small z initially; we only know the value taken by the random variable capital X. To find small z, you substitute the value of small x, subtract the mean of that normal distribution, and divide the result by its standard deviation: that gives you small z. You have a normal distribution associated with the random variable X, with μ and σ as its parameters. When X takes on a particular value small x, you subtract the actual mean of the normal distribution from that small x and divide by σ, which gives you a value small z. That small z is what you use here: the question becomes P(Z ≤ z), and now you can use the normal distribution with mean μ = 0 and standard deviation σ = 1. Charts are available for this particular normal distribution, the one with zero mean and unit standard deviation, and from them you can compute the probabilities. Standard normal probability tables are available in many places, including the reference books I mentioned at the beginning of the lecture; you can also find them from internet sources. Interestingly, you can generate these tables yourself if you have access to any standard spreadsheet: define values of z, use the appropriate command in the spreadsheet, and generate the complete table. I have done that, and you can see this table. The z values start from −3.9 and increase as you go down the vertical direction. Suppose I want to find the probability corresponding to −3.75: I look at −3.7 here, then go horizontally to my right until I hit the entry corresponding to −3.75. Along the first row, for instance, going in the horizontal direction gives z values of −3.99, −3.98, −3.97, and so on to −3.90.
If I take a z value here and move in the horizontal direction, I get −3.19, −3.18, −3.17, and so on, and I can read off the corresponding probability values. Why do we not have values below −3.9? Because the probability, the area under the curve, corresponding to −3.99 is already very small, about 3.3 × 10⁻⁵, which is pretty much 0. If you go to even lower z values, you get even smaller numbers, and when you are essentially reporting 0 it is not really necessary to report 10⁻⁶, 10⁻⁷, and so on. Even though the range is from −∞ to +∞, the curve pretty much coincides with the x-axis at a value of −4. Since the distribution is symmetric, the probability, the area under the curve, beyond z = +4 will also be very, very small. So you can take any z value to two digits beyond the decimal point: for −3.56, you locate −3.5 here, go towards your right-hand side, and you will hit −3.56, −3.55, and so on. Suppose you want the probability for −3.555, which has three decimal places: you locate −3.5, go to your right, and see that −3.555 lies between −3.55 and −3.56, so you may want to interpolate between those two values. Up to two decimal places the chart is pretty useful, but beyond two decimal places you have to do some interpolation, and the values are likely to become slightly erroneous at the third or fourth decimal, which is okay for most practical purposes. If you want very accurate probability values, you have to resort to a spreadsheet or statistical analysis software. So I have covered the broad range: in this particular table we go from −2.9 to −2 and so on until we hit 0, and from 0 we continue with 0.1, 0.2, and so on.
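You can indeed regenerate such a table yourself; here is a sketch in Python rather than a spreadsheet. Each row fixes z to one decimal place and the ten columns supply the second decimal, moving away from zero as in the table (the exact row/column layout of the lecturer's table is an assumption here).

```python
import math

def phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

for row in (-3.9, -3.5, -0.5, 0.0, 0.4):
    step = -0.01 if row < 0 else 0.01        # columns move away from zero
    cells = [phi(row + col * step) for col in range(10)]
    print(f"{row:+.1f}", " ".join(f"{c:.5f}" for c in cells))
```

The −3.9 row prints values around 0.00003, which is the 3.3 × 10⁻⁵ mentioned above and explains why the table stops there.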
Suppose I want to find the normal probability value corresponding to, let us say, 0.44. I locate 0.4 here, then go horizontally to my right until I reach 0.44 and read out the probability, 0.67. This means the probability of the random variable taking values below 0.44 is 0.67. Since I have crossed the origin, I have already covered an area of 0.5 under the curve, so the values are higher than 0.5. At Z = 0, you see the probability value is 0.5: the probability of the random variable Z taking values below 0, that is from −∞ to 0, is 0.5, describing the left half of the curve. Even at 1.09 you are covering about 86% of the area under the curve: the probability of the standard normal random variable Z taking a value of 1.09 or lower is 0.86. If you go further down, you see that you reach 0.98 at a Z value of 2.09. Now we come to the lognormal random variable. I told you at the beginning of the lecture that even if the original random variable X does not follow the normal distribution, in some cases a simple transformation from X to ln X makes it behave in a normal fashion. I am not implying that the original random variable X was behaving abnormally and, once converted to ln X, starts behaving normally! I mean that it followed some other probability distribution in its original form as X, but once converted to ln X, its probability distribution is the normal distribution. This does not happen in all cases, so you have to be a bit careful: make the conversion from X to ln X and check whether the distribution has a bell-shaped, normal-like curve. So, next, let us define the random variable Q as ln X.
Please remember that when you subject a random variable to a mathematical transformation of any kind, the transformed variable is also a random variable. For example, if you have X and convert it to X + 2 by adding 2, and call this new variable Y, then Y is also considered a random variable. Similarly, when you take the random variable X and transform it into ln X, the random variable Q = ln X also has a probability distribution. After the conversion to ln X, the range is from −∞ to +∞; however, when the random variable was in its original, primitive form X, the range was only from 0 to +∞: only positive values are allowed. The reason is that if X took a negative value, the ln of that negative value would be undefined, so you cannot really have negative values of X. This is not a serious restriction: in many physical cases, only positive values are possible for the random variable anyway. For particle size distributions, for example, you can have very small particle sizes, 10⁻³ or 10⁻⁴ metres and smaller, down into the micron, sub-micron, or even nano range, but they are all positive. When you take the ln of a number less than 1, it becomes negative, which is why in the transformed domain the range can be from −∞ to +∞. So, looking at this particular slide: the transformed random variable Q may take values in −∞ < Q < ∞, while the primitive random variable X may take values only between 0 and ∞; it cannot take a value below 0 because ln of a negative number is not defined. Let us now look at the form of the lognormal distribution. It is not something new.
Since Q is going to follow the normal distribution, we have f(Q) = 1/√(2πβ²) · exp(−(Q − α)²/(2β²)), with −∞ as the lower limit and +∞ as the upper limit. The parameters are α and β. We use Q here; Q was obtained by taking the natural logarithm of X. This is an important point: please do not put X here. You have to convert it into Q by taking the natural logarithm of X and use Q; then you can use the normal distribution machinery. So you have the random variable X and the transformed variable Q, and after the transformation you got a normal distribution. What was the distribution like in the original form? How is the probability distribution defined in terms of the original random variable X? That is very interesting, and we can find it pretty easily. We know that Q = ln X, so differentiating Q with respect to X gives dQ/dx = 1/x, or dQ = dx/x. This we can use to retrieve the original form of the distribution in terms of X. The cumulative distribution function for the lognormal random variable is F(p) = ∫ from −∞ to p of 1/√(2πβ²) · exp(−(Q − α)²/(2β²)) dQ. This is nothing but the cumulative distribution function we have already seen in one of our earlier slides: to find the probability of being less than or equal to p, we integrate up to p. Now we can use this to find the original form of the probability distribution expressed in terms of X: instead of Q we put ln X, and instead of dQ we substitute dx/x. When we convert Q into X, we must also make sure the limits are appropriately changed.
When we do that, we see that the lower limit of Q, which was −∞, becomes 0 in the X domain, and the upper limit p becomes e^p; instead of Q we have put ln x, and instead of dQ we have put dx/x. This represents the cumulative distribution function in terms of the original random variable X. So the probability density function is f(x) = (1/x) · 1/√(2πβ²) · exp(−(ln x − α)²/(2β²)), with x between 0 and ∞. This is very interesting: f(x) looks quite similar to the normal distribution, but note that instead of x we have ln x, and there is an additional 1/x term. Even though the differences are seemingly slight, they are quite significant. It is important to note that in this distribution, α and β are not the mean and standard deviation of the distribution. For the normal distribution, μ and σ actually represented the mean and standard deviation, but for the lognormal distribution, α and β do not represent the mean and standard deviation of X. This is something we have to remember. Next, the cumulative distribution function, which should by now be familiar: F(x) = P(X ≤ x), and that may be written as P(Q ≤ ln x). How did we get this? You just take the ln of both sides: capital X ≤ small x becomes Q ≤ ln x. To find the probabilities, we have to convert to the standard normal form, which is easy: we subtract α from ln x and divide by β. Please note that α and β are the mean and standard deviation of the transformed variable Q = ln X, not of X itself.
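A quick numerical check that this really is a legitimate density: with the dQ = dx/x substitution folded in, f(x) should integrate to 1 over (0, ∞). The α, β values below are just illustrative.

```python
import math

def lognormal_pdf(x, alpha, beta):
    # f(x) = (1/x) * 1/sqrt(2*pi*beta^2) * exp(-(ln x - alpha)^2 / (2*beta^2)), x > 0
    return math.exp(-(math.log(x) - alpha) ** 2 / (2 * beta ** 2)) \
        / (x * math.sqrt(2 * math.pi * beta ** 2))

alpha, beta = 0.5, 0.25            # illustrative parameter values
lo, hi, n = 1e-6, 20.0, 200_000    # the density is negligible outside this window
dx = (hi - lo) / n
area = sum(lognormal_pdf(lo + (i + 0.5) * dx, alpha, beta) for i in range(n)) * dx
print(area)   # ≈ 1
```

The 1/x factor is exactly what keeps the total area at 1 after the change of variable; drop it and the integral no longer comes out right.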
In the transformed case, you converted X to ln X and it started behaving in a normal fashion, so the parameters α and β do represent the mean and standard deviation of that normal distribution, provided the transformation from X to ln X has taken place. This is very important: you can then treat it like any other normal distribution and convert it to the standard form. But if you are not using ln X and instead use X directly, you have to use the lognormal form of the probability density function, and there α and β cannot be interpreted as the mean and standard deviation. So you have the standard form (ln x − α)/β, and the probability P(Z ≤ (ln x − α)/β) may be represented by the cumulative distribution function Φ((ln x − α)/β). You may recall me telling you that in the original form, α and β are not the mean and standard deviation of the lognormal distribution. So how do we find the mean and standard deviation of the lognormal distribution? It is quite simple: we use the parameters α and β. The expected value of X is E(X) = μ = e^(α + β²/2), and the variance of X is V(X) = σ² = e^(2α + β²) · (e^(β²) − 1). To find the mean, use the first formula; to find the variance, use the second. So we have covered two important distributions in this lecture: the normal distribution and the slightly confusing lognormal distribution. After solving a few problems, you will have no such confusion. We will take some illustrative problems and solve them using the normal probability tables, and the concepts will become clear.
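The two formulas, together with the standard-form probability Φ((ln x − α)/β), can be sketched as follows (again with illustrative α, β values):

```python
import math

def phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def lognormal_cdf(x, alpha, beta):
    # P(X <= x) = P(Q <= ln x) = Phi((ln x - alpha) / beta), x > 0
    return phi((math.log(x) - alpha) / beta)

def lognormal_mean(alpha, beta):
    # E(X) = mu = e^(alpha + beta^2 / 2)
    return math.exp(alpha + beta ** 2 / 2)

def lognormal_var(alpha, beta):
    # V(X) = sigma^2 = e^(2*alpha + beta^2) * (e^(beta^2) - 1)
    return math.exp(2 * alpha + beta ** 2) * (math.exp(beta ** 2) - 1)

alpha, beta = 1.0, 0.5             # illustrative parameter values
print(lognormal_mean(alpha, beta))                 # e^1.125 ≈ 3.08
print(lognormal_var(alpha, beta))                  # ≈ 2.69
print(lognormal_cdf(math.exp(alpha), alpha, beta)) # 0.5: e^alpha is the median
```

Note the asymmetry the formulas encode: the mean e^(α + β²/2) sits above the median e^α, because the lognormal distribution is skewed to the right.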