Welcome back. As I said earlier, we will now be looking at continuous probability density functions. A random variable x may be either discrete or continuous: it may take individual discrete values, or it may take values within a continuous range. What I mean by a continuous range is that decimal numbers are included. For example, a mole fraction lies between 0 and 1, and it can take any value in that interval, say 0.53, 0.54 and so on. Whereas if you roll a die, you can have only the discrete outcomes 1, 2, 3, 4, 5 and 6. In general, a continuous random variable may take real values ranging from minus infinity to plus infinity.

One of the important properties of a random variable, whether it is continuous or discrete, is that it has a probability distribution associated with it. The probability distribution simply gives the probability of the random variable taking on a particular value or falling within a range of values. In the case of discrete probability distributions, the probability of the random variable taking a particular value may be given. In the case of continuous random variables, you do not talk of the probability of the random variable taking a particular value; you talk of the random variable taking a value within a certain range. We will discuss this in more detail shortly.

Whenever a chemical reaction takes place in a reactor, the products of the reaction are measured using several analytical means. For example, you may use a gas chromatograph or an HPLC, or even simple titration methods, to find the concentrations of the products. So the values can be practically continuous. Usually values beyond 3 digits after the decimal point are not reported; you give only 2 to 3 digits, depending on the accuracy of your instrument. Another example of a continuous random variable is a situation involving the heating of water kept outside in the sun. Let us take a bottle of water from the fridge; it may be at around 10 or 15 degrees centigrade. We then keep it out in the sun, and the temperature of the water will increase. If you are measuring the temperature with a thermocouple, it can show values anywhere between, let us say, 15 degrees centigrade and 35 or 40 degrees centigrade, and these values are continuous. The thermocouple, depending on its accuracy, may report up to 2 decimal places. So the point here is that we do not have discrete temperature values; the temperature varies continuously over the range from 15 degrees centigrade to, say, 35 or 40 degrees centigrade.

We cannot talk of the probability of a random variable taking a particular value x in the case of continuous probability distributions. Here the probability of the random variable taking a specific value within the range is actually 0, and I am going to give the reason for it.
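Here is a minimal Python sketch of this contrast; the standard normal density used for the continuous case is just an arbitrary stand-in, chosen for illustration.

```python
from scipy.integrate import quad
from scipy.stats import norm

# Discrete case: a fair die. A single outcome has non-zero probability,
# and the probabilities of all outcomes sum to 1.
pmf = {face: 1 / 6 for face in range(1, 7)}
print(pmf[3])                         # 1/6
print(sum(pmf.values()))              # 1.0

# Continuous case: the probability of one exact value a is the integral
# of the density from a to a, which is 0.
a = 0.53
print(quad(norm.pdf, a, a)[0])        # 0.0

# A range of values, however, carries non-zero probability.
print(quad(norm.pdf, 0.53, 0.54)[0])  # small but non-zero
```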
Let us consider a conical block of wood having differently coloured sections. You can see that it terminates in a point and then expands towards the base. Now, if you want to find the weight of the wood exactly at the point, obviously that weight is 0; but if you take a certain section of the wood, let us say the portion covered in green, it will have a certain weight. If you add up all the coloured portions, you get the total weight of the wood. So we want to find the mass of the wood starting from the pointed edge, x = 0, and eventually moving towards the flat base, x = L.

Since mass is equal to volume times density, when the length is 0 the volume is also 0, and hence the mass is 0. The green portion, however, occupies a certain length and a certain volume, and so it has a certain mass. Let us take any horizontal distance from the pointed edge and denote it by A. The mass of the conical block of wood up to the point A is given by m(A) = ρ ∫_{0}^{A} π r² dx. The need for integration arises because r changes with x: the radius of the conical block varies with distance. So you carry out the integration and find the mass; A can be anywhere between the conical tip and the base. In order to carry out this integration, you need the relation between the radius and x, the distance from the tip, and that can be found easily. The important thing to note here is that the weight is continuously distributed over the length x, and the weight of the object up to a point A may be found by integrating the weight distribution function up to that point.

We can use the same concept to describe the probability density function. It describes the distribution of probabilities over the domain of the continuous random variable. The probability that the random variable x takes one particular value within that range is actually 0, just as the weight of the wood at a single point is 0. To find the weight of the wood, we had to take a certain portion of it; in the same way, in continuous probability distributions we find the probability of the random variable x falling between two values within the range. You can even start from the lower limit of the range and go up to a certain point within it; you will then have a non-zero probability.

We may interpret a probability density function f(x) as one that assigns probabilities in such a manner that P(x ≤ a) = P(x < a) = ∫_{−∞}^{a} f(x) dx. So we are doing an integration here, and if you look at the slide, you can see that we have P(x ≤ a) or P(x < a). It is immaterial whether we use the less-than-or-equal-to sign or the less-than sign: since the probability of the random variable taking a specific value is 0, the two expressions are equal. We take the probability density function f(x) and integrate it from the lower limit up to the required value a.

In many practical situations, the random variable may not extend down to minus infinity. For example, if you are crushing rock, the smallest particle size may be very, very small, of the order of, let us say, 1 micrometre; it is definitely not minus infinity, and it is not even negative. But in some cases we take the logarithm of such values, and then the values may become negative. The upper limit also need not be plus infinity; it may be a finite value. If you have a finite lower limit and a finite upper limit, the density beyond these limits is definitely 0: the random variable takes values only within the specified interval, and it is assumed that beyond this interval the density is 0.
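As a quick numerical check on this analogy, here is a small Python sketch. For a cone the taper is linear, r(x) = R x / L; the particular values assumed for the density ρ, the length L and the base radius R are purely for illustration, and the closed-form answer ρ π R² A³ / (3 L²) comes from the same integral.

```python
import numpy as np
from scipy.integrate import quad

rho = 700.0   # assumed wood density, kg/m^3
L = 0.30      # assumed cone length, m
R = 0.05      # assumed base radius, m

def r(x):
    """Radius at distance x from the tip (linear taper of a cone)."""
    return R * x / L

def mass_up_to(A):
    """m(A) = rho * integral from 0 to A of pi * r(x)^2 dx."""
    val, _ = quad(lambda x: np.pi * r(x) ** 2, 0.0, A)
    return rho * val

A = 0.10
print(mass_up_to(A))                           # numerical integration
print(rho * np.pi * R**2 * A**3 / (3 * L**2))  # closed form; should agree
print(mass_up_to(0.0))                         # mass "at the tip" is 0
```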
So, coming again to the definition of the probability density function: the probability of the random variable taking a value less than or equal to z is obtained by putting z in the upper limit of the integral, and we call the result F(z). This F(z) is a familiar quantity: it is nothing but the cumulative distribution function of the continuous random variable x. It is as if you are accumulating or aggregating the probabilities up to z, and so it is called the cumulative distribution function. Earlier we used the sigma sign for adding up the probabilities; now we use the integral sign.

Here is an example of a continuous probability density function. You can see that beyond −100 and +100 there are practically no points; there is no representation of the data there, which means that the lower limit is −100 and the upper limit is +100. Even the values out to 100 on either side of the origin are quite small, but once you reach, let us say, about 35 or so from the origin, the values start to increase, reaching a maximum at the origin. This probability density function is symmetric, because the area under the curve on either side of the origin is the same. For example, if I take the area under the curve from −10 to 0, it is the same as the area under the curve from 0 to +10. This is an example of a symmetric distribution, but not all distributions need to be symmetric. Some distributions have skewness: they may have a preference for lower values, so they peak early and then show a long tail.

Let us look at some of the features of probability density functions. The area under the curve is equal to 1. If you recall the discussion of discrete probability distributions, Σ f(xᵢ) = 1, which means that if you add up the probabilities of all the values of the random variable, they should sum to 1. Similarly, when you take the area under the curve representing the probability density, you should get 1. Essentially we are asking: what is the probability that x takes a value somewhere between the lower and upper limits? When we include all values between the limits, the probability is 1.

In practice, we are really more interested in the probability of the random variable lying between two values, a specific a and a specific b, which do not correspond to the lower or upper limit. We can sometimes relax this restriction: a may coincide with the lower limit, or b with the upper limit, but not both. If your range is, let us say, 0 to 10, we may be interested in the probability of the random variable taking a value between 2 and 4. In such a case the answer is not 1; it will be less than 1. For that we need to do the integration: we integrate f(x) between the limits a and b. This may be written as ∫_{−∞}^{b} f(x) dx − ∫_{−∞}^{a} f(x) dx, which gives the area under the curve between a and b, and that is nothing but the difference between the two cumulative distribution functions, one evaluated at b and the other at a. So we get F(b) − F(a).
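To put numbers on F(b) − F(a), here is a hedged Python sketch. Since the exact form of the symmetric density in the slide is not specified here, let us assume a normal distribution with mean 0 and standard deviation 25, purely for illustration.

```python
from scipy.integrate import quad
from scipy.stats import norm

# Assumed stand-in for the symmetric density in the slide: N(0, 25^2).
dist = norm(loc=0, scale=25)

# Total area under the density is 1 (here, -100 to 100 covers 4 standard
# deviations on each side, so the integral is ~1).
print(quad(dist.pdf, -100, 100)[0])

# P(2 <= x <= 4) as a difference of two CDF values, F(b) - F(a) ...
a, b = 2.0, 4.0
print(dist.cdf(b) - dist.cdf(a))

# ... and the same number by direct integration of the density.
print(quad(dist.pdf, a, b)[0])

# Symmetry: the area from -10 to 0 equals the area from 0 to +10.
print(dist.cdf(0) - dist.cdf(-10), dist.cdf(10) - dist.cdf(0))
```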
Suppose the two numbers a and b are very close to each other; let us say one value is z and the other is z + Δz. We want to find the probability between z and z + Δz, so we integrate f(x) from z to z + Δz. If Δz is very small, then we can write P(z ≤ x ≤ z + Δz) ≈ f(z) Δz. In the small range from z to z + Δz, f(x) takes essentially the value f(z) and does not change significantly in this small interval. So, since it is pretty much constant between these two values, we may take it outside the integral sign, evaluate ∫ dx between z and z + Δz, and get f(z) Δz. Please remember this result; we will be using it shortly.

Now we want to find the relation between the probability density function and the cumulative distribution function; in other words, the relation between small f(x) and capital F(x). By definition, the cumulative distribution function corresponding to z + Δz is P(x ≤ z + Δz) = ∫_{−∞}^{z+Δz} f(x) dx, and this is nothing but F(z + Δz). So we can write F(z + Δz) = ∫_{−∞}^{z+Δz} f(x) dx, and along the same lines, F(z) = ∫_{−∞}^{z} f(x) dx. Hence the probability of the random variable x lying between z and z + Δz may be written as F(z + Δz) − F(z), where the capital F's represent the cumulative distribution. However, just a brief while ago we saw that P(z ≤ x ≤ z + Δz) is more or less equal to f(z) Δz. So we may write f(z) Δz as the difference between the two cumulative distribution values: f(z) Δz = F(z + Δz) − F(z). Dividing by Δz and letting Δz tend to 0, we get f(z) = lim_{Δz→0} [F(z + Δz) − F(z)] / Δz, that is, f(z) = dF(z)/dz.

So by taking the derivative of the cumulative distribution function with respect to z, we retrieve the continuous probability density function. It is quite simple, in fact. Please recall that when we wanted to get the cumulative distribution function, we carried out an integration; now, when we want to get the probability density function from the cumulative distribution function, we carry out a differentiation. We differentiate the cumulative distribution function F(z) with respect to z to get f(z). Changing the variable from z to x, we get f(x) = dF(x)/dx. So when you are given the cumulative distribution function, you can obtain the probability density function by simple differentiation: when the cumulative distribution function is given in the form of an equation, you differentiate that equation with respect to the independent variable and retrieve the continuous probability density function.
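Here is a short Python sketch of both results, again using the assumed normal distribution from before: it checks that P(z ≤ x ≤ z + Δz) ≈ f(z) Δz, and recovers the density from the cumulative distribution by a finite difference.

```python
from scipy.stats import norm

dist = norm(loc=0, scale=25)   # assumed distribution, as before
z, dz = 10.0, 1e-4

# P(z <= x <= z + dz) two ways: exact CDF difference vs f(z) * dz.
exact = dist.cdf(z + dz) - dist.cdf(z)
approx = dist.pdf(z) * dz
print(exact, approx)           # nearly identical for small dz

# f(z) = dF/dz, approximated by a forward difference of the CDF.
f_numeric = (dist.cdf(z + dz) - dist.cdf(z)) / dz
print(f_numeric, dist.pdf(z))  # numerical derivative vs true density
```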
Sometimes you will have only experimental data, with no mathematical model for the cumulative distribution; then you may have to carry out numerical differentiation to get the form or shape of the probability density function. This is all simple, elementary calculus.

A certain query may arise when you are handling the probability density function. We know that the area under the curve is 1, that is, ∫ f(x) dx = 1; but can the value of f(x) itself be greater than unity? That is a very interesting question. There is no restriction on the value of f(x), provided it is real and non-negative; beyond this, there is no real restriction on its range. What you should note is that the constraint ∫_{a}^{b} f(x) dx = 1 must be satisfied. So if f(x) takes values greater than 1, the upper and lower limits must adjust accordingly so that this constraint is satisfied.

Let us take a simple case where the function is given by f(x) = Mx. When you integrate Mx between the limits a and b, you get Mx²/2 evaluated at the limits, that is, (M/2)(b² − a²), and since this must equal 1, we get M = 2/(b² − a²). As the value of M increases, b must decrease and approach a so that the constraint of the area under the curve being equal to unity is satisfied. To demonstrate this with respect to the figure: if the value of f(x) keeps going to higher values, the width starts to shrink, and the lower limit and the upper limit start approaching each other. As the curve tries to grow upwards, it becomes thinner and thinner; the limits approach each other.

In the discrete probability distribution analysis, we came across the mean μ and the variance σ². How are these parameters defined for a continuous random variable? In fact, the definitions are quite similar; we use the integral sign here, while we used the summation sign in the discrete case. Let us look at the mean of the probability density function. μ is the expected value of the random variable x: μ = ∫_{−∞}^{∞} x f(x) dx. This is the definition of the mean, or average value, for a continuous distribution. We can also define the variance of the probability density function. The variance is defined as the expected value of (x − μ)², where x is the random variable. For continuous distributions, σ² = ∫_{−∞}^{∞} (x − μ)² f(x) dx. We can expand (x − μ)² and simplify this integral. Expanding gives ∫ (x² − 2μx + μ²) f(x) dx, and after a bit of simple manipulation we get σ² = ∫_{−∞}^{∞} x² f(x) dx − μ². What happened here is that the x² f(x) term gives a separate integral, ∫_{−∞}^{∞} x² f(x) dx; the −2μ ∫ x f(x) dx term becomes −2μ², using the definition of the mean; and the μ² ∫ f(x) dx term becomes +μ², since ∫_{−∞}^{∞} f(x) dx = 1. So we are left with −2μ² + μ², which reduces to −μ². Hence σ², the variance of the probability density function, is ∫_{−∞}^{∞} x² f(x) dx − μ².
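The following Python sketch puts numbers on both of these points; the interval [a, b] = [0, 0.5] is an assumed choice for illustration. The density f(x) = Mx then exceeds 1 over part of its range yet still has unit area, and the mean and variance come straight out of the integrals just defined.

```python
from scipy.integrate import quad

# Assumed interval [a, b] = [0, 0.5]; then M = 2 / (b^2 - a^2) = 8.
a, b = 0.0, 0.5
M = 2.0 / (b**2 - a**2)
f = lambda x: M * x

print(f(b))              # 4.0 -- the density exceeds unity ...
print(quad(f, a, b)[0])  # ... yet the total area is still 1.0

# Mean: mu = integral of x f(x) dx.
mu = quad(lambda x: x * f(x), a, b)[0]

# Variance: sigma^2 = integral of x^2 f(x) dx - mu^2.
Ex2 = quad(lambda x: x**2 * f(x), a, b)[0]
print(mu, Ex2 - mu**2)   # 1/3 and ~0.0139
```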
The variance is an estimate of the spread of the distribution. To find the standard deviation, you take the square root of the variance and report the positive value. By now you may also have noticed that σ² + μ² is the expected value of x², a result we also saw in the case of discrete probability distributions; E[x²] is by definition ∫_{−∞}^{∞} x² f(x) dx. These are parameters we will come across frequently in continuous probability distributions.

Some of you may have the apprehension that you will have to do the integration every time to find the mean and variance. Usually we will be working with standard probability distributions for which the mean and variance are well known, so we do not really have to do the integration every time. The next query is: suppose I want to find the probability of the random variable x lying between two numbers, should I do the integration? For standard probability distributions, tables of data are available for finding these probabilities. If the function is unknown, or is a new function that has not been studied previously, then of course you will have to carry out the integration; and if the integration is not possible analytically, you may have to carry it out numerically. Nowadays software is so advanced that carrying out the numerical integration, or even the analytical integration, is not a big task. Standard spreadsheets and numerical computation programs such as MATLAB can directly give you the value of the probability when you specify the limits and the probability distribution function; the probability values are calculated and reported automatically. I would encourage you to explore these spreadsheets and mathematical software packages; using the help commands, you can find out how to obtain the probabilities for standard distributions. They are calculated very easily through simple commands.

The standard probability distributions we will be studying are the normal or Gaussian distribution, the Student's t distribution, the Fisher (F) distribution and the chi-squared distribution. There are also other distributions, such as the beta distribution, the gamma distribution, the Weibull distribution and so on, but once you have the background of the standard distributions I am going to talk about, you can understand the other ones very easily. So, right now we will conclude our discussion of continuous probability distributions and go on to the most important distribution, the normal distribution.
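Before moving on, here is one concrete instance of such simple commands: a Python sketch using scipy.stats for the four standard distributions just listed. The degrees of freedom and the limits are arbitrary choices for illustration.

```python
from scipy.stats import norm, t, f, chi2

# P(a <= x <= b) for several standard distributions, via F(b) - F(a).
a, b = 2.0, 4.0

print(norm.cdf(b) - norm.cdf(a))                          # standard normal
print(t.cdf(b, df=10) - t.cdf(a, df=10))                  # Student's t
print(f.cdf(b, dfn=5, dfd=12) - f.cdf(a, dfn=5, dfd=12))  # Fisher (F)
print(chi2.cdf(b, df=3) - chi2.cdf(a, df=3))              # chi-squared
```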