So, continuous random variables have already been introduced, and here we will concentrate on the probability distributions of some important continuous random variables. Recall that a random variable X is continuous if its set of possible values is an entire interval of numbers; that is, if a < b, then any number x between a and b is possible. This is unlike a discrete random variable, whose possible values are isolated points. As a simple example of a continuous random variable, select a chemical compound at random and look at its pH value X; X can take any value between 0 and 14. Now let us look at probability distributions. If X is a continuous random variable, then a probability distribution, or probability density function (pdf), of X is a function f(x) such that for any two numbers a and b with a ≤ b, $P(a \le X \le b) = \int_a^b f(x)\,dx$, and the graph of f is called the density curve of X. For f(x) to be a legitimate pdf it must satisfy two properties. First, $f(x) \ge 0$ for all values of x. Second, the area of the region between the graph of f and the x axis is equal to 1, as you can see in the picture; that is, $\int_{-\infty}^{\infty} f(x)\,dx = 1$. The probability that X takes a value between a and b is then the area of the shaded region under the density curve between a and b. Let us take an example. Consider the continuous random variable X defined as the amount of weekly gravel sales by a construction supply company, and suppose the pdf is claimed to be $f(x) = \frac{3}{2}(1 - x^2)$ for $0 \le x \le 1$ and 0 otherwise.
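As a quick numerical check of the definition, the sketch below approximates $P(a \le X \le b)$ as the area under the density curve using a simple midpoint rule, assuming the gravel-sales pdf above. The function names, the number of subintervals, and the interval [0, 0.5] are illustrative choices, not part of the lecture.

```python
def integrate(f, a, b, n=100_000):
    """Approximate the integral of f over [a, b] with the midpoint rule (n subintervals)."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def gravel_pdf(x):
    """Claimed pdf of weekly gravel sales: 3/2 (1 - x^2) on [0, 1], 0 elsewhere."""
    return 1.5 * (1 - x * x) if 0 <= x <= 1 else 0.0


# Total area under the density curve should be 1.
total_area = integrate(gravel_pdf, 0.0, 1.0)

# P(0 <= X <= 0.5) is the area under the curve between 0 and 0.5.
p_half = integrate(gravel_pdf, 0.0, 0.5)

print(f"total area = {total_area:.6f}")    # 1.000000
print(f"P(0 <= X <= 0.5) = {p_half:.6f}")  # 0.687500
```

The exact value of the second probability is 11/16 = 0.6875; the midpoint rule is used only because it is short and needs no external library.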
In order to verify that it is indeed a pdf, we need to check that the integral of f(x) over the range of x is 1: $\int_0^1 \frac{3}{2}(1 - x^2)\,dx = \frac{3}{2}\left(1 - \frac{1}{3}\right) = 1$. Moreover, f(x) is nonnegative for all values of x. Now, for a continuous random variable, unlike a discrete one, $P(X = w) = 0$ for any number w. Consequently, for any two numbers a and b with a < b, $P(a \le X \le b) = P(a < X \le b) = P(a \le X < b) = P(a < X < b)$: including or excluding an endpoint does not change the probability, because the probability at a single point is 0. Now consider a pdf $f(x) = cx$ for $0 \le x \le 1$ (and 0 otherwise). What is c? If we quickly sketch the density curve, it is a straight line through the origin. To work out c, we need $\int_0^1 cx\,dx = 1$, that is, $c\,\frac{x^2}{2}\Big|_0^1 = \frac{c}{2} = 1$, which gives c = 2. So for f(x) to be a pdf, c must be 2. Now suppose I want the probability that X lies between 1/4 and 3/4. Can you work this out? From the definition, $P(1/4 \le X \le 3/4) = \int_{1/4}^{3/4} 2x\,dx = \frac{9}{16} - \frac{1}{16} = \frac{1}{2}$. So you can find probabilities over any required range of x. Next, suppose I want the median of this random variable; call it $\tilde{\mu}$. The median must satisfy $\int_0^{\tilde{\mu}} 2x\,dx = \frac{1}{2}$, that is, $\tilde{\mu}^2 = \frac{1}{2}$, which gives $\tilde{\mu} = 1/\sqrt{2} \approx 0.71$. So, given the pdf, you can find the median of a random variable.
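The arithmetic for the $f(x) = cx$ example can be checked in a few lines. This sketch uses the closed form $F(x) = x^2$ on [0, 1], obtained by integrating 2x from 0 to x; the helper name `cdf_2x` is hypothetical.

```python
import math

# Normalising condition: integral_0^1 c*x dx = c/2 must equal 1, so c = 2.
c = 2.0
area = c / 2  # total area under f(x) = c*x on [0, 1]


def cdf_2x(x):
    """Area under f(x) = 2x from 0 to x: x^2 on [0, 1], 0 below 0 and 1 above 1."""
    if x <= 0:
        return 0.0
    if x >= 1:
        return 1.0
    return x * x


# P(1/4 <= X <= 3/4) = 9/16 - 1/16
prob = cdf_2x(0.75) - cdf_2x(0.25)

# Median: solve m^2 = 1/2, so m = 1/sqrt(2).
median = math.sqrt(0.5)

print(area, prob, round(median, 2))  # 1.0 0.5 0.71
```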
Let us now take a standard form of continuous random variable: the uniform distribution. A continuous random variable X is said to have a uniform distribution on the interval [a, b] if the pdf of X is $f(x; a, b) = \frac{1}{b - a}$ for $a \le x \le b$ and 0 otherwise. Have you come across such a random variable in your day-to-day activities? How many of you have come across the term random numbers? Any standard calculator or computer has a command, usually named something like RND, which, when invoked, gives you a number between 0 and 1. That is a random number, and its key property is that every value between 0 and 1 is equally likely; that is exactly the situation here. As an illustration, consider the random variable X, the waiting time in minutes for the tum-tum (the campus shuttle bus) in front of Hostel 12. One possible pdf for X is $f(x) = 1/20$ for $0 \le x \le 20$ and 0 otherwise. Here we are guaranteeing that a tum-tum will arrive within 20 minutes, and saying that the waiting time has a uniform distribution. The density curve is a horizontal line at height 1/20, and the total area under it is $\int_0^{20} \frac{1}{20}\,dx = 1$, so this is a valid pdf. Now, what is the probability that your waiting time is between 5 and 10 minutes? Can you work it out? 1/4. We are looking at the area of a rectangle of width 5 and height 1/20, which is graphically obvious, and if you actually work it out, $\int_5^{10} \frac{1}{20}\,dx = \frac{5}{20} = \frac{1}{4}$.
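A small sketch of the tum-tum example, assuming X is Uniform(0, 20): the exact probability is the interval width divided by b − a, and a seeded simulation with Python's `random.Random.uniform` (playing the role of the RND command) should give roughly the same fraction. The helper name, seed, and sample size are illustrative.

```python
import random


def uniform_prob(lo, hi, a, b):
    """P(lo <= X <= hi) for X ~ Uniform(a, b): overlap length divided by (b - a)."""
    overlap = max(0.0, min(hi, b) - max(lo, a))
    return overlap / (b - a)


exact = uniform_prob(5, 10, 0, 20)  # 5/20 = 0.25

# Simulate waiting times; the fraction landing in [5, 10] estimates the same probability.
rng = random.Random(12)
n = 100_000
hits = sum(1 for _ in range(n) if 5 <= rng.uniform(0, 20) <= 10)
estimate = hits / n

print(exact)               # 0.25
print(round(estimate, 2))  # close to 0.25; the exact digits depend on the seed
```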
As mentioned, a value of a Uniform(0, 1) random variable is called a random number; its pdf is $f(x; 0, 1) = 1$ for $0 \le x \le 1$ and 0 otherwise. Now we look at cumulative distribution functions and expected values for continuous random variables. The cumulative distribution function (cdf) F(x) of a continuous random variable X is defined for every number x by $F(x) = P(X \le x) = \int_{-\infty}^{x} f(y)\,dy$. For each x, F(x) is the area under the density curve to the left of x. So if you draw a density curve and mark a point x, the area to its left is F(x), the probability that X is at most x. The cdf is used frequently to find probabilities, because the probability of X taking a value between any two points can be expressed in terms of it. Let X be a continuous random variable with pdf f(x) and cdf F(x). Then for any number a, $P(X > a) = 1 - F(a)$, and for any numbers a and b with a < b, $P(a \le X \le b) = F(b) - F(a)$. In the picture, $P(X > a)$ is the area to the right of a, which is the total area 1 minus the area to the left of a. Similarly, $P(a \le X \le b)$ is the area to the left of b minus the area to the left of a. Next, we see that we can also obtain the density function from the cdf.
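The two cdf identities translate directly into code. This sketch assumes the Uniform(0, 20) waiting-time cdf, $F(x) = x/20$ on [0, 20]; the function names are hypothetical.

```python
def uniform_cdf(x, a=0.0, b=20.0):
    """CDF of Uniform(a, b): 0 below a, (x - a)/(b - a) on [a, b], 1 above b."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)


def prob_greater(cdf, a):
    """P(X > a) = 1 - F(a)."""
    return 1.0 - cdf(a)


def prob_between(cdf, a, b):
    """P(a <= X <= b) = F(b) - F(a) for a < b."""
    return cdf(b) - cdf(a)


print(prob_greater(uniform_cdf, 15))     # 1 - 15/20 = 0.25
print(prob_between(uniform_cdf, 5, 10))  # 10/20 - 5/20 = 0.25
```

Passing the cdf as an argument keeps the two identities independent of any particular distribution.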
If X is a continuous random variable with pdf f(x) and cdf F(x), then at every x at which the derivative $F'(x)$ exists, $F'(x) = f(x)$. Take the uniform distribution, whose pdf is $1/(b - a)$ for $a \le x \le b$. For $a \le x \le b$, $F(x) = \int_a^x \frac{1}{b-a}\,dy = \frac{y}{b-a}\Big|_a^x = \frac{x - a}{b - a}$, while F(x) = 0 for x < a and F(x) = 1 for x > b. Taking the derivative of this cdf for a < x < b gives $1/(b - a)$, which is exactly f(x). Any questions so far? Now we define percentiles. Let p be a number between 0 and 1. The (100p)th percentile of the distribution of a continuous random variable X, denoted $\eta(p)$, is defined by $p = F(\eta(p)) = \int_{-\infty}^{\eta(p)} f(y)\,dy$. So $\eta(p)$ is the value such that 100p percent of the area under the density curve lies to the left of $\eta(p)$. What, then, is the median? The median is the value such that 50 percent of the total area lies to its left, so the median of a continuous distribution, denoted $\tilde{\mu}$, is the 50th percentile: $\tilde{\mu}$ satisfies $0.5 = F(\tilde{\mu})$, that is, half the area under the density curve is to the left of $\tilde{\mu}$. Expected values were already covered in your earlier lectures, but just to review: the expected or mean value of a continuous random variable X with pdf f(x) is $\mu = E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$, and the expected value of a function h(X) of X is $E[h(X)] = \int_{-\infty}^{\infty} h(x) f(x)\,dx$.
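The relation $F'(x) = f(x)$ and the percentile definition $p = F(\eta(p))$ can both be checked numerically. This is a sketch assuming the Uniform(0, 20) cdf and the cdf $F(x) = x^2$ from the $f(x) = 2x$ example; the percentile is found by bisection, which is one convenient root-finding choice (not the only one) and works because a cdf is nondecreasing.

```python
def uniform_cdf(x, a=0.0, b=20.0):
    """CDF of Uniform(a, b)."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)


def percentile(cdf, p, lo, hi, tol=1e-12):
    """Return eta with cdf(eta) = p by bisection on [lo, hi].

    Assumes cdf is continuous and nondecreasing with cdf(lo) <= p <= cdf(hi).
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Central difference of the cdf at x = 7 should match f(7) = 1/20.
h = 1e-6
slope = (uniform_cdf(7 + h) - uniform_cdf(7 - h)) / (2 * h)

median_uniform = percentile(uniform_cdf, 0.5, 0.0, 20.0)  # (a + b)/2 = 10
eta_90 = percentile(uniform_cdf, 0.9, 0.0, 20.0)          # a + 0.9 (b - a) = 18
median_2x = percentile(lambda x: x * x, 0.5, 0.0, 1.0)    # 1/sqrt(2)

print(round(slope, 6), round(median_uniform, 6), round(eta_90, 6), round(median_2x, 4))
# 0.05 10.0 18.0 0.7071
```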
The variance of a continuous random variable X with pdf f(x) and mean μ is its second central moment, $\sigma^2 = V(X) = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx$. This measures the variability, or dispersion, of X around the mean μ, and the standard deviation σ is the square root of the variance. There is also a shortcut formula, easily derived from this expression: $V(X) = E(X^2) - \mu^2$. Now let us return to the example introduced a while back, the amount of weekly gravel sales by a construction supply company, for which we already have the pdf. What are the expected weekly gravel sales? We integrate x times the pdf: $E(X) = \int_0^1 x \cdot \frac{3}{2}(1 - x^2)\,dx = \frac{3}{2}\left(\frac{1}{2} - \frac{1}{4}\right) = \frac{3}{8}$. What is the variance, that is, the variability in such sales? Using the shortcut formula, $V(X) = \frac{3}{2}\int_0^1 x^2(1 - x^2)\,dx - \left(\frac{3}{8}\right)^2 = \frac{3}{2}\left[\frac{x^3}{3} - \frac{x^5}{5}\right]_0^1 - \frac{9}{64} = \frac{1}{5} - \frac{9}{64} = \frac{19}{320} \approx 0.059$. This is the variability of the weekly gravel sales around the mean of 3/8, and the standard deviation is $\sqrt{19/320} \approx 0.244$. Now, with these definitions of expected value and variance, can you find the mean and the variance of a uniform random variable X, since you know its pdf? Is the question clear? Find the expected value and the variance of a random variable X that follows the uniform distribution on [a, b]. So, what is the expected value?
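Because the gravel-sales pdf is a polynomial on [0, 1], its moments can be computed exactly with rational arithmetic, using $\int_0^1 x^n\,dx = 1/(n+1)$. The sketch below (the dictionary representation and the `moment` helper are illustrative) reproduces E(X) = 3/8, V(X) = 19/320 and a standard deviation of about 0.244.

```python
import math
from fractions import Fraction

# pdf f(x) = 3/2 - 3/2 x^2 on [0, 1], stored as {power: coefficient}.
PDF = {0: Fraction(3, 2), 2: Fraction(-3, 2)}


def moment(k):
    """E(X^k) = integral_0^1 x^k f(x) dx, term by term via integral_0^1 x^n dx = 1/(n + 1)."""
    return sum(coef / (n + k + 1) for n, coef in PDF.items())


total = moment(0)                 # should be exactly 1
mean = moment(1)                  # 3/8
variance = moment(2) - mean ** 2  # shortcut formula E(X^2) - mu^2 = 19/320
sd = math.sqrt(variance)

print(total, mean, variance, round(float(variance), 3), round(sd, 3))
# 1 3/8 19/320 0.059 0.244
```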
Since $f(x) = 1/(b - a)$ on [a, b], $E(X) = \int_a^b \frac{x}{b-a}\,dx = \frac{a + b}{2}$, and routine algebra gives the variance $V(X) = \frac{(b - a)^2}{12}$. With this background, we are now in a position to introduce one of the most important probability distributions, the normal distribution. Many variables that one comes across in nature turn out to be approximately normally distributed. For example, consider heights. Take as our population all the students and faculty members at IITB, and let X be the height of an individual. What sort of distribution do we expect for such heights? Suppose the average height at this point of time is around 5 feet 4 inches. There will be some people taller than 5 feet 4 inches and some shorter. You will see a distribution that tapers off: fewer people who are very tall, fewer who are very short, with the concentration around the mean. And what should we expect about the shape around the mean? It should be more or less symmetric about the mean. That is what this distribution is like. In fact, many populations and processes involve variables closely fitted by the normal distribution: height is one example; others are weight, measurement errors in scientific experiments, thickness of materials, and so on.
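To check the uniform results, the sketch below integrates x f(x) and x² f(x) numerically with a midpoint rule and compares them with (a + b)/2 and (b − a)²/12. The interval [0, 20] from the tum-tum example and the helper names are assumptions for illustration.

```python
def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def uniform_moments(a, b):
    """Numerical mean and variance of Uniform(a, b), via the shortcut V(X) = E(X^2) - mu^2."""
    height = 1.0 / (b - a)
    mean = integrate(lambda x: x * height, a, b)
    second = integrate(lambda x: x * x * height, a, b)
    return mean, second - mean ** 2


a, b = 0.0, 20.0
mean, var = uniform_moments(a, b)
print(round(mean, 6), (a + b) / 2)       # numerical vs closed-form mean
print(round(var, 6), (b - a) ** 2 / 12)  # numerical vs closed-form variance
```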
Before the formal definition, look at the typical shape of this distribution. In the height example, with a mean of 5 feet 4 inches, the density curve looks like this: concentrated around the mean, symmetric about it, with total area under the curve equal to 1. So the normal density curve is bell shaped and symmetric about its center μ, and its variance is σ². Since it is symmetric about the mean μ, the mean is also equal to the median, and in fact also to the mode; the standard deviation is σ. More formally, a continuous random variable X is said to have a normal distribution with parameters μ and σ, where $-\infty < \mu < \infty$ and $\sigma > 0$, if the pdf of X is $f(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(x - \mu)^2/(2\sigma^2)}$ for $-\infty < x < \infty$; here μ is the mean and σ is the standard deviation. That is the explicit form of the density; now consider its shapes. Take a normal curve with mean 10 and σ = 5, centered at 10. Now consider the normal distribution with mean 10 and σ = 2.5. What would the picture look like then? With the center still at 10 (and reference points marked at 5 and 15), the smaller standard deviation makes the density curve taller and narrower: the probability is more concentrated around the mean and the tails die off faster. In the other case, μ = 10 and σ = 10, the variability is much higher, so the curve is flatter and more spread out.
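The effect of σ on the shape can be seen from the peak height $f(\mu) = 1/(\sqrt{2\pi}\,\sigma)$: halving σ doubles the height at the center, while the total area stays 1. The sketch below evaluates the normal pdf written above for μ = 10 and σ = 2.5, 5 and 10, checking the area with a midpoint rule over μ ± 10σ (a truncation whose neglected tail area is negligible); the helper names are illustrative.

```python
import math


def normal_pdf(x, mu, sigma):
    """Normal density 1/(sqrt(2 pi) sigma) * exp(-(x - mu)^2 / (2 sigma^2))."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)


def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


mu = 10.0
results = {}
for sigma in (2.5, 5.0, 10.0):
    peak = normal_pdf(mu, mu, sigma)
    area = integrate(lambda x: normal_pdf(x, mu, sigma), mu - 10 * sigma, mu + 10 * sigma)
    results[sigma] = (peak, area)
    print(f"sigma = {sigma:>4}: peak height = {peak:.4f}, area = {area:.6f}")
```

The printed peak heights are about 0.1596, 0.0798 and 0.0399, while each area prints as 1.000000.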
For a normal random variable, $E(X) = \mu$ and $V(X) = \sigma^2$. If I want the area under the curve over a region, say between a and b, I need $P(a \le X \le b) = \int_a^b \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(x - \mu)^2/(2\sigma^2)}\,dx$. Can you work out this integral for given values of a and b? How easy is it? If you attempt it, you will see that it is not easy: the normal density has no elementary antiderivative, so the integral cannot be evaluated in closed form. However, for μ = 0 and σ = 1 it has been evaluated numerically, and based on that you can easily compute these areas; in fact, Scilab has a built-in command for the normal distribution that gives the area between any two values. Let us look at the normal curve and some additional properties: the approximate percentage of area within a given number of standard deviations of the mean. This is the empirical rule, which says that about 68 percent of the values of X lie within 1 standard deviation of the mean, about 95 percent lie within 2 standard deviations, and about 99.7 percent lie within 3 standard deviations. This is an important result for all random variables that have an approximately normal distribution. Now, the standard normal distribution: the normal distribution with parameter values μ = 0 and σ = 1 is called the standard normal distribution. The random variable is denoted by Z, and its pdf is obtained by replacing μ by 0 and σ² by 1: $\phi(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}$ for $-\infty < z < \infty$. The cdf is defined accordingly, $\Phi(z) = P(Z \le z) = \int_{-\infty}^{z} \phi(y)\,dy$, and Φ(z) has been tabulated for various values of z.
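Although the normal integral has no closed form, Φ can be written in terms of the error function, $\Phi(z) = \frac{1}{2}\left[1 + \operatorname{erf}(z/\sqrt{2})\right]$, which Python provides as `math.erf`. This sketch uses it to reproduce the 68, 95 and 99.7 percent figures; since the fraction of a normal population within k standard deviations of its mean equals $P(-k \le Z \le k)$, the same numbers hold for any μ and σ. The helper name `std_normal_cdf` is illustrative.

```python
import math


def std_normal_cdf(z):
    """Standard normal cdf Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Fraction of the area within k standard deviations of the mean: Phi(k) - Phi(-k).
within = {k: std_normal_cdf(k) - std_normal_cdf(-k) for k in (1, 2, 3)}

for k, share in within.items():
    print(f"within {k} sd: {100 * share:.2f}%")
# within 1 sd: 68.27%
# within 2 sd: 95.45%
# within 3 sd: 99.73%
```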
The standard normal cumulative areas are available in tabular form in your book, on page 612. For example, the area to the left of 0 is 0.5. The table goes up to z = 3.49. If you want the area to the left of 3.09, look up the row 3.0 and the column 0.09: it is 0.9990. So this table gives you cumulative areas, and Scilab can also give you these values directly. Tables of this kind, giving the area to the left of a point z, are available in any standard statistics textbook. Now, using the standard normal table, find $P(Z \le 0.85)$, the area to the left of 0.85. Go to the row 0.8 and the column 0.05: the area to the left of 0.85 is 0.8023. Similarly, if you want the area to the right of 1.32, that is, $P(Z > 1.32)$, it is the total area 1 minus the area to the left of 1.32, again looked up from the table: $P(Z > 1.32) = 1 - \Phi(1.32) = 1 - 0.9066 = 0.0934$. If you want the probability that Z lies between −2.1 and 1.78, find the area to the left of 1.78 and subtract the area to the left of −2.1. Here we use the property of the cdf: $P(a \le Z \le b) = \Phi(b) - \Phi(a)$. However, how do you find $\Phi(-2.1)$, the area to the left of −2.1?
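Instead of a printed table, Python's standard library (version 3.8 and later) provides `statistics.NormalDist`, whose `cdf` method returns Φ(z) for the standard normal. This sketch reproduces the table lookups above; the results agree with the four-decimal table entries after rounding.

```python
from statistics import NormalDist

Z = NormalDist(mu=0.0, sigma=1.0)  # standard normal

p_left = Z.cdf(0.85)                   # P(Z <= 0.85), area to the left of 0.85
p_right = 1 - Z.cdf(1.32)              # P(Z > 1.32) = 1 - Phi(1.32)
p_between = Z.cdf(1.78) - Z.cdf(-2.1)  # P(-2.1 <= Z <= 1.78) = Phi(1.78) - Phi(-2.1)

print(round(Z.cdf(3.09), 4))  # 0.999 (the table's 0.9990)
print(round(p_left, 4), round(p_right, 4), round(p_between, 4))  # 0.8023 0.0934 0.9446
```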
The table, if you recall, gives values only for z between 0 and 3.49, that is, only to the right of 0. So we use symmetry. Because the standard normal curve is symmetric about 0, the area to the left of −2.1 equals the area to the right of 2.1: $P(Z \le -2.1) = P(Z \ge 2.1) = 1 - P(Z \le 2.1) = 1 - 0.9821 = 0.0179$. Hence $P(-2.1 \le Z \le 1.78) = 0.9625 - 0.0179 = 0.9446$. You will need such tricks to find areas under the z curve, and numerical problems, including those in the examination, will require referring to the table. So you must be well conversant with these tables, because they will be provided. I think I will stop here today. Thank you very much.
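The symmetry trick can be mimicked with a lookup that, like the table described above, only accepts z ≥ 0. In this sketch the hypothetical helper `table_phi` stands in for the printed table (it rounds `NormalDist().cdf` to four decimals), and negative arguments are handled with $\Phi(-z) = 1 - \Phi(z)$.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal, mu = 0, sigma = 1


def table_phi(z):
    """Stand-in for a printed table: Phi(z) to 4 decimals, for 0 <= z <= 3.49 only."""
    if not 0 <= z <= 3.49:
        raise ValueError("table covers only 0 <= z <= 3.49")
    return round(_STD.cdf(z), 4)


def phi(z):
    """Phi(z) for any z in [-3.49, 3.49], using symmetry for negative z."""
    if z >= 0:
        return table_phi(z)
    return round(1 - table_phi(-z), 4)  # area left of z equals area right of -z


print(phi(-2.1))                        # 1 - 0.9821 = 0.0179
print(round(phi(1.78) - phi(-2.1), 4))  # 0.9625 - 0.0179 = 0.9446
```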