We have seen that random variables can be classified as discrete, continuous, or a mixture of the two. Accordingly, if we have a discrete random variable, its probability distribution is described by a probability mass function; the probability distribution of a continuous random variable is described by a probability density function; and for a mixed random variable, the portion where we have point masses is described by a probability mass function while the portion spread over an interval is described by a density function. We also saw that there is a more general function which is helpful in describing all these kinds of random variables, called the cumulative distribution function, and we saw its relations with the probability mass function and the probability density function when the random variable is purely discrete or purely continuous.

If we have the probability mass function or the probability density function, then we can say that we have all the information about the random variable. However, there are certain summary characteristics which are also helpful in describing a random variable and its properties. We may not always need the full probability distribution; certain characteristics may suffice. In everyday language we talk about, say, the mean of a distribution or the average value of a random variable. For example, if we are discussing the heights of individuals in a particular ethnic group, we may ask what the average height is. If we are discussing longevity, a common question is the average age at death; this may be of interest, for example, to people dealing with insurance companies who want to fix the premium for certain types of policies. Weather scientists talk about average rainfall or about average temperatures increasing in these days of global warming, and we talk about the average yield per hectare for a certain crop in a particular year.

Besides quantities like the average, we also talk about variability. We may say that the average rainfall was the same in two years, but there was a lot of variation: in one place a lot of rain fell at one time and at another time there was a drought-like situation, whereas in another place there was continuous rain in mild amounts. The first case shows more variation. So in day-to-day language we use terminology such as average, variability, dispersion, and so on. We now formally define these characteristics, which are helpful in understanding the nature of a random variable and its probability distribution.

We start with the concept of mathematical expectation. Let me define it now. Let X be a continuous random variable with probability density function f_X(x); as usual we put a subscript to denote the random variable, capital X, while the value taken by it is written as small x. We define the expected value of X as E(X) = ∫ x f_X(x) dx, taken from minus infinity to infinity. But there is a condition here: since this is expressed in terms of an integral, we say "provided the integral is absolutely convergent."
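In standard notation, the definition just stated, together with its existence condition, reads:

```latex
E(X) \;=\; \int_{-\infty}^{\infty} x \, f_X(x)\, dx ,
\qquad \text{provided } \int_{-\infty}^{\infty} |x| \, f_X(x)\, dx < \infty .
```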
Similarly, if X is a discrete random variable with probability mass function p_X(x_i), where the x_i are the values taken by X, we define the expected value as E(X) = Σ x_i p_X(x_i), the sum running over all the x_i. Once again we impose a condition, because this could be an infinite series: provided the series is absolutely convergent. This E(X) is also called the mean of X, the average value of X, and so on; we have several names for it. It is also called the first moment of X about the origin.

Now, how do the scientists or engineers who use this quantity interpret it? Why do we call it an average? Look at the discrete case: the random variable X takes the value x_i with probability p(x_i), so each value is multiplied by its probability and then we sum. Interpret it this way: suppose we have a weightless bar with points x_1, x_2, and so on up to x_n marked on it, and at x_1 we place a mass p(x_1), at x_2 a mass p(x_2), and so on, up to a mass p(x_n) at x_n. Then Σ x_i p(x_i), for i from 1 to n, is the balance point of this weightless bar: if the bar is hung by strings from its two ends and the masses are placed on it, this is what we call the moment in mechanics. So you can have a nice interpretation here. Similarly, in the continuous case we may think of a rod whose density at the point x is f(x); then ∫ x f(x) dx denotes the center of gravity, or balance point, of this rod. So the expectation has a nice physical interpretation. In case the integral or the series is not absolutely convergent, we say that the expectation does not exist.

Now, you will see that this elementary concept is extremely helpful for looking at various day-to-day problems. Let us consider one example. Let X be a continuous random variable denoting the life, in hours, of an electronic device, and suppose the density function is given by f(x) = 20000/x³ for x > 100, and 0 for x ≤ 100. I want to find the expected value. In this case the range of the distribution is from 100 to infinity, because for x ≤ 100 the density function is 0. So E(X) = ∫ x · (20000/x³) dx = ∫ 20000/x² dx over (100, ∞), and you can easily see that this is nothing but 200. So we conclude that the average life is 200 hours. Rather than bothering about how the entire distribution varies, this is already useful information. When we study a physical phenomenon it is fine to have a probabilistic model, and from the full distribution we can find various probabilities; but a certain person may not be interested in the entire thing and may be satisfied with the average value, and may like to make certain decisions based on it.
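The integral is simple enough to verify symbolically; here is a minimal check using sympy (the density is exactly the one from the example above):

```python
import sympy as sp

x = sp.symbols('x', positive=True)

# Density from the example: f(x) = 20000/x**3 for x > 100, and 0 otherwise.
f = 20000 / x**3

# Sanity check: the density integrates to 1 over (100, infinity).
print(sp.integrate(f, (x, 100, sp.oo)))      # 1

# Expected life: E(X) = integral of x*f(x) over (100, infinity).
print(sp.integrate(x * f, (x, 100, sp.oo)))  # 200
```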
Let me take one example for the discrete case also. Suppose we want to insure something for an amount of, say, 50,000 rupees. The insurance company estimates that a complete loss may occur with probability 0.002, a partial loss of 50 percent with probability 0.01, and a 25 percent loss with probability 0.1. This could be, for example, vehicle insurance: with probability 0.002 there is complete damage, meaning the vehicle is no longer usable; with probability 0.01 there is a 50 percent loss or damage; and the probability of a 25 percent loss is 0.1. Now, the question is how much premium the insurance company should charge. The company should calculate the expected loss, which here is

E(loss) = 50000 × (1 × 0.002 + (1/2) × 0.01 + (1/4) × 0.1) = 1600,

where a full loss contributes the factor 1, a 50 percent loss the factor one half, and a 25 percent loss the factor one quarter. So the expected loss is 1600 rupees. Now suppose the company wants an annual profit of 500 rupees on this policy; then how much premium should it charge? Naturally 1600 + 500 = 2100, so the insurance company may charge 2100 rupees per annum for an annual profit of 500 rupees. So you can see that there is a practical usage of mathematical expectation here: once the insurance amount (50,000 here) and the loss distribution have been estimated, the company can calculate the expected loss (1600 rupees), and if it wants to survive with a profit of 500 rupees per annum, the per annum premium should be 2100 rupees.
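The premium arithmetic can be written out in a few lines (a sketch in Python; the loss fractions and probabilities are exactly those assumed in the example):

```python
# Expected loss: sum over loss scenarios of (fraction lost * probability),
# scaled by the insured amount.
insured = 50_000
scenarios = [(1.00, 0.002),   # total loss with probability 0.002
             (0.50, 0.010),   # 50% loss with probability 0.01
             (0.25, 0.100)]   # 25% loss with probability 0.1

expected_loss = insured * sum(frac * p for frac, p in scenarios)
print(expected_loss)            # 1600.0
premium = expected_loss + 500   # add the desired annual profit
print(premium)                  # 2100.0
```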
Now we can extend this concept of expectation. So far we have calculated the average value of the original random variable itself, but many times we are interested not in X but in some function of X. Let g(X) be a measurable function. Then we define the expected value of g(X) as E(g(X)) = Σ g(x_i) p_X(x_i) if X is discrete with pmf p_X, and E(g(X)) = ∫ g(x) f_X(x) dx if X is continuous with pdf f_X, provided, as before, that the series or the integral is absolutely convergent.

There are certain elementary properties that this expectation satisfies; I will state them in the form of a theorem, as they are quite easy to verify. Let X be any random variable. First, the expectation of a constant c is that same constant: E(c) = c. The proof is extremely simple: if I put c in place of g(x_i), I get c times the sum of the probabilities, which is 1; in the continuous case I can take c out and integrate the density over the entire range, which again gives c. Second, E(c g(X)) = c E(g(X)), where again c is a constant; the proof is just as simple. Third, E(g(X) + h(X)) = E(g(X)) + E(h(X)), where g and h are measurable functions, provided the expectations on the right-hand side exist. Once again the proof is quite simple: substituting g(X) + h(X) into the definition, if the series or the integrals are convergent we can separate the two terms.

Using this, we can define some further characteristics. We define the variance of a random variable X, for which a popular notation is σ² (I will mention the notation again later), as Var(X) = E[(X − E(X))²]; the square root of Var(X) is called the standard deviation. Now, what is the use of this quantity, or the justification for defining it? I may have two random variables whose balance points are the same, but one has its values concentrated near the middle, with less variability, while the other is more spread out; they may all have the same mean, yet the variability may differ. So we consider the deviation from the mean and square it. One may ask why we do not simply take the expectation of the deviation itself, but E[X − E(X)] is always 0. Another option is to take the absolute deviation, E|X − E(X)|, which is called the mean deviation about the mean; here we consider the squared deviation instead, which gives the variance, and its square root is the standard deviation of X.

Let me consider an example. Rather than an example with long decimal places, let me take a simple one: suppose X is a random variable with P(X = −1) = 1/6, P(X = 0) = 1/3, and P(X = 1) = 1/2. What is the expectation? E(X) = (−1)(1/6) + 0 · (1/3) + 1 · (1/2) = 1/3. Let me use the notation μ for this. We can also simplify the defining expression for the variance: expanding the square, E[(X − E(X))²] = E[X² − 2X E(X) + (E(X))²] = E(X²) − 2E(X)E(X) + (E(X))² = E(X²) − (E(X))², since taking the expectation of X E(X) gives E(X) · E(X) again. So Var(X) = E(X²) − (E(X))² is an alternative computational formula for the variance. For our example, E(X²) = (1)(1/6) + (0)(1/3) + (1)(1/2) = 2/3, so Var(X) = E(X²) − (E(X))² = 2/3 − 1/9 = 5/9.
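These numbers are easy to check with exact rational arithmetic (a minimal sketch in Python; nothing here beyond the pmf given above):

```python
from fractions import Fraction as F

# The discrete example: P(X=-1)=1/6, P(X=0)=1/3, P(X=1)=1/2.
pmf = {-1: F(1, 6), 0: F(1, 3), 1: F(1, 2)}

mean = sum(v * p for v, p in pmf.items())        # E(X)   = 1/3
ex2  = sum(v**2 * p for v, p in pmf.items())     # E(X^2) = 2/3
var  = ex2 - mean**2                             # Var(X) = E(X^2) - (E X)^2
print(mean, ex2, var)                            # 1/3 2/3 5/9
```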
Now, since this is a discrete distribution with all the values available, one may look at its bar chart. Mark the points −1, 0, and 1: the bar at −1 has height 1/6, the bar at 0 is double that, 1/3, and the bar at 1 is triple, 1/2. So this is the shape of the distribution. Look at the mean: μ = 1/3, which lies to the right of 0; because of the higher weight on the right side, the mean has shifted. And look at the variability: the variance is 5/9, so the standard deviation of X is √5/3, which is less than 1; √5 is about 2.24, so √5/3 is roughly 0.75. By this we should not think the variability is 1 just because the values −1, 0, 1 differ by steps of 1: the weights are different, and the standard deviation is actually about 0.75.

Now, one may easily think: since we calculated E(X²), we can consider the expectation of any power of the random variable. That gives the concept of moments. First, the non-central moments: we define μ′_k = E(X^k) for any positive integer k; this is called the k-th non-central moment of X, or the k-th moment about the origin. Similarly we can define central moments: μ_k = E[(X − E(X))^k] for k = 1, 2, and so on; this is called the k-th central moment of X, or the k-th moment about the mean. Likewise we can define absolute moments, E|X|^k, the k-th absolute moment, and similarly β*_k = E|X − E(X)|^k, the k-th absolute central moment. We can also define factorial moments: the k-th factorial moment is E[X(X − 1)⋯(X − k + 1)]. Of course, we have to see what the use of all these definitions is. Note that μ′₁ is nothing but E(X), the mean of X, and μ₂ = E[(X − E(X))²] is nothing but the variance of X. So the concepts of non-central and central moments are simply generalizations of the concepts of mean and variance, and the natural question is why we should consider higher orders.
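Before that, note that moments of any order for the discrete example can be computed mechanically; a small sketch (the helper names raw_moment and central_moment are mine, introduced only for illustration):

```python
from fractions import Fraction as F

pmf = {-1: F(1, 6), 0: F(1, 3), 1: F(1, 2)}   # same example as above

def raw_moment(k):
    """mu'_k = E(X^k), the k-th moment about the origin."""
    return sum(v**k * p for v, p in pmf.items())

def central_moment(k):
    """mu_k = E[(X - E X)^k], the k-th moment about the mean."""
    mu = raw_moment(1)
    return sum((v - mu)**k * p for v, p in pmf.items())

print(raw_moment(1), central_moment(2))   # 1/3 5/9: the mean and variance again
```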
As we have seen, a characteristic such as the mean tells us about the average value, and the variance, or its square root, tells us about the variability of the values. Similarly, we can talk about the shape of the distribution's curve. For example, consider three shapes: a symmetric-looking curve, a curve with a long tail to the right, and a curve with a long tail to the left; the purpose of drawing them is to look at the distinction. In the first one, even a layman's view is that the curve is symmetric: the mean is at the center and the probabilities are equally distributed on both sides of the mean. In the second one there is a lot of weight on the left side, but on the right side there is a long tail. For example, consider the marks of students appearing in a competitive examination, say an engineering entrance examination such as the Joint Entrance Examination (JEE) of the IITs, where the number of students who appear is very large, around 1,50,000 this year. But out of those, only 10,000 or 15,000 students are declared qualified. So a large chunk of students is at the bottom, and the average marks are pushed down because there is a lot of weight on the left side, at low marks: there are a large number of students with very low marks and very few students with high marks, who are the ones actually declared qualified. A curve with a long tail to the right side like this is called positively skewed, in contrast with the symmetric curve. On the other hand, you have the mirror-image situation. For example, look at the lifetimes of human beings: a majority of people live longer, completing the age of 60, 70, or 80, while a certain number die at an early age, for example infant deaths, deaths at birth, or deaths of children below 5 years. So this is negatively skewed; here the average age is pushed up. For example, in developing countries the average life is 62 or 63 years, while in developed countries it is 75 or 76 years. So the mean is pushed to the right, although there are people in all the age groups, and there is a long tail to the left; this is called a negatively skewed curve.

An empirical measure for checking the skewness of the curve is based on the third central moment. So we define a measure of skewness, or of asymmetry; let me give it the name β₁, and it is based on μ₃. One thing we notice: if X is, say, life in hours, then E(X) is also in hours; μ₂, the variance, is in squared units, and that is why we take its square root, the standard deviation, to come back to hours. Similarly, if we are talking about average income in rupees, the variance is in squared rupees, and its square root is again in rupees. Now μ₃ involves the third power, so it is in cubic units; therefore, to make the measure free of the units of measurement, we divide by μ₂^{3/2}. So β₁ = μ₃/μ₂^{3/2}. A standard notation for E(X) is μ and another standard notation for the variance is σ²; these are popular notations found in the books, so let me also use them and write β₁ = μ₃/σ³. This is called the coefficient, or measure, of skewness.
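As an illustration, we can evaluate β₁ for the three-point example from before (my own side calculation, not from the lecture; the value comes out negative, consistent with that distribution's heavier weight on the right and longer tail to the left):

```python
from fractions import Fraction as F

pmf = {-1: F(1, 6), 0: F(1, 3), 1: F(1, 2)}

mu  = sum(v * p for v, p in pmf.items())             # mean = 1/3
mu2 = sum((v - mu)**2 * p for v, p in pmf.items())   # variance = 5/9
mu3 = sum((v - mu)**3 * p for v, p in pmf.items())   # third central moment = -7/27

beta1 = float(mu3) / float(mu2) ** 1.5               # mu_3 / mu_2^(3/2)
print(beta1)   # approximately -0.63: a mild negative skew
```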
Let me also talk about another type of variation that may occur in the shape of the distribution. Once again, picture a normal-type curve, and alongside it a curve with a higher peak and a curve with a flatter shape. Apart from the difference in variability, another striking thing is the height of the curves: one takes more height, another has a flat kind of shape. This property is called kurtosis, that is, peakedness. We say the first has a normal peak; in statistical terminology the high-peaked curve is called leptokurtic and the flat one platykurtic. An empirical measure of kurtosis is based on μ₄: we define β₂ = μ₄/μ₂² − 3. When we do the special distributions, I will show you what these measures tell us about their shapes.

Apart from the characteristics based on the moments, we can also define characteristics based on the distribution of the probability itself. What is the meaning of that? Sometimes the moments may not exist, because we are imposing the condition of absolute convergence. Let us look at an example of non-existence of moments, the electronic-device example with f(x) = 20000/x³ for x > 100. If I consider E(X²), which I would need for calculating the variance, it becomes ∫ x² · (20000/x³) dx = ∫ 20000/x dx = 20000 log x, evaluated from 100 to infinity. Naturally you can see that this is infinite, so E(X²) does not exist, and therefore Var(X) does not exist either.

Here is another example of non-existence of moments. Consider the distribution with f(x) = 1/[π(1 + x²)] for −∞ < x < ∞. If I plot it, at x = 0 it has the value 1/π, and it goes to 0 as x tends to plus or minus infinity. If I look at E(X), it would be ∫ x/[π(1 + x²)] dx, but this integral does not exist: the integral of the modulus is (2/π) ∫₀^∞ x/(1 + x²) dx, which is divergent. So in this case E(X) does not exist. In fact, this is a well-known distribution called the Cauchy distribution; we will discuss later how it arises.

Therefore, we need certain characteristics that remain useful when the moment structure is unknown or does not exist. They are called quantiles. By a quantile we mean a point on the curve below which a certain probability lies: for example, the point with probability one half on each side, or the points with probability 1/4 in each of four pieces. So the points which split the area under the curve into certain proportions are called quantiles. Roughly speaking, for probability one half we want the point q with P(X ≤ q) = 1/2; but at the same time we have to keep track of the right-hand side also, and to take care of discrete distributions we define it in a slightly generalized way. The p-th quantile of a random variable X, denoted q_p, is a point satisfying P(X ≤ q_p) ≥ p and P(X ≥ q_p) ≥ 1 − p, where of course p is a number between 0 and 1; q_{1/2} is called the median. In the Cauchy example the mean did not exist, but the median does: what is the point with probability one half below it? You can easily see that the density is symmetric around 0, and if I integrate from −∞ to 0, I get (1/π)[tan⁻¹(0) − tan⁻¹(−∞)] = (1/π)[0 + π/2] = 1/2. So here 0 is the median: m = 0 is the median of the Cauchy distribution.
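Both steps — the median integral and a general quantile — can be checked symbolically (a sketch; the closed-form CDF F(q) = 1/2 + tan⁻¹(q)/π used for the quartile below is standard for the Cauchy distribution):

```python
import sympy as sp

x = sp.symbols('x', real=True)

# Standard Cauchy density from the example: f(x) = 1/(pi*(1 + x^2)).
f = 1 / (sp.pi * (1 + x**2))

# P(X <= 0) = 1/2, so 0 is the median even though E(X) does not exist.
print(sp.integrate(f, (x, -sp.oo, 0)))                           # 1/2

# The p-th quantile solves F(q) = p, with F(q) = 1/2 + atan(q)/pi.
p = sp.Rational(3, 4)
print(sp.solve(sp.Rational(1, 2) + sp.atan(x) / sp.pi - p, x))   # [1], i.e. q_{3/4} = 1
```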
In a similar way we can consider the points dividing the probability into four parts: q_{1/4}, q_{1/2}, q_{3/4} are called quartiles. The points q_{1/10}, q_{2/10}, …, q_{9/10} are called deciles, and so on; q_{1/100}, q_{2/100}, … are called percentiles, because they divide the entire probability distribution into 100 parts.

We can also give a formal definition of symmetry based on this. We say the distribution is symmetric about a point A if the probabilities are equally distributed on either side of A; that is, P(X ≤ A − x) = P(X ≥ A + x) for all x. Then we say X has a distribution which is symmetric about A. A consequence of symmetry is that if X is symmetric about A and X is continuous with pdf f, then f(A − x) = f(A + x) for all x. Similarly, if A = 0, then the odd moments (when they exist) are 0, and if A = E(X), then the odd central moments, μ_{2k+1}, are 0. I will discuss some problems later; right now let me develop the theory and the properties of these moments, quantiles, and so on.

There is a useful function which can generate the moments. As things stand, we would need to calculate all the moments one by one, checking absolute convergence each time; the natural question is whether there is a single function which gives us the values of all of them, which would be much better. Fortunately a function of this nature exists; it is called the moment generating function. For a random variable X, its moment generating function (mgf) is defined by M_X(t) = E(e^{tX}), provided the expectation exists in a neighborhood of t = 0. At t = 0 it always exists, because the expression becomes 1; so we say that the mgf exists only if it exists for some nonzero value of t, and in that case there is an interval around 0 in which it exists.

Let us take a very simple example, the discrete distribution from before. What is its moment generating function? M_X(t) = E(e^{tX}): putting x = −1 gives e^{−t} with weight 1/6, putting x = 0 gives 1 with weight 1/3, and putting x = 1 gives e^t with weight 1/2. So M_X(t) = (1/6)e^{−t} + 1/3 + (1/2)e^t. Certainly the question arises what the uses of this are; let me take another example first. Take the density f(x) = (1/2)e^{−x/2} for x positive. What is the moment generating function here? It is E(e^{tX}), that is, e^{tx} multiplied by the density and integrated over the range: M_X(t) = ∫₀^∞ (1/2) e^{−x(1 − 2t)/2} dx = 1/(1 − 2t), valid for t < 1/2. So you can easily see that it exists for a nonzero value of t, and therefore it exists in an interval.
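A quick symbolic verification of that integral (a sketch using sympy; the library may print an algebraically equivalent form of the answer):

```python
import sympy as sp

x, t = sp.symbols('x t')

# Density from the example above: f(x) = (1/2) e^{-x/2} for x > 0.
f = sp.Rational(1, 2) * sp.exp(-x / 2)

# M_X(t) = E[e^{tX}]; the integral converges only for t < 1/2, so we ask
# sympy to return the antiderivative result without the convergence conditions.
M = sp.integrate(sp.exp(t * x) * f, (x, 0, sp.oo), conds='none')
print(sp.simplify(M))   # equivalent to 1/(1 - 2*t), valid for t < 1/2
```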
A simple property: if the mgf exists, it can be continuously differentiated in some neighborhood of the origin; in fact, it is differentiable throughout its region of existence. You can also see some simple transformation properties. Suppose Y = aX + b and consider M_Y(t). This is E(e^{tY}) = E(e^{t(aX + b)}) = E(e^{atX + bt}) = e^{bt} E(e^{atX}) = e^{bt} M_X(at), that is, e^{bt} times the moment generating function of X evaluated at the point at.

Now, why have we introduced this function; what is the use of it? Consider M_X(t) = E(e^{tX}) and expand the exponential: e^{tX} = 1 + tX + t²X²/2! + t³X³/3! + ⋯. If we assume the mgf exists, then all the term-by-term expectations exist, and we get M_X(t) = 1 + t μ′₁ + (t²/2!) μ′₂ + ⋯; that is, the coefficient of t^k/k! is μ′_k. Another way of looking at it: if I differentiate, I get M′_X(t) = μ′₁ + t μ′₂ + ⋯, so substituting t = 0 gives (d/dt) M_X(t) at t = 0 equal to μ′₁. Similarly, in the second derivative the leading term is μ′₂ and every remaining term contains t, so M″_X(0) = μ′₂. In general, the k-th order derivative evaluated at t = 0 gives the k-th non-central moment. This is an extremely useful result, because if the mgf is given to us we can find all the moments; that is why the name moment generating function is justified.

Let us take the example above, M_X(t) = (1/6)e^{−t} + 1/3 + (1/2)e^t. Its derivative is M′_X(t) = −(1/6)e^{−t} + (1/2)e^t, and putting t = 0 I get −1/6 + 1/2 = 1/3, which was exactly the mean of this distribution computed earlier. Similarly, the second derivative is M″_X(t) = (1/6)e^{−t} + (1/2)e^t, so μ′₂ = M″_X(0) = 1/6 + 1/2 = 2/3, which was the value calculated before. So you can see that having the moment generating function makes it extremely easy to calculate moments of various orders; and if the function has a nice form, we can apply the formula for successive differentiation and write down the moments of any order.
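Both derivative evaluations can be confirmed symbolically (a minimal sketch; the mgf is the one computed above):

```python
import sympy as sp

t = sp.symbols('t')

# mgf of the discrete example: P(X=-1)=1/6, P(X=0)=1/3, P(X=1)=1/2.
M = sp.Rational(1, 6) * sp.exp(-t) + sp.Rational(1, 3) + sp.Rational(1, 2) * sp.exp(t)

mu1 = sp.diff(M, t).subs(t, 0)      # M'(0)  = E(X)
mu2 = sp.diff(M, t, 2).subs(t, 0)   # M''(0) = E(X^2)
print(mu1, mu2)                     # 1/3 2/3, matching the direct calculations
```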
Let us look at one more example: f(x) = λe^{−λx} for x > 0. What is M_X(t) here? It is M_X(t) = ∫₀^∞ e^{tx} λe^{−λx} dx = λ/(λ − t), and you can easily see that the integral is convergent for t < λ. Now, this has a very nice form: each differentiation raises the power of (λ − t) in the denominator, and the minus sign coming from differentiating (λ − t) cancels the minus from the negative exponent, so the k-th order derivative is M_X^{(k)}(t) = k! λ/(λ − t)^{k+1}. If I put t = 0, this becomes k! λ/λ^{k+1} = k!/λ^k, which you can verify directly: computing μ′_k from the definition gives ∫₀^∞ x^k λe^{−λx} dx = λ · k!/λ^{k+1}, using the gamma function, which is again k!/λ^k. So the two computations agree. The moment generating function is thus an extremely useful function for determining the moments, and therefore all the other characteristics such as the mean, the variance, and the measures of skewness and kurtosis.

There are one or two other important properties of the mgf which I will state without proof. The moment generating function uniquely determines the cumulative distribution function, and conversely, if the mgf exists, it is unique. That means no two distinct distributions can have the same mgf, so there is a one-to-one correspondence, and that is extremely useful: many times we consider the distribution of a certain function of random variables, and it may be possible to derive the mgf of that function; if we can match that mgf with a known distribution, then we know that the function of the random variables has that distribution. So the mgf is extremely useful in the identification of distributions.

There is another result, which is actually the moment convergence theorem, but let me also mention it here. Let μ′_k be the moments of a random variable X. If the series Σ μ′_k t^k/k! converges absolutely for some t ≠ 0, then the sequence {μ′_k} uniquely determines the CDF of X. This is actually a consequence of the uniqueness theorem above, because if you expand E(e^{tX}), it consists precisely of the terms of this moment series (apart from the leading term); so if the series is absolutely convergent, the mgf exists, and the unique determination of the distribution follows.

One can also define a factorial moment generating function and so on, but I will not spend more time on that here. We will move to the special distributions: in the next lecture I will give the motivation for each of the distributions that are more commonly used, and for those distributions we will look at the moment structure, that is, the mean, the variance, and the measures of skewness and kurtosis. So I will take this up in the following lecture.
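As a final check of the exponential example above, the agreement between the mgf derivative and the direct integral can be verified symbolically (a sketch; the choice k = 3 is arbitrary):

```python
import sympy as sp

x, t = sp.symbols('x t')
lam = sp.symbols('lambda', positive=True)
k = 3   # any positive integer would do here

# mgf of the exponential density f(x) = lambda*e^{-lambda*x}, valid for t < lambda.
M = lam / (lam - t)

via_mgf = sp.diff(M, t, k).subs(t, 0)                                      # k-th derivative at 0
via_integral = sp.integrate(x**k * lam * sp.exp(-lam * x), (x, 0, sp.oo))  # direct definition
print(sp.simplify(via_mgf), sp.simplify(via_integral))   # both 6/lambda**3 = 3!/lambda**3
```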