Hello again, we move on to the next lecture today, on discrete probability distributions. The reference books for this topic are presented here: Montgomery and Runger's Applied Statistics and Probability for Engineers, 5th edition, Wiley India, 2011, and Ogunnaike's Random Phenomena, CRC Press.

In the previous lecture we saw random variables. When we conduct a random experiment we are unsure about its outcome, and a random variable assigns specific real values to the possible outcomes. The possible values of a random variable may be discrete entities or a continuous range of numbers. By a continuous range of numbers I mean that the values fall between a lower value and an upper value, or a lower limit and an upper limit if you want it like that, but in between these two limits the variable can take any possible value. As an example, consider the random variable mole fraction from a distillation experiment. The mole fraction can take values only between 0 and 1, both included, but in between it can take any value: 0.12, 0.123, 0.1234, depending upon the accuracy of your measurement device. Of course the calculator can give a value of the mole fraction running up to 7 or 8 digits. So the mole fraction is a continuous random variable here. But when you throw a die, you will find that it can show only discrete values, for example 1 or 2 or 3, up to 6. In this case the values of the random variable are discrete.

So we are going to talk about a very interesting function, the probability mass function. The probability mass function f(x) assigns a probability value to each of the discrete values of x. There is a typo here which I am correcting: earlier it was f(X) with a capital X; now it has been changed to f(x). If you recollect, the random variable function converts the original sample space into real numbers, and these numbers are assigned probability values; we covered this in one of the previous lectures. The random variable may take discrete values x1, x2, and so on up to xn. We now define a probability mass function, denoted f(xi), whose role is to assign a probability value to each of the possible values xi. Please note the definition again: P(X = xi), the probability that the random variable X takes the value xi, is given by the probability mass function f(xi).

When you write down statements pertaining to statistical calculations you have to make sure that the terminology and notation you use are correct. You cannot write f(Xi): you have the random variable X and the values xi, but you do not have a capital Xi, because once the random variable is assigned a value it becomes the lowercase xi. As we stated earlier, f(xi) must be greater than or equal to 0 for all the possible values of xi; this means the probability values are all non-negative. The other important property of the probability mass function is that the sum over i = 1 to n of f(xi) must equal 1: the sum of all the probabilities is 1.
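To make these two properties concrete, here is a minimal sketch (my own illustration, not from the lecture) that checks f(xi) >= 0 and that the probabilities total exactly 1 for a fair six-sided die, using exact fractions to avoid round-off:

```python
from fractions import Fraction

# f(x_i) = 1/6 for each face x_i = 1..6 of a fair die
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

assert all(p >= 0 for p in pmf.values())  # property 1: every f(x_i) >= 0
assert sum(pmf.values()) == 1             # property 2: the f(x_i) sum to 1

print(pmf[3])  # P(X = 3) = 1/6
```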
You have different entities, each entity has a probability value, and the probability mass function assigns these values in such a way that when you add up all the probabilities the total equals 1. The probability value assigned to each xi may be the same or different. For example, the probability of the number 1 coming up when you throw a fair die is 1/6; similarly, the probability of the number 2 coming up is also 1/6. So you have 6 numbers and all of them share the same probability of 1/6. On the other hand, there are examples where you have a discrete number of possible outcomes but they need not all have the same probability value. They may have different fractional probability values, but when you add them up they must equal 1; that is very important.

Next, mu = E(X) is the mathematical notation here. Mu means the mean of the probability distribution, and E(X) means the expected value of the random variable X, defined as mu = sum over i = 1 to n of xi f(xi). The mean is also referred to as the average. Take any cricketer: when he has played a test match series, or even when he comes out to bat after the previous wicket has fallen, the first thing shown on TV is the number of matches he has played, the number of runs he has scored, and then the average. The average is a measure of his overall performance. If a batsman has an average of 50 he is supposed to be very good, but it does not mean he will score 50 in the current innings; he may score 100 or he may score a duck.

So the mean refers to the center of the probability distribution. Imagine you have a long scale, and on the scale you have markings corresponding to the possible values of the random variable, each with a probability assigned to it. Let us say you have the numbers 1, 2 and 3: number 1 may have a probability of 0.2, number 2 a probability of 0.3, and number 3 a probability of 0.5. You put coins or weights on the scale right at these numbers, in proportion to the probability values: some coins at number 1, more coins at number 2, and even more coins at number 3. Then you put the scale on a knife edge and see where it balances, and it will balance at the mean value.
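The knife-edge picture can be checked with a short sketch (my own, using the 1, 2, 3 example just described): the balance point is mu = sum of xi f(xi).

```python
# values 1, 2, 3 with probabilities 0.2, 0.3, 0.5 as in the example above
pmf = {1: 0.2, 2: 0.3, 3: 0.5}

mu = sum(x * p for x, p in pmf.items())  # mu = sum of x_i * f(x_i)
print(mu)  # 2.3 (up to float round-off): where the weighted scale balances
```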
Now let us come to the variance. We know from our earlier discussion that because of the variability in the data we had to find the average. The batsman was not scoring 50 in every innings: he scored 0 in one, 100 in the next, 30 in the third, 70 in the fourth, so when you take an average over a long period of time you get 50. The performance of the batsman is variable, and that is why we had to find the mean or average value. This variability is quantified in terms of another parameter called the variance. We refer to it as the variance of the random variable X, and it is defined as the expected value of the squared deviation of the random variable from the mean. So what you do is first find the deviation of the random variable from the mean mu and then square it; the expected value of this squared deviation is sigma squared. Using the definition of expectation, we write V(X) = sum over i = 1 to n of (xi - mu)^2 f(xi). Please do not forget to multiply by f(xi). You may recollect that the sum of all the deviations from the mean is 0; when you square the deviations before summing, the sum will not be 0, and each squared deviation must be weighted by the corresponding f(xi). When you expand this sum you get V(X) = sum of xi^2 f(xi) - mu^2. The derivation is quite simple and you may want to try it out yourself.

The variance is a useful measure as it indicates the spread of the probability distribution. Even if two distributions have the same mean or average, it does not imply that they have the same variance. For example, in one class the average may be 50 and in another class the average may also be 50, but in the first class the lowest mark may have been 20 and the highest 100; the average is still 50 when you add up all the marks and divide by the total number. In the second class the lowest mark may have been 40 and the highest 60, and the average would again have been 50. There is a lot more variation in the marks of the first class compared to the second, even though both had the same average. By definition, the square root of the variance is the standard deviation: variance is denoted by sigma squared, so the standard deviation is denoted by sigma.
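Sticking with the same example distribution, this sketch (again my own) evaluates the variance both from the definition and from the expanded shortcut, and the two agree:

```python
import math

pmf = {1: 0.2, 2: 0.3, 3: 0.5}
mu = sum(x * p for x, p in pmf.items())  # mean, 2.3

# definition: V(X) = sum of (x_i - mu)^2 * f(x_i)
var_def = sum((x - mu) ** 2 * p for x, p in pmf.items())
# shortcut: V(X) = sum of x_i^2 * f(x_i) - mu^2
var_short = sum(x * x * p for x, p in pmf.items()) - mu ** 2

print(var_def, var_short)  # both 0.61 (up to float round-off)
print(math.sqrt(var_def))  # the standard deviation sigma
```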
As I said earlier, we should watch the notation and terminology whenever we use the language of statistics; the grammar has to be correct. So let us look at some notation issues in the usage of the probability mass function f(xi). Can we denote the probability mass function with a capital X in the argument? No: it should be f(xi) with lowercase xi. So writing the probability mass function as f(X) = (X + 3)/25 with capital X makes no sense. If the possible sample space values are 0, 1, 2, 3 and 4, then to find the probability of the occurrence of one of these values xi we use the notation f(xi), not f(Xi). The following form is correct: the probability of the random variable X taking a value xi is given by f(xi) = (xi + 3)/25. The function f(xi) comes alive and takes a value only when the random experiment has been performed and the random variable X takes a particular value xi. Before throwing a die the random variable was X; once you have thrown the die, perhaps when playing a game of Ludo or Snakes and Ladders, you know the value shown by the die, and that value is xi. Then you can find the probability of that number xi, say 1 or 2, occurring. Sometimes you may not even know the probability beforehand.

In the case of a die you know that all numbers are equally probable, so you can say beforehand that the probability of the number 1 occurring is 1/6. But there are many random experiments whose outcome is not known for sure, and you also cannot predict what the probability of that outcome is going to be. So you have to conduct the random experiment, obtain the value xi, and then find the value of f(xi), the probability value. Generally we can represent the probability mass function (not the probability density function; that term is used for continuous distributions, and here we are dealing only with a discrete distribution) as f(x) = (x + 3)/25, with lowercase x used throughout, and this function is evaluated at each possible value xi.

We saw the mean: when we write its definition, we write mu as the expected value of the discrete random variable X, mu = E(X). When we actually go about calculating the expected value, we use mu = sum of xi f(xi), where both the xi outside and the argument of f are lowercase; writing capital Xi there is wrong. So do not use capital Xi in f(xi); use xi times f(xi) when finding the mean. This is important notation and terminology. Similarly, when you define the variance of the random variable X, it is E[(X - mu)^2] = sigma^2, but in the calculation you should not use capital X; you should rather write sigma^2 = sum over i = 1 to n of (xi - mu)^2 f(xi).

We will take a simple example. Let there be a lot of n balls, of which b are blue in colour and the remaining are red. I am sure you would have seen this kind of problem, or variants of it, several times in the past. Let us define the random variable as the number of blue balls that have been picked. The random experiment is performed by taking two picks from the lot: you take one ball, note the colour, put it back, take the next ball, note the colour, and then put it back. This is called picking with replacement. Let p denote the probability of picking a blue ball and q the probability of picking a red ball. The question is to define the original sample space, the random variable space and the probability mass function, and to verify whether its properties are satisfied.

The original sample space is {BB, BR, RB, RR}. The first event is both balls blue; the next may be blue then red, or red then blue; it is also possible that you pick two red balls. Since the random variable X was defined as the number of blue balls picked in the random experiment, the possibilities are 0 blue balls, 1 blue ball, or 2 blue balls. So these are the possible outcomes expressed in terms of the random variable, which takes the values 0, 1 and 2.
The possible outcomes are no blue ball, 1 blue ball or 2 blue balls. So the original sample space, which had 4 entities, was reduced to only 3 entities in the random variable space. The probability of the random variable taking the value 0 means no blue balls have been picked: both balls picked are red, and the probability of that happening is q^2. Similarly for f(1): either the first ball picked was blue and the second red, with probability pq, or the first was red and the second blue, with probability qp, and pq + qp = 2pq. The probability of picking 2 blue balls is p^2, so f(2) = p^2.

Now you can verify whether the sum of the probabilities equals 1. You add up q^2 + 2pq + p^2, which is nothing but (q + p)^2. Since the probability of picking a blue ball is 1 minus the probability of picking a red ball, or equivalently the probability of picking a red ball is 1 minus the probability of picking a blue ball, we have q = 1 - p; put q = 1 - p here and you get 1. So the probabilities total 1, and obviously the individual probability values are fractional. If you have 8 balls in a box, 3 of them blue and 5 red, then the probability of picking a blue ball is 3/8 and the probability of picking a red ball is 5/8, and you can see that q = 1 - p.
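For the numerical case just mentioned (3 blue and 5 red balls, so p = 3/8), here is a quick sketch of the verification, again with exact fractions:

```python
from fractions import Fraction

p = Fraction(3, 8)  # probability of picking a blue ball
q = 1 - p           # probability of picking a red ball, 5/8

pmf = {0: q * q, 1: 2 * p * q, 2: p * p}  # f(0) = q^2, f(1) = 2pq, f(2) = p^2
print(pmf)                     # 25/64, 15/32 and 9/64 respectively
assert sum(pmf.values()) == 1  # q^2 + 2pq + p^2 = (q + p)^2 = 1
```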
Next we come to another interesting and important function, the cumulative distribution function F(x), given by F(x) = P(X <= x): what is the probability that the random variable takes any value less than or equal to x? That is the cumulative distribution function, and it is written as the sum of f(xi) over all xi <= x. So you are summing over all the values xi that do not exceed x. There is an interesting distinction between the probability mass function and the cumulative distribution function. The probability mass function applies only to the possible values xi of the random variable X; if, for example, you have another value yi which is not equal to any of the possible xi values, then f(yi) = 0. But for the cumulative distribution function the value x need not be one of the xi; it may even be a value between two permitted xi values. For example, if the permitted xi values are 0, 1, 2, 3, 4 and 5, you can still evaluate the cumulative distribution function P(X <= 3.5); that is permitted, and its value need not be 0. Cumulative means adding or totaling, which is why you add up the probabilities up to the value x, whether or not x equals one of the allowed xi.

The cumulative distribution function ranges between 0 and 1: even though you are adding probabilities, the maximum value is 1, because the sum of all the probabilities is 1. Also, if x <= y then F(x) <= F(y). You can write it as F(x) or F(xi), but you cannot write a capital X in the argument.

Whenever you do cumulative distribution function calculations there is a possibility of making a mistake. Suppose the probability mass function is given to you and you have to find interval probabilities. The statements may look similar but differ slightly: in one case you have to find P(B <= X <= E); in the second case it is P(B < X < E); in the third case, P(B <= X < E); in the fourth case, P(B < X <= E). You may think that all of these would give the same value, but that is not the case, as I will demonstrate with an example. Can we write P(B <= X <= E) as F(E) - F(B)? No, it cannot be written in this fashion.

Let us take the example. You have a probability mass function f(x), lowercase x mind you, and the possible outcomes A, B, C, D, E, F are the values taken by the random variable; these denote the xi values. The probability of A occurring is 0.1, B is 0.2, C is 0.3, D is 0.2, E is 0.1 and F is 0.1. In the diagram the length of each line corresponds to the probability: one line is twice as long as another when its probability is 2 times higher.

Now let us find the probability that the random variable X takes a value between B and E, both included, i.e. P(B <= X <= E). The probability of the random variable taking the value B is 0.2, C is 0.3, D is 0.2, E is 0.1; you add up 0.2 + 0.3 = 0.5, 0.5 + 0.2 = 0.7, 0.7 + 0.1 = 0.8. So the value is 0.8, and it may also be written as P(X <= E) - P(X < B): P(X <= E) is 0.1 + 0.2 + 0.3 + 0.2 + 0.1 = 0.9, and P(X < B) covers only A, with probability 0.1, so it is 0.9 - 0.1 = 0.8.

The second case is P(B <= X < E), which equals 0.7. Here B is included but E is not, so we count 0.2 for B, then 0.2 + 0.3 = 0.5 and 0.5 + 0.2 = 0.7. We exclude E because this is P(X < E) - P(X < B): P(X < E) is 0.1 + 0.2 + 0.3 + 0.2 = 0.8, and P(X < B), as we saw in the previous case, is 0.1, so you get 0.8 - 0.1 = 0.7.
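Rather than memorizing bracket formulae, you can tabulate the distribution and add the relevant probabilities directly. Here is a sketch (the helper function and its name are my own) that computes all four bracket variants for this A to F distribution:

```python
from fractions import Fraction

values = ["A", "B", "C", "D", "E", "F"]
probs = [Fraction(n, 10) for n in (1, 2, 3, 2, 1, 1)]  # 0.1, 0.2, 0.3, 0.2, 0.1, 0.1

def interval_prob(lo, hi, include_lo, include_hi):
    """Add f(x_i) over the outcomes from lo to hi, honouring the end-point flags."""
    i, j = values.index(lo), values.index(hi)
    keep = range(i + (0 if include_lo else 1), j + (1 if include_hi else 0))
    return sum(probs[k] for k in keep)

print(interval_prob("B", "E", True,  True))   # P(B <= X <= E) = 4/5,  i.e. 0.8
print(interval_prob("B", "E", True,  False))  # P(B <= X <  E) = 7/10, i.e. 0.7
print(interval_prob("B", "E", False, True))   # P(B <  X <= E) = 3/5,  i.e. 0.6
print(interval_prob("B", "E", False, False))  # P(B <  X <  E) = 1/2,  i.e. 0.5
```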
Similarly you can easily show that P(B < X <= E) = 0.6 and P(B < X < E) = 0.5, as the sketch above also confirms. So coming back to our original problem statement: P(B <= X <= E) is not F(E) - F(B); it is P(X <= E) - P(X < B), which was 0.9 - 0.1 = 0.8. So do not try out formulae based on intuition in these kinds of situations. Rather than using memorized formulae, I would advise you to construct the distribution of probabilities and then do the calculations. The mathematical formulae are here, but from my point of view they need not be memorized; they can easily be implemented just by using the diagram.

Now we are going to look at something known as a moment. In physics you might have seen the term moment used rather frequently: force into distance. Here also we have ordinary moments and central moments, so I will just give the definitions. The expected value E(X) = sum over i = 1 to n of xi f(xi) is called the first ordinary moment. In general, the expected value of g(X) is E[g(X)] = sum over i = 1 to n of g(xi) f(xi). Where you had the random variable X you put xi inside the summation; where you have g(X) you put g(xi): the function of the random variable is evaluated at xi inside the summation. If we define g(X) = X^k, then the expectation of g(X) is called the kth ordinary moment of the random variable X, represented by mk = E(X^k). The first ordinary moment m1 is the mean: with k = 1, m1 = E(X) = mu.

So far we have been defining the function g(X) as X^k, but we can also measure the random variable X as a deviation from A, where A is a constant value. Earlier we had by default used A = 0, but that need not always be the case. So we take a constant A and find the kth moment, where k should be an integer: g(X) = (X - A)^k, and the expectation of g(X) in such a case is called the kth moment of the random variable X about A. The moments about the mean are defined as mu_k = E[(X - mu)^k], where k can be 0, 1, 2 and so on; remember k should be an integer. Here we are finding the moment about the mean, the center point of the distribution. These are referred to as the moments about the mean for different values of k, and since we are taking the moment about the mean, each is also termed the kth central moment of the random variable X. When you put k = 0, mu_0 = E[(X - mu)^0] = E(1) = 1. When you put k = 1, mu_1 = E[(X - mu)^1] = E(X) - mu = mu - mu = 0. So mu_0 is 1 and mu_1 is 0.
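These definitions translate directly into a short sketch (the function name is my own): the kth moment about any point a is E[(X - a)^k], with a = 0 giving the ordinary moments and a = mu the central moments.

```python
pmf = {1: 0.2, 2: 0.3, 3: 0.5}

def moment(k, about=0.0):
    """k-th moment of X about the point `about`: E[(X - about)^k]."""
    return sum((x - about) ** k * p for x, p in pmf.items())

mu = moment(1)        # first ordinary moment m1 = mean = 2.3
print(moment(0, mu))  # mu_0 = 1
print(moment(1, mu))  # mu_1 = 0 (up to float round-off)
print(moment(2, mu))  # mu_2, the second central moment
```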
The second central moment is termed the variance. The central moment is taken with respect to the mean, so with k = 2 the expected value E[(X - mu)^2] = sigma^2. We had seen this definition previously, but now it is called the second central moment, or the second moment about the mean. The square root of the variance is the standard deviation. E[(X - mu)^2], or mu_2, may also be written as m2 - mu^2, the second ordinary moment minus mu squared.

So you have found the mean, the variance and the standard deviation. Now you have to scale the standard deviation properly: you want to see what percentage or fraction of the mean the standard deviation is, so you write the coefficient of variation Cv = sigma/mu. It provides a dimensionless measure of the relative amount of variability exhibited by the random variable. If sigma is very high, say 100, you may think there is a lot of variation, the standard deviation is pretty high, but you have to judge it with respect to the mean value. If the mean is of the order of 200, then 100/200 = 0.5, which is pretty high; but if the mean is of the order of 10000, then 100/10000 = 0.01, which is pretty low. So you have to compare the standard deviation with the mean value.

The third central moment, the moment about the mean with k = 3, is defined as the skewness, given by mu_3 = E[(X - mu)^3], and its significance is that it indicates the asymmetry of the distribution. You can understand that when there are several values of x distributed about a mean value, they may not be distributed uniformly with respect to it: you may have negative deviations and positive deviations, and the relative difference between the negative and positive deviations with respect to the mean is captured by the third central moment. So the third central moment, or skewness, is a measure of the deviation from symmetry, and the dimensionless quantity mu_3 divided by sigma cubed is known as the coefficient of skewness. If the distribution is perfectly symmetric, both the skewness and the coefficient of skewness vanish. When the distribution is such that the negative deviations dominate the positive deviations from the mean, the distribution is said to be skewed to the left, and both mu_3 and gamma_3 take negative values.

The fourth central moment is called the kurtosis, defined as mu_4 = E[(X - mu)^4]. It is a measure of the flatness or sharpness of the probability distribution. The ratio of the kurtosis to the fourth power of the standard deviation, gamma_4 = mu_4/sigma^4, is termed the coefficient of kurtosis. A distribution with a high value of kurtosis has a sharp peak; a distribution with a low value of kurtosis is flat or rounded. When gamma_4 is less than 3 the distribution is said to be mildly peaked or platykurtic; when gamma_4 is greater than 3 it is said to be sharply peaked or leptokurtic. You may encounter these terms in standard texts or even in research papers showing distributions and explaining their characteristic features; depending upon whether the peak is sharp or the distribution is flat and rounded, the appropriate terminology is used.
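As a sketch of how these dimensionless coefficients are computed (the variable names cv, gamma3, gamma4 follow the lecture's symbols; the example distribution is my own running one):

```python
import math

pmf = {1: 0.2, 2: 0.3, 3: 0.5}
mu = sum(x * p for x, p in pmf.items())

def central_moment(k):
    return sum((x - mu) ** k * p for x, p in pmf.items())

sigma = math.sqrt(central_moment(2))     # standard deviation
cv = sigma / mu                          # coefficient of variation, sigma/mu
gamma3 = central_moment(3) / sigma ** 3  # coefficient of skewness, mu_3/sigma^3
gamma4 = central_moment(4) / sigma ** 4  # coefficient of kurtosis, mu_4/sigma^4

print(cv, gamma3, gamma4)  # gamma3 < 0: skewed left; gamma4 < 3: platykurtic
```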
The next topic in our discussion is the median for discrete probability distributions. The median xm for a discrete probability distribution is the point within the range of the allowed values of the random variable X such that the cumulative distribution value at xm is exactly 0.5. In mathematical form, the sum over i = 1 to m of f(xi) = F(xm) = 0.5, where f(xi) is the probability mass function. The value xm is such that the probability of X below xm and the probability of X above xm are both equal to 0.5.

This concludes our brief discussion of discrete probability distributions. We will now move on to continuous probability distributions, which are described by probability density functions.