Friends, in the last class we introduced various discrete distributions, and towards the end I introduced distributions which arise from a Poisson process. We say that events occur according to a Poisson process if they satisfy the following three assumptions. For convenience we consider events occurring over time; of course, we could also consider events happening over an area or over any other space, but for convenience of deriving the form of the distribution we will work with events happening over time. Let me recollect the assumptions here. (1) The occurrences in disjoint time intervals are independent. (2) The probability of a single occurrence in a small time interval is proportional to the length of the time interval. (3) The probability of more than one occurrence in a small time interval is negligible. Now let me formalize these assumptions in notation. Let X(t) denote the number of occurrences in the interval (0, t]. So if we consider the timeline starting from time 0, X(t) counts how many occurrences there are up to time t, and we write p_n(t) = P(X(t) = n), the probability that there are n occurrences in an interval of length t. Under this notation, the second assumption — the probability of a single occurrence in a small interval of length h is proportional to the length of the interval — becomes p_1(h) = λh + o(h), where o(h) denotes a negligible quantity. Writing λh + o(h) is the exact translation; roughly, p_1(h) ≈ λh.
And the third assumption — that the probability of more than one occurrence in a small time interval is negligible — means that two occurrences, three occurrences, and so on together have probability o(h): that is, 1 − p_0(h) − p_1(h) = o(h), which gives p_0(h) = 1 − λh + o(h). Now we will use this to derive the distribution of the number of occurrences in a Poisson process. Under the assumptions of the Poisson process, the distribution of X(t) is given by p_n(t) = e^(−λt) (λt)^n / n! for n = 0, 1, 2, .... That means the probability of n occurrences in an interval of length t equals e^(−λt) (λt)^n / n!. To prove this we will use induction: first we prove the cases n = 0 and n = 1, and then we extend. So let us take n = 0; we need to prove that p_0(t) = e^(−λt). To prove this, let us set up a differential equation in the following way. Consider p_0(t + h), the probability that there is no occurrence in (0, t + h]. On the timeline, no occurrence in (0, t + h] is equivalent to saying that there is no occurrence in (0, t] and no occurrence in (t, t + h]. So we can write it as P(no occurrence in (0, t] ∩ no occurrence in (t, t + h]). At this point we make use of the first assumption of the Poisson process — occurrences in disjoint time intervals are independent — so this equals P(no occurrence in (0, t]) × P(no occurrence in (t, t + h]).
In notation this is p_0(t) p_0(h), because only the length h of the interval matters, not its starting point. Now from the third assumption we have written p_0(h) = 1 − λh + o(h), so substituting, p_0(t + h) = p_0(t)(1 − λh + o(h)). Simplifying, p_0(t + h) − p_0(t) = −λh p_0(t) + o(h) p_0(t). Dividing by h and taking the limit as h → 0, the left-hand side gives the derivative p_0'(t), and on the right-hand side o(h)/h → 0, so that term vanishes, leaving p_0'(t) = −λ p_0(t). This is a first order linear differential equation of variable-separable type, so the solution can be obtained almost immediately: dividing through and integrating gives log p_0(t) = −λt + constant, and exponentiating both sides, p_0(t) = c e^(−λt) for some constant c. The constant is determined by the initial condition: p_0(0) = 1 gives c = 1. So the solution is p_0(t) = e^(−λt), which is exactly what we were supposed to prove for n = 0. Now, to use induction we need to prove the case n = 1, and then we will assume the result for n = k and establish it for n = k + 1.
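This limit can be checked numerically. The sketch below (Python; the code and the function name are mine, not part of the lecture) chops (0, t] into many small pieces of length h = t/steps, each carrying an occurrence independently with probability λh as in the Poisson assumptions, and shows (1 − λh)^(t/h) approaching e^(−λt) as h shrinks:

```python
import math

def p0_discrete(lam, t, steps):
    """P(no occurrence in (0, t]) when the interval is cut into `steps`
    pieces of length h = t/steps, each holding an occurrence
    independently with probability lam*h, per the Poisson assumptions."""
    h = t / steps
    return (1.0 - lam * h) ** steps

lam, t = 2.0, 0.5
for steps in (10, 100, 10_000):
    print(steps, p0_discrete(lam, t, steps))
print("limit:", math.exp(-lam * t))  # p0(t) = e^(-lambda*t)
```

With 10,000 slices the discretized value already agrees with e^(−λt) to four decimal places, which is the content of the differential-equation derivation above.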
So, for n = 1 we consider p_1(t + h) and again set up a differential equation. One occurrence in (0, t + h] means, looking at the timeline, that the single occurrence lies either in (0, t] or in (t, t + h]. So p_1(t + h) = P(1 occurrence in (0, t] and no occurrence in (t, t + h]) + P(no occurrence in (0, t] and 1 occurrence in (t, t + h]). Once again (0, t] and (t, t + h] are disjoint intervals, so we can use the independence assumption, and this becomes p_1(t) p_0(h) + p_0(t) p_1(h). Now we substitute the values: p_0(h) = 1 − λh + o(h); p_0(t) = e^(−λt), which we have already established; and p_1(h) = λh + o(h). So p_1(t + h) = p_1(t)(1 − λh + o(h)) + e^(−λt)(λh + o(h)). To set up the differential equation we take p_1(t) to the left-hand side and divide by h: (p_1(t + h) − p_1(t))/h = −λ p_1(t) + λ e^(−λt) + (o(h)/h)(p_1(t) + e^(−λt)). Taking the limit as h → 0 we get the first order differential equation p_1'(t) = −λ p_1(t) + λ e^(−λt). Again this is a linear differential equation of the first order, and the solution can be obtained by the integrating factor method: p_1(t) = (λt + c) e^(−λt) for some constant c.
Because here the integrating factor comes immediately as e^(λt): multiplying through, the left-hand side becomes (e^(λt) p_1(t))', and on the right-hand side e^(λt) λ e^(−λt) = λ, so integrating gives e^(λt) p_1(t) = λt + c, that is, p_1(t) = (λt + c) e^(−λt). Using the initial condition p_1(0) = 0 gives c = 0, so p_1(t) = λt e^(−λt). If you check the formula p_n(t) for n = 1, we needed exactly λt e^(−λt), so it is satisfied. Now let us assume that p_n(t) holds for all n ≤ k and consider n = k + 1. p_{k+1}(t + h) is the probability of k + 1 occurrences in (0, t + h]. Again consider the timeline and see how this event can decompose: all k + 1 occurrences in (0, t] and none in (t, t + h], or k occurrences in (0, t] and 1 in (t, t + h], or k − 1 in (0, t] and 2 in (t, t + h], and so on, down to no occurrence in (0, t] and all k + 1 in (t, t + h]. Written like this, these are disjoint events, so the probability of their union is the sum of the probabilities. Once again the two intervals are disjoint, so we can use the independence assumption, giving a breakup of the same type as before, like p_1(t) p_0(h) + p_0(t) p_1(h) in the n = 1 case.
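The n = 1 case can be checked by the same discretization idea as before (Python sketch of mine, not part of the lecture): with the interval cut into many pieces of length h, exactly one occurrence is a binomial probability with one success, and it tends to λt e^(−λt):

```python
import math

def p1_discrete(lam, t, steps):
    """P(exactly one occurrence in (0, t]) under the discretized Poisson
    assumptions: binomial probability of exactly one success in `steps`
    trials, each with success probability lam*h, h = t/steps."""
    h = t / steps
    return steps * (lam * h) * (1.0 - lam * h) ** (steps - 1)

lam, t = 2.0, 0.5
print(p1_discrete(lam, t, 100_000))   # discretized value
print(lam * t * math.exp(-lam * t))   # p1(t) = lambda*t*e^(-lambda*t)
```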
In a similar way we can write p_{k+1}(t + h) = p_{k+1}(t) p_0(h) + p_k(t) p_1(h) + Σ_{i=1}^{k} p_{k−i}(t) p_{i+1}(h), where in the sum the first index goes down to p_0(t) and the second up to p_{k+1}(h). Now substitute: p_0(h) = 1 − λh + o(h); by the induction hypothesis (the statement is assumed true up to k), p_k(t) = e^(−λt)(λt)^k / k!; and p_1(h) = λh + o(h). In the sum, p_{k−i}(t) = e^(−λt)(λt)^(k−i)/(k−i)!, and what are the factors p_{i+1}(h) for i = 1, ..., k? They are p_2(h), p_3(h), ..., p_{k+1}(h), all of which are negligible, i.e., o(h), by the third assumption. Now the stage is ready to set up the differential equation: (p_{k+1}(t + h) − p_{k+1}(t))/h = −λ p_{k+1}(t) + λ^(k+1) t^k e^(−λt)/k! + (o(h)/h) × (terms involving p_{k+1}(t) and the bounded factors above). As h → 0 all the o(h)/h terms go to 0, so we do not have to worry about them. Taking the limit, p_{k+1}'(t) = −λ p_{k+1}(t) + λ^(k+1) t^k e^(−λt)/k!. You can see this is again a first order linear differential equation, solved by the integrating factor method with integrating factor e^(λt): multiplying through, e^(λt) and e^(−λt) cancel on the right, leaving (e^(λt) p_{k+1}(t))' = λ^(k+1) t^k / k!.
Integrating, e^(λt) p_{k+1}(t) = λ^(k+1) t^(k+1)/(k+1)! + c, and bringing the exponential to the other side, p_{k+1}(t) = e^(−λt)(λt)^(k+1)/(k+1)! + c e^(−λt). Again the initial condition p_{k+1}(0) = 0 gives c = 0. So p_n(t) holds for all non-negative integer values of n, and this is the distribution of the number of occurrences in a Poisson process. Now, the application of this will depend upon the length of the interval: given a Poisson process with rate λ, you can calculate the probabilities of various types of events during a specified time interval — how many events occur, the probability of no occurrence, and so on. Another version of this distribution, which is usually found in textbooks, is the one where t does not appear explicitly — the time frame is already fixed. For example, the number of errors on a page, or the number of traffic accidents in a year. So sometimes we fix the time length, area, or space in a Poisson process; in that case we can write λt = η, and the distribution of X, the number of occurrences in that fixed time/area/space, is P(X = n) = e^(−η) η^n / n! for n = 0, 1, 2, .... This is an alternative way of representing the distribution, with the frame of reference (time, area, space, etc.) fixed. Let me develop things from this form. We generally write X ~ Poisson(η).
See, earlier we would have written X ~ Poisson(λt) because we were considering an interval of length t; making it frame-of-reference independent, X ~ Poisson(η). To check the validity: Σ_{n=0}^{∞} e^(−η) η^n / n! = e^(−η) Σ_{n=0}^{∞} η^n / n!, and the sum here is nothing but e^(η), so the total is 1. For the moments, consider μ_1' = Σ n e^(−η) η^n / n!. At n = 0 the term vanishes, so we sum from n = 1; keeping one η outside, this becomes η e^(−η) Σ_{n=1}^{∞} η^(n−1)/(n−1)!, which is again η e^(−η) e^(η) = η. So the mean of the distribution is η. If you correlate this with the earlier terminology: λ, the proportionality constant in "the probability of a single occurrence in a small time interval is proportional to the length of the interval," is the rate of occurrence of events in the Poisson process. So the average number of occurrences per unit time is λ, and in an interval of length t the average number of occurrences is λt, which is η here. We can consider higher order moments also; I will demonstrate the second order and the remaining can be done in a similar way. Consider the factorial moment E[X(X − 1)] = Σ_{n=2}^{∞} n(n − 1) e^(−η) η^n / n!.
That equals e^(−η) η² Σ_{n=2}^{∞} η^(n−2)/(n−2)! = η². Based on this we can easily calculate μ_2' = E[X²] = η² + η, and therefore Var(X) = η² + η − η² = η. So we have a striking situation: in a Poisson distribution the mean and the variance are equal, both η. In fact one can calculate the higher order moments in a similar way; the third and fourth order moments come out easily: μ_3' = η + 3η² + η³, and the third central moment μ_3 turns out to be simply η again — so the third central moment is also the same; μ_4' = η + 7η² + 6η³ + η⁴, and the fourth central moment is μ_4 = η + 3η². Now, since the third central moment is positive, the distribution is positively skewed, which you can also see easily by plotting. Look at the probabilities: the first value is e^(−η), whatever η is. If η < 1, the subsequent values simply keep decreasing. Even if η > 1, the values increase in the beginning but eventually start decreasing, because of the n! in the denominator — after a certain stage the probabilities fall off.
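These moment facts are easy to verify numerically. The sketch below (Python, mine, not part of the lecture) computes the pmf directly, truncates the infinite sums at N = 60 (the tail beyond is astronomically small for η = 2.5), and checks total ≈ 1, mean ≈ η, variance ≈ η, and third central moment ≈ η:

```python
import math

def poisson_pmf(n, eta):
    """p(n) = e^(-eta) * eta^n / n! for X ~ Poisson(eta)."""
    return math.exp(-eta) * eta ** n / math.factorial(n)

eta, N = 2.5, 60  # truncate the sums at N; the tail beyond is negligible
probs = [poisson_pmf(n, eta) for n in range(N)]
total = sum(probs)
mean = sum(n * p for n, p in enumerate(probs))
var = sum((n - mean) ** 2 * p for n, p in enumerate(probs))
mu3 = sum((n - mean) ** 3 * p for n, p in enumerate(probs))
print(total, mean, var, mu3)  # ~1, ~eta, ~eta, ~eta
```

Note that mean, variance, and third central moment all come out equal to η, as derived above.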
So the distribution is positively skewed. If we consider the skewness coefficient μ_3/μ_2^(3/2), it equals η/η^(3/2) = 1/η^(1/2), which goes to 0 as η → ∞; that is, as η becomes large the skewness becomes less. Similarly, the kurtosis coefficient μ_4/μ_2² − 3 = (η + 3η²)/η² − 3 = 1/η, which is positive — the peakedness is more, i.e., the distribution is leptokurtic — but this too goes to 0 as η → ∞. So as the rate parameter, or the mean, increases, the distribution tends towards the normal shape. Now, from the number of occurrences in a Poisson process one may be interested in the distribution of time: for example, once we start observing the process, when will the first occurrence be? How much time will elapse between two occurrences, or between several occurrences? To analyze these questions we look at continuous distributions — distributions of the time. So let us consider the distribution of the time taken to observe the first occurrence in a Poisson process with rate λ. Starting from time 0, suppose the first occurrence is at time Y; what is the distribution of Y? To derive it, consider the probability that Y exceeds a given value y: if Y > y, that means there is no occurrence in the interval (0, y].
So this equals P(X(y) = 0) = p_0(y) in the notation of the Poisson process, which is e^(−λy) — of course for y > 0; if y ≤ 0 this probability is certainly 1. The cumulative distribution function of Y is then P(Y ≤ y) = 1 − P(Y > y), which is 0 for y ≤ 0 and 1 − e^(−λy) for y > 0. From this we obtain the probability density function of Y: f(y) = λ e^(−λy) for y > 0, and 0 for y ≤ 0. This is known as the exponential distribution, or negative exponential distribution. So the negative exponential distribution is nothing but the distribution of the waiting time for the first occurrence in a Poisson process, measured from the time we start observing the process; here the form is obtained assuming the rate to be λ. Now let us look at some elementary properties of the exponential distribution — for example, its moment structure. The density integrates to 1, the integral being simply a gamma function. In fact, in general we can calculate the k-th order moment: μ_k' = ∫_0^∞ y^k λ e^(−λy) dy = λ Γ(k + 1)/λ^(k+1) = k!/λ^k for k = 1, 2, .... In particular the mean is 1/λ.
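The moment formula k!/λ^k can be confirmed by direct numerical integration of the density. Below is a Python sketch of mine (not from the lecture) using a simple midpoint rule with the integral truncated at a large upper limit, where the exponential tail is negligible:

```python
import math

def exp_moment(k, lam, upper=60.0, steps=200_000):
    """Midpoint-rule approximation of the k-th moment
    mu_k' = integral_0^inf y^k * lam * e^(-lam*y) dy,
    truncated at `upper`; should match k!/lam^k."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        y = (i + 0.5) * h
        total += (y ** k) * lam * math.exp(-lam * y)
    return total * h

lam = 2.0
print(exp_moment(1, lam), 1 / lam)                       # mean 1/lambda
print(exp_moment(3, lam), math.factorial(3) / lam ** 3)  # k!/lambda^k
```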
Now, the significance of this you can understand: if the rate of occurrence in a Poisson process is λ, then the average waiting time for the first occurrence is 1/λ. Next, μ_2' = 2/λ², and therefore the variance is 2/λ² − 1/λ² = 1/λ², so the standard deviation is 1/λ. So in an exponential distribution the mean and the standard deviation are the same — just as in a Poisson distribution the mean, the variance, and the third central moment are the same. Of course, one can write down the higher order moments also: μ_3' = 6/λ³ and μ_4' = 24/λ⁴, from which we can derive the third central moment μ_3 = 2/λ³ and μ_4 = 9/λ⁴. The skewness measure is then μ_3/μ_2^(3/2) = (2/λ³)/(1/λ³) = 2, which is always positive. Indeed you can see the distribution is always positively skewed, since the density λ e^(−λy) is decreasing on (0, ∞) and goes to 0 as y → ∞. The kurtosis measure μ_4/μ_2² − 3 = 9 − 3 = 6 is also positive, so it is again leptokurtic. Another interesting point is that these coefficients are free from λ: irrespective of the value of λ, the distribution is always positively skewed and always leptokurtic. Now, the negative exponential distribution and the Poisson distribution are related in a way similar to the relationship between the binomial and geometric distributions.
For example, in a binomial distribution we consider the number of successes for a fixed number of Bernoulli trials — the number of occurrences in a fixed "time," if trials play the role of time. And what was the geometric? The geometric was the number of trials needed for the first success — the "time" needed for the first success. Now consider the interpretation for Poisson and exponential: in the Poisson, the time is fixed and we ask for the distribution of how many occurrences happen; in the exponential, we do not fix the time and ask how much time will be needed for the first occurrence. So this relationship is simply analogous to the relationship between the binomial and the geometric distribution. And therefore another property which was there in the geometric distribution — the memoryless property — is also true here. Let me introduce it. Consider P(Y > a + b | Y > b) = P(Y > a + b)/P(Y > b) = e^(−λ(a+b))/e^(−λb) = e^(−λa), which is nothing but P(Y > a). So this means: the probability of the waiting time being more than a + b, given that time b has already elapsed with no occurrence, is the same as the probability of waiting more than a from the start. That is, irrespective of the starting point, the probability remains the same.
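The memoryless property can also be seen by simulation. The Python sketch below (mine, not part of the lecture; the parameter values are arbitrary) estimates the conditional probability P(Y > a + b | Y > b) from exponential samples and compares it with P(Y > a) = e^(−λa):

```python
import math
import random

def memoryless_check(lam, a, b, trials=200_000, seed=2):
    """Estimate P(Y > a + b | Y > b) for Y ~ Exponential(lam) by
    simulation; by memorylessness it should match P(Y > a)."""
    rng = random.Random(seed)
    survived_b = 0   # samples with Y > b
    survived_ab = 0  # samples with Y > a + b
    for _ in range(trials):
        y = rng.expovariate(lam)
        if y > b:
            survived_b += 1
            if y > a + b:
                survived_ab += 1
    return survived_ab / survived_b

lam, a, b = 1.5, 0.4, 1.0
print(memoryless_check(lam, a, b))  # conditional survival estimate
print(math.exp(-lam * a))           # P(Y > a), the memoryless prediction
```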
Now, many times this exponential distribution is used as the lifetime of a component or a system — in engineering systems, in engineering design, in manufacturing processes, and so on. Whenever we model a system this way, we take the exponential distribution as the lifetime distribution, so an "occurrence" means the failure of the system. So if the system has not failed till a given time, and we consider the probability of failure over a further amount of time after that, it does not depend on the starting point: whether b is 0, or 1, or any other number does not matter. Basically it means that systems whose lifetimes are exponential are stable in this sense. For example, you may be tempted to buy a used cycle from your senior in the hostel — he has already used it, but it is working — or an old calculator: if it is working, and if the lifetime is supposedly exponential, then the failure rate from now on is still the same as before. We will introduce a further measure of this; I used the terminology "failure rate." We define the instantaneous failure rate, or hazard rate, like this. Suppose the system is working at a particular time t, and consider the conditional probability that it fails immediately after, in (t, t + h]; divide by h and take the limit as h → 0. So we have lim_{h→0} (1/h) P(t < Y ≤ t + h | Y > t). Writing the numerator in terms of the CDF F, this is lim_{h→0} [F(t + h) − F(t)] / [h (1 − F(t))], and taking the limit, the first factor becomes the density, giving f(t)/(1 − F(t)).
So this is called the instantaneous failure rate or hazard rate of the system; the notation for it is h(t), and sometimes z(t) is also used. Now, the quantity 1 − F(t) = P(Y > t) is called the reliability of the system at time t — the probability that the system is functional at a given time. For the exponential distribution, consider the plot of the reliability e^(−λt): at t = 0 it equals 1, and thereafter it decreases, but it decreases slowly. That is, the reliability function of the exponential distribution is stable — it does not fall away steeply. Secondly, consider the hazard rate for the exponential distribution: the numerator is λ e^(−λt) and the denominator is e^(−λt), so h(t) = λ. This is very interesting — it is free from the time. That means the hazard rate, or failure rate, of the exponential distribution is constant, a phenomenon which actually follows from the memoryless property of the distribution. Now, as I was mentioning, when considering times of occurrence in a Poisson process, in place of the first occurrence we may consider several occurrences. So let Y_r denote the time of the r-th occurrence in a Poisson process with rate λ. We want the distribution of this, so consider as before P(Y_r > y): the r-th occurrence has not happened by time y, which means there are at most r − 1 occurrences in (0, y], i.e., X(y) ≤ r − 1, where X(y) is the number of occurrences in the interval (0, y].
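The constancy of the exponential hazard rate is a one-line computation, shown in the small Python sketch below (mine, not part of the lecture) by evaluating f(t)/(1 − F(t)) at several times:

```python
import math

def hazard(t, lam):
    """Hazard rate h(t) = f(t) / (1 - F(t)) for an Exponential(lam)
    lifetime: density over reliability."""
    density = lam * math.exp(-lam * t)
    reliability = math.exp(-lam * t)  # 1 - F(t) = P(Y > t)
    return density / reliability

lam = 0.7
print([hazard(t, lam) for t in (0.1, 1.0, 5.0)])  # constant, equal to lam
```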
So the r-th occurrence is after y; in (0, y] there can be 0, 1, ..., or r − 1 occurrences. Hence P(Y_r > y) = Σ_{j=0}^{r−1} P(X(y) = j) = Σ_{j=0}^{r−1} e^(−λy) (λy)^j / j!, valid for y > 0; for y ≤ 0 it equals 1. So we have an expression for this probability — the reliability function, if you like — and the CDF is its complement: F_{Y_r}(y) = 0 for y ≤ 0 and 1 − Σ_{j=0}^{r−1} e^(−λy) (λy)^j / j! for y > 0. To obtain the density function we differentiate. Observe something about this: except for the first, each term of the sum is a product of two factors, so its derivative contributes two terms. Written out systematically, the sum is telescopic — all the terms cancel except one. Differentiating term by term: the first term e^(−λy) gives −λ e^(−λy); the term λy e^(−λy) gives +λ e^(−λy) − λ² y e^(−λy); the term (λy)² e^(−λy)/2! gives +λ² y e^(−λy) − λ³ y² e^(−λy)/2!, the 2s cancelling in the first piece; and so on.
So you can observe that these terms cancel in pairs and only the last survives: the derivative of the sum is −λ^r e^(−λy) y^(r−1)/(r−1)!, and since F_{Y_r} = 1 minus the sum, the density is f_{Y_r}(y) = λ^r e^(−λy) y^(r−1)/(r−1)!, which we can write with (r−1)! = Γ(r) as λ^r e^(−λy) y^(r−1)/Γ(r), for y > 0. This is called the gamma distribution, or Erlang distribution. After this distribution was derived, it was realized that it is valid for any positive real number r: in the derivation I made use of the fact that r is a positive integer, but even for positive real r the density is valid — the integral equals 1 because, integrated from 0 to ∞, it is simply a gamma function. So the pdf is valid for all positive real values of r. The moment structure is very simple to observe, again because of the gamma function: μ_k' = ∫_0^∞ y^k λ^r e^(−λy) y^(r−1)/Γ(r) dy = λ^r Γ(k + r)/(λ^(k+r) Γ(r)) = Γ(k + r)/(Γ(r) λ^k), valid for k = 1, 2, .... For the mean, μ_1' = E[Y_r] = Γ(r + 1)/(Γ(r) λ) = r/λ.
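These gamma-distribution facts can be checked by numerical integration of the density, as in this Python sketch of mine (not part of the lecture), again with a midpoint rule truncated where the tail is negligible:

```python
import math

def gamma_pdf(y, r, lam):
    """Gamma/Erlang density lam^r * e^(-lam*y) * y^(r-1) / Gamma(r)."""
    return lam ** r * math.exp(-lam * y) * y ** (r - 1) / math.gamma(r)

def gamma_moment(k, r, lam, upper=60.0, steps=200_000):
    """Midpoint-rule approximation of mu_k' = integral y^k f(y) dy over
    (0, upper); should match Gamma(k + r) / (Gamma(r) * lam^k)."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        y = (i + 0.5) * h
        total += y ** k * gamma_pdf(y, r, lam)
    return total * h

r, lam = 3, 2.0
print(gamma_moment(0, r, lam))                           # integrates to ~1
print(gamma_moment(1, r, lam), r / lam)                  # mean r/lambda
print(gamma_moment(2, r, lam), r * (r + 1) / lam ** 2)   # mu_2'
```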
So, if you remember, in the exponential distribution the mean 1/λ was the average waiting time for the first occurrence. The average waiting time for the r-th occurrence is simply r times that, r/λ — basically what is happening is that you are adding the times, 1/λ taken r times. It is then easy to see that μ_2' = r(r + 1)/λ², and therefore Var(Y_r) = r/λ² — again the variance of an individual exponential, 1/λ², added up r times. Another important point to note: if I put r = 1, I recover the exponential distribution. Let us also look at the moment generating function for the exponential and gamma distributions. The MGF of the gamma is M_{Y_r}(t) = E[e^(tY_r)] = ∫_0^∞ e^(ty) λ^r e^(−λy) y^(r−1)/Γ(r) dy = (λ^r/Γ(r)) ∫_0^∞ e^(−(λ−t)y) y^(r−1) dy = λ^r Γ(r)/(Γ(r)(λ − t)^r) = (λ/(λ − t))^r, valid for t < λ; for t ≥ λ the integral is divergent. For the exponential distribution this becomes λ/(λ − t). Now let me also introduce the direct connection between the exponential and the gamma — analogous to the direct connections between the geometric and the negative binomial, and between the Bernoulli and the binomial.
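This exponential–gamma connection can be illustrated by simulation. The Python sketch below (mine, not part of the lecture; parameter values arbitrary) builds the r-th arrival time as a sum of r exponential inter-arrival times and checks that the sample mean and variance match r/λ and r/λ²:

```python
import random

def rth_arrival(r, lam, rng):
    """Time of the r-th occurrence: the sum of r independent
    Exponential(lam) inter-arrival times."""
    return sum(rng.expovariate(lam) for _ in range(r))

def sum_moments(r, lam, trials=100_000, seed=3):
    """Sample mean and variance of the r-th arrival time."""
    rng = random.Random(seed)
    samples = [rth_arrival(r, lam, rng) for _ in range(trials)]
    mean = sum(samples) / trials
    var = sum((s - mean) ** 2 for s in samples) / trials
    return mean, var

r, lam = 4, 2.0
mean, var = sum_moments(r, lam)
print(mean, r / lam)       # ~r/lambda, the gamma mean
print(var, r / lam ** 2)   # ~r/lambda^2, the gamma variance
```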
See, these properties are useful; we have not separately discussed several of these random variables, but the concept is easy to understand. Suppose X_1, X_2, ..., X_n are independent exponential random variables with mean 1/λ; then Y = Σ_{i=1}^n X_i is a gamma random variable with parameters n and λ. Similarly, if X_1, X_2, ..., X_n are geometric with success probability p, then Y = Σ_{i=1}^n X_i is negative binomial with parameters n and p; and if X_1, X_2, ..., X_n are Bernoulli(p), then Y = Σ_{i=1}^n X_i is binomial(n, p). These are additive properties of the distributions. We can also see that if X_1, X_2, ..., X_n are each gamma(r, λ), then Σ_{i=1}^n X_i is gamma(nr, λ) — the gamma is also additive, because waiting for the r-th occurrence n separate times amounts to waiting for the nr-th occurrence in a Poisson process with rate λ. These are interesting relations, and they can be derived directly in a Poisson process as well as in Bernoulli trials. In the following lecture I will cover one of the most important distributions in the theory of statistics, the normal distribution; I will also show its relation with other distributions and explain why it is the most important, or most commonly used, distribution. We will also spend some time on problems on these topics.