proved that the Poisson distribution comes as a limit of the binomial distribution. The probability of obtaining R events, when the expected number of events is μ, is given by

P(R) = μ^R e^(−μ) / R!,  R = 0, 1, 2, …

So, what is the probability that no event has taken place? In our example of a glass of water, this is like asking: what is the probability that no bacterium has occurred in that glass of water? That is the non-occurrence probability P_0, obtained by setting R = 0:

P_0 = e^(−μ).

This has many interpretations. For example, if random dots occur at some uniform rate and you take a random interval of length L, then the average number of dots expected in that length is the rate times L; this is μ, and the probability that no dot has occurred in a length L is again P_0 = e^(−μ). One can give many similar examples: if there is a mean rate of occurrence of some infection, then the probability that it has not occurred is e^(−μ) again. So one has to define μ appropriately, depending on the actual problem being considered.

Now, some important properties of the Poisson distribution. First, it is self-normalized:

Σ_{R=0}^∞ P(R) = e^(−μ) Σ_{R=0}^∞ μ^R / R! = e^(−μ) · e^μ = 1,

where e^(−μ) stands outside the sum and the remaining sum is by definition e^μ. Second, the average number ⟨R⟩ = Σ_{R=0}^∞ R P(R) is easily shown to be μ, as it should be, because that was the definition of μ to begin with: the average number of events occurring is μ itself. One can go further and show that the variance about this average, ⟨R²⟩ − ⟨R⟩², also turns out to be μ for the Poisson distribution. So it is one important distribution where the variance equals the mean, or in other words the standard deviation is σ = √μ. This property is used quite often in counting statistics: if one obtains some mean count μ, the variance about that mean count is taken to be μ itself.

Similarly, we can construct a generating function for the Poisson distribution. Since the distribution is discrete we use the variable z:

G(z) = Σ_{R=0}^∞ z^R P(R) = e^(−μ) Σ_{R=0}^∞ (zμ)^R / R! = e^(zμ − μ) = e^(μ(z − 1)).

A very elegant expression: derivatives with respect to z are easy to take because it is just an exponential function, and the properties we derived just before follow easily from this generating function. This will come in handy when we actually discuss some stochastic problems in future.

The Poisson distribution, when considered in terms of discrete quantities, is of course a probability mass function, because it is defined on the space of discrete numbers. Interestingly, however, if we consider μ as a continuous variable, the Poisson distribution assumes a different significance. To understand this aspect it is better to take up examples taking place in time.
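As a quick numerical check of these properties, here is a minimal Python sketch (the choices μ = 3, z = 0.7, and the truncation of the sums at R = 200 are illustrative, not from the lecture; the factorial tail makes the truncation error negligible):

```python
import math

def poisson_pmf(r, mu):
    # P(R = r) = mu^r e^(-mu) / r!
    return mu**r * math.exp(-mu) / math.factorial(r)

mu = 3.0
rs = range(200)   # truncate the infinite sums; the factorial tail is negligible

total = sum(poisson_pmf(r, mu) for r in rs)
mean  = sum(r * poisson_pmf(r, mu) for r in rs)
var   = sum(r * r * poisson_pmf(r, mu) for r in rs) - mean**2
print(total, mean, var)           # ~1.0, ~3.0, ~3.0: normalized, mean = variance = mu

z = 0.7
G = sum(z**r * poisson_pmf(r, mu) for r in rs)
print(G, math.exp(mu * (z - 1)))  # generating function matches e^(mu(z-1))
```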
For example, suppose you are waiting at a bus stop on a countryside road, and assume that buses arrive randomly at some average rate, defined as a number per unit time, say one per hour. Now you can ask: what is the probability that no bus has arrived even after waiting for two hours? Or, for that matter, what is the distribution of the arrival times of buses? The same Poisson distribution now becomes a distribution not in the variable R but in a new, continuous variable, and we illustrate this with the examples below.

Let us take the example of waiting at the bus stop. You can also consider it as, say, waiting to catch a fish on a quiet Sunday noon at a lake. Suppose one has a time axis as shown in the figure below. Then you can ask: what is the probability that no event has taken place between times 0 and t? Let us postulate an average rate λ of occurrence of an event. In the bus example it is the average number of buses arriving per unit time; in the case of fishing it is the average rate of catching fish.

Now let P_0(t) be the probability that no vehicle has arrived for a time t. An expression for this can be derived from first principles as follows. Divide the total time interval into a large number of equally spaced intervals, each of width Δt; there are then n = t/Δt such intervals. Take Δt so small that the probability of occurrence of one event in the time Δt is λΔt. This is in fact the probability of occurrence of at most one event: Δt is so small that no more than one event is expected to occur in it. In that case, 1 − λΔt is the probability of non-occurrence in time Δt. There are n such intervals, and if no vehicle has come between times 0 and t, or no event has occurred between 0 and t, it means the event has not occurred in the first interval, nor in the second, nor in any of the n intervals. So the probability that no event has occurred in time t is

P_0(t) = (1 − λΔt)^n.

Now take the limit: with Δt = t/n, P_0(t) = (1 − λt/n)^n, and since Δt was arbitrary we can make it as small as we please, that is, let n → ∞. The limit of (1 − λt/n)^n as n → ∞ is e^(−λt). So we once again arrive at the non-occurrence probability

P_0(t) = e^(−λt),

this time by a continuous-time argument.

Now we ask: what is the probability that no vehicle has come, or for that matter no event has taken place, up to time t, but one event took place between times t and t + Δt? This we call the arrival-time distribution, and since it is a distribution (a density) we denote it by a lowercase p; for one event we write p_1(t).
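This discretization argument can be checked directly by simulation. Below is a minimal Python sketch (the values λ = 1 per hour, t = 2 hours, n = 1000 slices, and 20,000 trials are illustrative choices, not from the lecture):

```python
import math
import random

lam, t = 1.0, 2.0   # assumed rate (per hour) and waiting time (hours)
n = 1_000           # number of slices; dt = t/n is small enough that
dt = t / n          # at most one event fits in a slice
trials = 20_000

# In each slice an event occurs with probability lam*dt; count the trials
# in which no slice contains an event.
no_event = sum(
    all(random.random() >= lam * dt for _ in range(n))
    for _ in range(trials)
)

print(no_event / trials)   # Monte Carlo estimate of P_0(t)
print(math.exp(-lam * t))  # exact limit e^(-lambda t) ~ 0.1353
```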
The meaning of p_1(t) is that p_1(t) dt is the probability of an event occurring for the first time between t and t + dt. That is, p_1(t) dt equals the probability that no event occurred up to time t, which we know is e^(−λt), times the probability that one event occurred soon after, in a time dt, which we know is λ dt. Cancelling dt on both sides, we get

p_1(t) = λ e^(−λt).

This now has the interpretation of the waiting-time, or arrival-time, distribution function for one event to occur.

We can continue with this argument and develop a general expression for the waiting-time distribution of the (k+1)th particle, which is the probability that k particles have arrived up to time t and the (k+1)th arrives between t and t + dt. The argument goes as follows. Consider the timeline between 0 and some time T, and first ask for the probability P_{k+1}(T) that k + 1 particles have arrived in this interval. This can happen in several ways. Divide the interval at some intermediate time t: k particles occur up to time t, the (k+1)th occurs between t and t + dt with probability λ dt, and no further arrival occurs between t and T, with probability P_0(T − t). Since t is arbitrary, it could have been any time between 0 and T, so summing over all these options, the probability of occurrence of k + 1 particles in time T is the integral

P_{k+1}(T) = ∫_0^T P_k(t) · λ · P_0(T − t) dt.

So we have an integral-equation type of formulation, by a simple logic of arrival probabilities, treating time as the continuous variable and k as a parameter. This is the difference from the binomial-to-Poisson route: there we worked with discrete quantities, and now we are working with continuous ones.

To solve this equation one needs some tools, and we can identify it as in fact a convolution equation: the convolution of P_k(t) and P_0(t). So we can use the concept of Laplace transforms, which we discussed earlier, and which says that the Laplace transform of a convolution equals the product of the Laplace transforms of the individual quantities. Hence

P̃_{k+1}(s) = λ P̃_k(s) P̃_0(s),

where we have used the definition of the Laplace transform of any function f(t), namely f̃(s) = ∫_0^∞ e^(−st) f(t) dt. We have already calculated the non-arrival probability, P_0(t) = e^(−λt). Hence its Laplace transform is

P̃_0(s) = ∫_0^∞ e^(−st) e^(−λt) dt = ∫_0^∞ e^(−(s+λ)t) dt = 1/(s + λ).
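Both the transform of P_0 and the convolution structure can be checked symbolically. Here is a minimal sympy sketch (lam, tau, T, s are symbols, and working out the k = 0 case of the convolution is an illustrative choice):

```python
# Symbolic check of the convolution step with sympy.
from sympy import symbols, exp, integrate, laplace_transform, simplify

tau, s = symbols('tau s', positive=True)
lam, T = symbols('lam T', positive=True)

# Laplace transform of P0(t) = e^(-lam t) is 1/(s + lam), as in the text
print(laplace_transform(exp(-lam*tau), tau, s, noconds=True))   # 1/(lam + s)

# Convolution with k = 0: P1(T) = integral_0^T P0(tau)*lam*P0(T - tau) dtau
P1 = integrate(exp(-lam*tau) * lam * exp(-lam*(T - tau)), (tau, 0, T))
print(simplify(P1))   # lam*T*exp(-lam*T): exactly one event in time T
```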
Hence we have

P̃_{k+1}(s) = [λ/(λ + s)] P̃_k(s) for all k.

For example, iterating with k = 0 we get P̃_1(s) = λ/(λ + s)². This immediately implies that the probability that exactly one event has occurred in time t is P_1(t) = λt e^(−λt): from the lookup table, the Laplace inversion of the quantity 1/(λ + s)² is t e^(−λt). Similarly we can continue, putting k = 2, 3, and so on, and show that the Laplace transform we wanted for k events is

P̃_k(s) = λ^k/(λ + s)^(k+1).

The Laplace inversion of this is also known from the lookup table, and we get the probability that a time interval t contains exactly k events:

P_k(t) = (λt)^k e^(−λt) / k!.

This is the same as the Poisson distribution we derived earlier, but this approach allows us to obtain the waiting-time distribution function for k + 1 events. To obtain it, let us proceed as follows. We denote by lowercase p_{k+1}(t) the waiting-time distribution for k + 1 events. As earlier, this means that p_{k+1}(t) dt is the probability that the (k+1)th event occurred for the first time between t and t + dt. From the arguments we made earlier, this is nothing but the probability P_k(t) (capital P here) that exactly k events occurred in time t, times the probability λ dt that one event occurred between t and t + dt. So we now have the result that the waiting-time distribution for the (k+1)th event is

p_{k+1}(t) = λ P_k(t) = λ (λt)^k e^(−λt) / k!,

where we have already shown P_k(t) = (λt)^k e^(−λt)/k!. It is easy to show that this waiting-time distribution is normalized with respect to t:

∫_0^∞ p_{k+1}(t) dt = ∫_0^∞ λ (λt)^k e^(−λt) / k! dt = 1,

because with the substitution x = λt the integral in the numerator is just ∫_0^∞ x^k e^(−x) dx = k!, and the denominator already has a k!. The result is independent of k, as expected: if you wait sufficiently long, any number of events will eventually occur. So this is the normalization of p_{k+1}.

Another important property of the Poisson distribution is its limit as its mean value becomes large: Poisson goes over to Gaussian. If you plot the Poisson distribution for various values of μ, considering it as a discrete distribution in r, then at r = 0 the value is just the exponential e^(−μ), and for larger μ the mean of the distribution moves continuously towards the right. The mode occurs essentially at r = μ, so as μ increases one obtains curves that constantly shift towards the right.
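Both results can be checked numerically with a minimal Python sketch (λ = 1.5, k = 3, and the Monte Carlo and quadrature settings are illustrative choices): the (k+1)th arrival time is simulated as a sum of k + 1 independent exponential inter-arrival times, whose mean should be (k+1)/λ, and the density p_{k+1}(t) is integrated to confirm the normalization.

```python
import math
import random

lam, k = 1.5, 3   # assumed rate and k (waiting for the (k+1)th event)

# Monte Carlo: the (k+1)th arrival time is the sum of k+1 independent
# exponential inter-arrival times, each with rate lam.
trials = 200_000
mean_wait = sum(
    sum(random.expovariate(lam) for _ in range(k + 1)) for _ in range(trials)
) / trials
print(mean_wait, (k + 1) / lam)   # both ~ 2.67: mean waiting time is (k+1)/lam

# Quadrature: p_{k+1}(t) = lam (lam t)^k e^(-lam t)/k! should integrate to 1
p = lambda t: lam * (lam*t)**k * math.exp(-lam*t) / math.factorial(k)
dt, T = 1e-3, 50.0
print(sum(p(i*dt) for i in range(1, int(T/dt))) * dt)   # ~1.0, independent of k
```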
One can prove that in the limit when μ is large, say μ ≫ 1, this distribution, plotted around values of r close to μ, attains a Gaussian form:

P(r) → G(r) = (1/√(2πμ)) e^(−(r − μ)²/(2μ)).

One proves this by an asymptotic approximation for r!, using Stirling's approximation. To do that, first take the logarithm of the Poisson distribution:

ln P(r) = r ln μ − μ − ln r!.

Since we are working in the limit where μ is large and looking at the distribution around r ≈ μ, r is also going to be large. In that limit Stirling's approximation says

ln r! ≈ r ln r − r + ln √(2πr), as r → ∞.

We now expand around μ by choosing a new variable ξ, the shift between r and μ scaled by √μ, which is precisely the standard deviation:

ξ = (r − μ)/√μ, that is, r = μ + ξ√μ.

In other words, as μ → ∞ we look at the distribution in a neighbourhood of r = μ whose width is of the order of the standard deviation √μ; ξ measures that closeness to the peak.

Re-expressing the whole expression in terms of ξ:

ln P(r) = r ln μ − μ − r ln r + r − ln √(2πr)
= −r ln(r/μ) + (r − μ) − ln √(2πr)
= −(μ + ξ√μ) ln(1 + ξ/√μ) + ξ√μ − ln √(2πr).

If we now expand the logarithm using ln(1 + x) ≈ x − x²/2, we get ln(1 + ξ/√μ) ≈ ξ/√μ − ξ²/(2μ). Multiplying out, several terms cancel, and this goes over to

ln P(r) ≈ −ξ²/2 − ln √(2πr),

and since r is very close to μ in the limit (ξ/√μ → 0), ln √(2πr) becomes ln √(2πμ). Exponentiating once again leads to the desired result: P(r) goes over to a Gaussian distribution,

P(r) → (1/√(2πμ)) e^(−ξ²/2) = (1/√(2πμ)) e^(−(r − μ)²/(2μ)).

So this proves the asymptotic transition of the Poisson distribution to the Gaussian distribution; a small numerical illustration follows at the end.

So, students, we started with the binomial distribution, which is discrete, and arrived at the Poisson distribution, showing that it not only comes as a limit but also stands legitimately on its own in several problems. It can also be interpreted as a distribution in a continuous variable, in several problems, using waiting-time distributions. And further, when we do asymptotics of the Poisson distribution, it transitions to the Gaussian distribution. So the Gaussian distribution appears to be the natural asymptotic end point of processes such as dichotomous processes or discrete processes, as well as continuous ones. This has now brought us naturally to certain important concepts and theorems in statistics, which we will discuss in the subsequent lectures. Thank you.
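As the numerical illustration mentioned above, here is a minimal Python sketch comparing the Poisson probabilities with the Gaussian approximation near the peak (μ = 400 and the sample points r are illustrative choices, not from the lecture):

```python
import math

mu = 400.0   # a large mean, so sigma = sqrt(mu) = 20

def poisson_pmf(r, mu):
    # evaluate in log space to avoid overflow in mu**r and r!
    return math.exp(r * math.log(mu) - mu - math.lgamma(r + 1))

def gauss(r, mu):
    return math.exp(-(r - mu)**2 / (2*mu)) / math.sqrt(2*math.pi*mu)

# Compare the two within a couple of standard deviations of the peak
for r in (360, 380, 400, 420, 440):
    print(r, poisson_pmf(r, mu), gauss(r, mu))
```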