Now, we looked at some discrete probability distributions last time; in particular, we looked at some properties of the binomial distribution, and I pointed out that the binomial distribution goes over into the Poisson distribution in a certain well-defined limit. We also looked at a simple physical example of the binomial distribution, namely number-density fluctuations in a classical ideal gas. If you recall, the binomial distribution was

P(n) = (N choose n) p^n (1 − p)^(N − n), where n runs from 0 up to N.

This is the distribution of the number of successes in N Bernoulli trials, each with probability p of success, and the average value of n is N p. If you take the limit in which N tends to infinity and p tends to 0 such that N p tends to some finite constant value μ, then the distribution tends to the Poisson distribution

P(n) = e^(−μ) μ^n / n!,

whose mean value of course remains μ, and n now has the sample space 0, 1, 2, ... all the way to infinity. Examples of the Poisson distribution abound in nature, and we will come across many Poisson processes as we go along. For the moment, recall the elementary nuclear-physics example: take a large radioactive sample and ask for the average number of nuclei that decay in some given interval of time. You discover that it increases linearly with the time, with some mean decay rate λ.
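The binomial-to-Poisson limit described above is easy to check numerically. Here is a small illustrative sketch (not part of the lecture; the value μ = 2 and the cutoff n < 11 are arbitrary choices), holding Np = μ fixed while N grows:

```python
from math import comb, exp, factorial

def binomial_pmf(n, N, p):
    # P(n) = C(N, n) p^n (1 - p)^(N - n),  n = 0, ..., N
    return comb(N, n) * p**n * (1 - p)**(N - n)

def poisson_pmf(n, mu):
    # P(n) = e^(-mu) mu^n / n!,  n = 0, 1, 2, ...
    return exp(-mu) * mu**n / factorial(n)

mu = 2.0
gaps = []
for N in (10, 100, 10000):
    # Hold N*p = mu fixed; the binomial pmf should approach the Poisson pmf.
    gap = max(abs(binomial_pmf(n, N, mu / N) - poisson_pmf(n, mu))
              for n in range(11))
    gaps.append(gap)
    print(f"N = {N:5d}   max |binomial - Poisson| = {gap:.2e}")
```

The maximum pointwise gap shrinks steadily as N grows, which is the statement of the limit.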
So you discover that the number of decays is distributed exactly as a Poisson distribution with average μ = λt, where λ is the mean rate of decays; that is a classic example of a Poisson process. I also said I would give you an example based on number statistics, exactly as in the case of the classical ideal gas. This is afforded not by black-body radiation, which had a geometric number distribution of photons if you recall, but rather by coherent light. If you look at ideal single-mode laser light — one single frequency, one given wave vector, one state of polarization — it turns out that the number of photons in this radiation field is described by precisely a Poisson distribution. To see where this comes from, at least in a non-rigorous way, let me go back a couple of steps and do a little bit of quantum mechanics. This has to do with the idea of coherent states in quantum optics. Such a state is expandable in a basis set comprising specific photon number states: if I call |n⟩ a state of the radiation field in which there are exactly n photons of a given frequency and a given state of polarization, then the so-called coherent state is built up as the superposition

|α⟩ = e^(−|α|²/2) Σ_{n=0}^{∞} (α^n / √(n!)) |n⟩,

where α is any complex number of finite modulus. This state is denoted by the symbol |α⟩, labelled by the complex number α; as you can see, when you change α you get a different state, and it is called a coherent state.
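The contrast being set up here — Poisson statistics for coherent light versus the geometric distribution for thermal light — can be illustrated numerically, anticipating the variance results derived below. A sketch (not part of the lecture; μ = 10 is an arbitrary illustrative mean):

```python
from math import exp, sqrt

def poisson_probs(mu, n_max):
    # Coherent light: P(n) = e^(-mu) mu^n / n!, built up iteratively
    # via P(n) = P(n-1) * mu / n to avoid huge factorials.
    probs = [exp(-mu)]
    for n in range(1, n_max):
        probs.append(probs[-1] * mu / n)
    return probs

def geometric_probs(mu, n_max):
    # Thermal light: P(n) = (1/(1+mu)) * (mu/(1+mu))^n
    return [(1 / (1 + mu)) * (mu / (1 + mu))**n for n in range(n_max)]

def mean_var(probs):
    m = sum(n * p for n, p in enumerate(probs))
    v = sum((n - m)**2 * p for n, p in enumerate(probs))
    return m, v

mu, n_max = 10.0, 2000
m_p, v_p = mean_var(poisson_probs(mu, n_max))
m_g, v_g = mean_var(geometric_probs(mu, n_max))
print("Poisson  : mean", m_p, "var", v_p, "dn/<n>", sqrt(v_p) / m_p)
print("geometric: mean", m_g, "var", v_g, "dn/<n>", sqrt(v_g) / m_g)
```

Both distributions have the same mean, but the relative fluctuation is far smaller for the Poisson (coherent) case than for the geometric (thermal) case.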
Now, there are technical reasons why it is called a coherent state — what is so special about this state, and so on. We will not get into that at the moment, because the focus here is not on quantum optics. I merely want to point out that in this coherent state it is not very difficult to show that you get precisely a Poisson distribution for the photon number. After all, ask: what is the probability amplitude that in the state |α⟩ you have precisely some fixed integer number, say r, of photons? That is given by the quantity ⟨r|α⟩, and I presume those of you familiar with quantum mechanics will recognize this as a scalar product; it is a complex number in general, the probability amplitude that in the state |α⟩ you have exactly r photons. Putting in the expansion of |α⟩,

⟨r|α⟩ = e^(−|α|²/2) Σ_{n=0}^{∞} (α^n / √(n!)) ⟨r|n⟩.

But these number states are all orthonormal to each other — they form an orthonormal basis — and orthonormality implies that ⟨r|n⟩ is a Kronecker delta: it is 0 if n is not equal to r, and it is unity if n = r. That immediately lets you write down the sum, because it fires only when n = r, and therefore

⟨r|α⟩ = e^(−|α|²/2) α^r / √(r!).

That is the probability amplitude, and the rule of quantum mechanics is: given the probability amplitude for something to happen, the actual probability is not this complex number but its modulus squared. Therefore the probability that in this state you have r photons is given by the quantity ⟨r|α⟩
whole squared. That is easy to figure out, because you get another such factor: the complex conjugate of ⟨r|α⟩ is ⟨α|r⟩, which carries α* to the power r, so α^r times α*^r makes |α|² to the power r; and since there are two factors of √(r!), the square root goes away, leaving

P(r) = |⟨r|α⟩|² = e^(−|α|²) (|α|²)^r / r!.

You immediately recognize that this is a Poisson distribution with mean value equal to |α|². So the photon number distribution in ideal single-mode laser light of a given state of polarization and frequency is a Poisson distribution — very drastically different from the geometric distribution we had earlier. We know the variance of a Poisson distribution: if P(n) = e^(−μ) μ^n / n!, the mean value of n is μ and the variance of n is also equal to μ; the mean is equal to the variance. That is fairly easy to derive, because we know the generating function for this distribution, so it is trivial to show that the variance is the same as the mean — one of the fundamental properties of a Poisson distribution. Now compare this. If you have ideal single-mode coherent radiation,

P(n) = e^(−μ) μ^n / n!,

while for thermal radiation P(n) was a geometric distribution, very different from a Poisson:

P(n) = [1/(1 + μ)] [μ/(1 + μ)]^n.

So the first is Poisson, the second geometric; and in the Poisson case the variance is equal to μ itself.
So for the Poisson distribution the variance is μ, the standard deviation is √μ, and the standard deviation divided by the mean is

Δn/⟨n⟩ = 1/√μ.

That is the Poisson case. On the other hand, for the geometric distribution the mean value is μ but the variance — not difficult to show, in fact trivial from the expression for the distribution — is μ(1 + μ), and this immediately implies

Δn/⟨n⟩ = √(1 + 1/μ),

since the μ outside the square root goes inside as 1/μ. As μ increases — when you have a large number of photons — the Poisson scatter becomes extremely small, whereas the geometric ratio is always bigger than 1. So in thermal radiation you have a huge scatter about the mean, whereas the relative fluctuation is very, very small in coherent radiation; that is one of the defining properties of coherent radiation. These results are very easily derived if you use the generating function in each case — we know the generating function for both — so I urge you to do this as an exercise. In some sense, then, the most coherent radiation you can have is ideal single-mode laser light, and the least coherent is thermal, black-body radiation; so you have distributions running all the way from one extreme to the other. It would be nice to have a family of probability distributions which interpolates between these two extremes, and we are going to see very shortly that there is such a family: it is called the negative binomial distribution, and it interpolates between the geometric distribution on the one hand and the Poisson distribution at the other extreme. In fact, one can use that family of distributions to analyze photon counting statistics and see how much thermal light is
mixed with the coherent radiation in an actual experiment. But before we do that, let me take up another question, and that is the following: what happens if you have a sum of two random variables, each of which is Poisson-distributed, with different means in general? Suppose you have two species of radioactive nuclei and you are trying to count the total number of decays in a given time: one of them decays at rate λ₁ and the other at λ₂, so each count, as I said, is Poisson-distributed at any given instant of time. The question you can reasonably ask is: what is the actual distribution of the sum of these two random variables? So suppose you have one random variable n distributed as P₁(n) = e^(−μ) μ^n / n!, and another, m, distributed also as a Poisson distribution but with a different mean ν, P₂(m) = e^(−ν) ν^m / m!, and the two are completely independent of each other. We would like to know the probability distribution P(s) of the random variable s defined as s = m + n. Now, how would you approach this problem? The fundamental, first-principles way of writing down the probability distribution of s is to make up s in all possible ways by summing m and n. First of all, what is the sample space of s? Also 0 to infinity — so there is no change in the sample space. Then

P(s) = Σ_{m=0}^{∞} Σ_{n=0}^{∞} P₁(n) P₂(m), subject to the constraint m + n = s,

and that should, by definition, give you the
probability distribution of s. Remember that m and n are dummy indices here — they are summed over — so the answer is a function of s, and the summand fires only when m + n = s, for any given value of s. That is the definition, and what we have got to do is put the Poisson forms in and do the summation. But you cannot independently sum over m and n, because there is a constraint: you can use it to get rid of one of the sums, but it will constrain the other. For instance, neither m nor n can exceed s, because s is the sum of two non-negative integers; for any given s, m can at best go up to s, and so can n. So while you can do the summation in principle, the sum gets constrained, and you have got to be a little careful. On the other hand, you could do the following — a much better way of doing this. Call the generating functions of the two distributions f₁(z) and f₂(z), and ask: what is the generating function of the sum? By definition, that is

f(z) = Σ_{s=0}^{∞} z^s P(s).

I put the expression for P(s) in here: multiplying both sides by z^s and summing over all possible values of s, you can see that

f(z) = Σ_{n=0}^{∞} Σ_{m=0}^{∞} e^(−μ−ν) (μ^n / n!) (ν^m / m!) z^s, subject to the constraint m + n = s.

So all I have to do is put that constraint in: z^s becomes z^n z^m. So what is this equal to now? I
have used the constraint to finish off the sum over s, replacing s by n + m. And what does this become? The summand contains (μz)^n and (νz)^m, and each of the sums can now be done completely — there are no further constraints on it — so this gives me

f(z) = e^(μ(z − 1)) e^(ν(z − 1)) = e^((μ + ν)(z − 1)).

To find P(s), all we need to do is take out the coefficient of z^s in the power-series expansion of this quantity, and it immediately follows that

P(s) = e^(−(μ + ν)) (μ + ν)^s / s!.

And what distribution is that? It is a Poisson distribution with the means added up. Therefore the variance is also added up: the variance of s is the variance of one plus the variance of the other. Incidentally, one can show — I have been assuming — that if you have two random variables which are statistically independent of each other, their variances must add up. Their means definitely add up; that is a trivial thing to show. But the variances also add up. Why do I say that? The only place they would fail to add up is in the mean square value of m + n: you would have the mean value of m², the mean value of n², and then the mean value of m times n. But that last term factors into ⟨m⟩⟨n⟩, because these are independent variables, and since it factors, it cancels out when you subtract the square of the mean. So it is trivial to see that the variances have this additive property, which the mean square value of a random variable does not. If a random variable is a sum of two or more random variables which are independent of each other, then its mean value is just the sum of the means of the individual components, and its variance is also the sum of the variances of
the individual components — a property which is not shared by the mean square alone, or by any higher moment. This additivity is very crucial, and we will see that it is actually a reflection of a property of what are called cumulants, which also add up; the additivity of cumulants is a very, very fundamental notion, and we will come back to it. So our first lesson: if you have two Poisson random variables which are independent, the sum is also a Poisson variable, with the means added up in this fashion. I urge you to show, in this instance, a slightly more general property. Suppose a and b are any real constants — positive or negative, it does not matter — and consider the random variable u = a n + b m. This random variable does not necessarily have integer values in its sample space: m and n are integer-valued, but a and b are arbitrary real constants. Then what is the mean value of u? It is a μ + b ν. And what do you think the variance is? Since the cross term drops out, it is a² times the variance of n plus b² times the variance of m. Now, the fact that the sum of two Poisson variables is again a Poisson random variable follows very simply from the fact that the generating functions multiply, and because they are exponentials you get an additive exponent; so you can see clearly why this happens. But now I could ask a slightly more complicated question: what about the difference of two Poisson random variables? What do you think is going to happen there? Let us call it r = n − m. What kind of distribution would this have, and what is its sample space? Minus infinity to infinity now, not 0 to infinity: all integers, all
integers are allowed. And now we try to find the generating function of this. We do exactly the same as we did before, except that we now have to write

f(z) = Σ_{r=−∞}^{∞} P(r) z^r.

Let me put the symbols back: P₁(n) for the first variable and P₂(m) for the second — I call them 1 and 2 so that I do not confuse them — and there is going to be a Kronecker delta δ_{n−m, r}. Multiplying by z^r and using the delta, I replace z^r by z^(n−m) = z^n z^(−m), and so

f(z) = f₁(z) f₂(1/z),

because the power series in z^(−m) gives me the generating function f₂, but evaluated at 1/z. And what is that equal to for Poisson distributions? This becomes

f(z) = e^(μ(z − 1)) e^(ν(1/z − 1)) = e^(−(μ + ν)) e^(μz + ν/z).

What we seek is the coefficient of z^r in the expansion of this quantity in powers of z. But all possible powers, positive as well as negative, are going to exist, and it is not so easy: you first expand this factor and then that one, and of course, for a given power z^r, an infinite number of terms are going to contribute, because there is an infinite series in positive powers from one factor and in negative powers from the other. So we need a little formula — what we need is a generating function for a special function. The identity we need is the following: e^((t/2)(ξ + 1/ξ)),
when expanded in powers of ξ, positive as well as negative, is

e^((t/2)(ξ + 1/ξ)) = Σ_{r=−∞}^{∞} I_r(t) ξ^r,

where I_r(t) is the modified Bessel function of the first kind and order r. It has nice, interesting properties; we will come back to this special function when we study random walks, and I will tell you all about it. It satisfies a certain second-order differential equation called the modified Bessel equation, and it is what is called an entire function, in the sense that it has no singularities whatsoever for any finite value of t as a complex variable. It can be written as a power series in t which converges absolutely for all finite values of |t| — converges even faster than the exponential does. Whatever it is, we can write down the answer from this identity, because our generating function can be cast in that form. What is the trick? How should I cast it in that form? I need a ξ and a 1/ξ, but I have got a μz and a ν/z. So take out a square root: write μz + ν/z = √(μν) (ξ + 1/ξ), with ξ = z√(μ/ν), and put in the required factor of 2 by hand, so that t = 2√(μν). Writing it in this form immediately tells us that

P(r) = e^(−(μ + ν)) (μ/ν)^(r/2) I_r(2√(μν)),

because I want the coefficient of z^r in the expansion, and z^r carries the factor (μ/ν)^(r/2) with it. Depending on whether μ is bigger than ν or ν is bigger than μ, the distribution is going to be dominated by positive or negative r, one or the other; but that bias is just a power factor there.
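The distribution of the difference just derived can be cross-checked numerically against the direct convolution of the two Poisson distributions. A sketch (not part of the lecture; I_r is computed from its power series, and μ = 3, ν = 1.5 are arbitrary illustrative values):

```python
from math import exp, factorial, sqrt

def bessel_i(r, t, terms=60):
    # Modified Bessel function of the first kind, integer order:
    # I_r(t) = sum_k (t/2)^(2k + |r|) / (k! (k + |r|)!),  with I_(-r) = I_r.
    r = abs(r)
    return sum((t / 2)**(2 * k + r) / (factorial(k) * factorial(k + r))
               for k in range(terms))

def diff_pmf(r, mu, nu):
    # Distribution of r = n - m for independent Poisson(mu) n and Poisson(nu) m:
    # P(r) = e^(-(mu + nu)) (mu/nu)^(r/2) I_r(2 sqrt(mu nu))
    return exp(-(mu + nu)) * (mu / nu)**(r / 2) * bessel_i(r, 2 * sqrt(mu * nu))

def poisson_pmf(n, mu):
    return exp(-mu) * mu**n / factorial(n)

mu, nu = 3.0, 1.5
# Cross-check against the direct convolution P(r) = sum_m P1(m + r) P2(m).
for r in range(-5, 6):
    direct = sum(poisson_pmf(m + r, mu) * poisson_pmf(m, nu)
                 for m in range(max(0, -r), 80))
    assert abs(diff_pmf(r, mu, nu) - direct) < 1e-12

mean = sum(r * diff_pmf(r, mu, nu) for r in range(-60, 60))
var = sum((r - mean)**2 * diff_pmf(r, mu, nu) for r in range(-60, 60))
print(mean, var)   # approx mu - nu = 1.5 and mu + nu = 4.5
```

The numerically computed mean and variance come out to μ − ν and μ + ν respectively.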
The Bessel function itself has an interesting symmetry property: I_r = I_{−r} for any integer r. So the bias between positive and negative r is entirely in the power factor, not in I_r. This is not at all a Poisson distribution — it is some more complicated distribution than that. By the way, it has got a name: this is called the Skellam distribution. It is, if you like, the generalization of the Poisson distribution to all integers — to a sample space comprising both positive and negative integers and 0. So the difference of two Poisson variables has a Skellam distribution. Now, we already know the generating function for this distribution, so from it one can find all the factorial moments fairly straightforwardly, without any difficulty. For example, the average value of r will turn out to be μ − ν; that follows immediately if you differentiate the generating function once with respect to z and set z = 1. What about the variance? What do you think will happen to the variance? Well, remember we looked at the distribution of a n + b m, and we said a and b are just real numbers — it does not matter whether they are positive or negative. So you could set one of them equal to 1 and the other equal to −1, and of course it immediately follows that the variance is μ + ν. So when you have the difference of two Poisson variables, the mean is the difference, but the variances still add up. In fact, we will see a little later — the mean is called the first cumulant, the variance the second cumulant, and then there are higher cumulants and so on — that the kth cumulant of the difference of two Poisson variables is μ + (−1)^k ν: every even cumulant is going to be μ + ν, and every odd cumulant is going to be μ − ν. For the sum,
of course, all the cumulants are going to be exactly the same, μ + ν; we will see that when we talk about cumulants. So this is an instance of what happens with the difference of two Poisson variables. Just one final remark, about a physical example of such a situation once again. We will see extensively that the solution to the simple random walk problem in continuous time is precisely a Skellam distribution. Suppose you did the following: you took an infinite linear lattice, started at some origin, tossed a coin, and went to the right with some probability p and to the left with probability q, depending on whether you got heads or tails; and you did this at random instants of time with some mean rate, according to a Poisson process. Then the probability of landing up at the lattice point r — positive or negative, it does not matter — is precisely this Skellam distribution. In other words, steps to the right and steps to the left would be regarded as two Poisson processes, and then you want the difference of the two. We will see all the various problems that map onto the solution of this random walk problem, and how that comes about. We should probably take a little break here. Any questions? Yes — if a and b are not integers, that is not true anymore. The question is about precisely what we had written down: take the variable u = a n + b m, with a and b real constants, and ask what the probability distribution of u itself is. It is not Poisson or anything like that, because the sample space is no longer the integers; it is a times one integer plus b times another integer — so it is again a set of points, but the question is what the probability distribution is in this case. One way to do this is to go right back to
the ab initio definition of this quantity: instead of writing a Kronecker delta, you write a Dirac delta function δ(u − (a n + b m)), then write a representation for that Dirac delta function — perhaps as an exponential, which factors in n and m — and find the generating function of u in terms of the generating functions of n and m. An exercise for the reader! So do the following: use a delta function δ(a n + b m − u), and put it formally inside the double summation over n and m. The argument is additive in n and m, and you want to factor it into a product of something in n times something in m. When you have got a sum and you want to convert it to a product, what do you do? You exponentiate. So you would like a representation of this delta function which converts it into an exponent: use the Fourier representation, which for any real variable x reads δ(x) = (1/2π) ∫ dk e^(ikx). Put that in; you are going to have one more integration to do, but that is not very difficult, and you will end up with the generating function. So that is a nice, interesting exercise. Now, recall the binomial distribution, which was

P(n) = (N choose n) p^n q^(N − n).

The negative binomial distribution also has two parameters, one of which is a positive integer N and the other a probability p, exactly as in that case; but this distribution is

P(n) = (N + n − 1 choose n) p^N q^n,

where N is any positive integer, and the sample space is n = 0, 1, 2, ... all the way up to infinity — unlike the case of the binomial, where 0 ≤ n ≤ N. Here it goes all the way up
to infinity. It is immediately clear that if you put N = 1, for example, the binomial coefficient becomes 1, and the distribution becomes p q^n, which was precisely the geometric distribution. So it is clear that N = 1 corresponds to the geometric distribution. The question is: what is it in general, for general N? What does it look like? Well, the thing to do is again to find the generating function:

f(z) = Σ_{n=0}^{∞} P(n) z^n = p^N Σ_{n=0}^{∞} (N + n − 1 choose n) (qz)^n.

That is an infinite series; however, it is not very hard to show that this is equal to

f(z) = p^N / (1 − qz)^N.

It is precisely the binomial expansion of (1 − qz)^(−N), and that is the reason for calling it the negative binomial distribution. As you know, when the index is not a positive integer, the binomial series is an infinite series, and if you write it in terms of gamma functions and so on, it simplifies to this closed form. Once we have this, it is easy to see what the mean value of n is — call it μ. By definition, the mean is the derivative of f(z) with z set equal to 1. If I differentiate, an N and a q come out, and the denominator becomes (1 − q)^(N+1) = p^(N+1), which cancels against the p^N, leaving a p in the denominator; the minus signs go away, and so

μ = N q / p.

Of course, we know already that for the geometric distribution, when you put N = 1, the mean is q/p.
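Both the closed form of the generating function and the mean Nq/p can be verified numerically from the series. A small sketch (not part of the lecture; N = 4, p = 0.6, and z = 0.7 are arbitrary illustrative choices):

```python
from math import comb

def nb_pmf(n, N, p):
    # Negative binomial: P(n) = C(N + n - 1, n) p^N q^n, with q = 1 - p
    return comb(N + n - 1, n) * p**N * (1 - p)**n

N, p = 4, 0.6
q = 1 - p
z = 0.7

# The power series sum of P(n) z^n matches the closed form p^N / (1 - q z)^N.
series = sum(nb_pmf(n, N, p) * z**n for n in range(400))
closed = p**N / (1 - q * z)**N
print(series, closed)

# The mean computed from the pmf agrees with N q / p.
mean = sum(n * nb_pmf(n, N, p) for n in range(400))
print(mean, N * q / p)
```

Truncating the infinite series at 400 terms is harmless here, since q^n decays geometrically.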
Let us rewrite this a little bit: μ = N(1 − p)/p, which means μ p + N p = N, or

p = N/(N + μ), and of course q = 1 − p = μ/(N + μ).

If I put that into the distribution, then

P(n) = [(N + n − 1)! / (n! (N − 1)!)] [N/(N + μ)]^N [μ/(N + μ)]^n

— the combinatorial part, then p^N, then q^n. That is what this two-parameter distribution looks like in terms of N and μ: I got rid of p and q and used μ instead. Now look at what happens when N goes to infinity, keeping μ finite at some value. The combinatorial factor, by Stirling's approximation, has N^n / n! as its dominant term. The factor [N/(N + μ)]^N is [1/(1 + μ/N)]^N, whose limit is of course e^(−μ) — you cannot forget this; that is sitting there. And [μ/(N + μ)]^n behaves like μ^n / N^n, whose N^n cancels against the N^n from the combinatorial factor. So the leading powers of N cancel, and you are left with

P(n) → e^(−μ) μ^n / n!,

which is precisely a Poisson distribution. So we have this family of negative binomial distributions, which at N = 1 is geometric and as N tends to infinity tends to a Poisson. That is the great advantage of this distribution, and it has got a very nice, simple-looking generating function — just an algebraic function, as we wrote down — so we can write down all its moments and so on.
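The two limits of the negative binomial family — geometric at N = 1 and Poisson as N → ∞ — can be checked numerically. A sketch (not part of the lecture; μ = 2.5 and the values of N are arbitrary):

```python
from math import comb, exp, factorial

def nb_pmf_mu(n, N, mu):
    # Negative binomial written in terms of N and the mean mu:
    # P(n) = C(N + n - 1, n) [N/(N + mu)]^N [mu/(N + mu)]^n
    return comb(N + n - 1, n) * (N / (N + mu))**N * (mu / (N + mu))**n

def geometric_pmf(n, mu):
    return (1 / (1 + mu)) * (mu / (1 + mu))**n

def poisson_pmf(n, mu):
    return exp(-mu) * mu**n / factorial(n)

mu = 2.5

# N = 1 reproduces the geometric distribution exactly.
for n in range(10):
    assert abs(nb_pmf_mu(n, 1, mu) - geometric_pmf(n, mu)) < 1e-14

# Large N approaches the Poisson distribution, and the gap shrinks as N grows.
gap_small_N = max(abs(nb_pmf_mu(n, 100, mu) - poisson_pmf(n, mu)) for n in range(15))
gap_large_N = max(abs(nb_pmf_mu(n, 100000, mu) - poisson_pmf(n, mu)) for n in range(15))
print(gap_small_N, gap_large_N)
```

The pointwise gap to the Poisson pmf scales roughly like 1/N, so increasing N by a factor of 1000 shrinks it by about the same factor.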
As I said, one uses this in the study of photon counting statistics for admixtures of thermal light and coherent light, among other things; in many other places too — in population dynamics, etc. — the negative binomial distribution is used very, very frequently. Just one final remark, and then we will take it up from here. We have talked about the generating function throughout, and while it has advantages, it also has disadvantages. Our definition of the generating function was: if you have a probability distribution P(n) of a random variable n, then

f(z) = Σ_n P(n) z^n,

summed over all allowed values of n. The disadvantage of this is that if I want the various moments of this distribution — various powers of n — I have to go on differentiating, but if I differentiate k times, I actually end up with the factorial moments rather than the moments themselves:

⟨n(n − 1)(n − 2) ⋯ (n − k + 1)⟩ = d^k f(z)/dz^k at z = 1.

To overcome that disadvantage, it is advantageous to define what is called a moment generating function, and that is defined as M(u) — I use a different variable u instead of z so as not to confuse the two:

M(u) = ⟨e^(un)⟩ = Σ_n P(n) e^(nu),

since the average of e^(un) is the summation over the allowed values of n of P(n) times (e^u)^n. Therefore we have the relation which says the generating function and the moment generating function are related by

M(u) = f(e^u):

instead of z^n I have got (e^u)^n. So if I know f, I know M, and vice versa. The great advantage of this is what happens when I differentiate: after all, if I take M(u) and expand the exponential, writing it as a summation from k = 0 to infinity of u^k times n to the
power k over k factorial expectation because that's a random variable n here then immediately you see that n to the power k is the derivative the kth derivative of m of u over du k at u equal to 0 so directly by differentiation generates the moments themselves and not just the factorial moments so that's why it's called the moment generating function and it directly gives you the various moments so that's an advantage and I will stop here but we have seen we have seen that the moments themselves are not that helpful first of all there's an average which you got to remove first of all so it's not advantageous to use the moments but it's much better to find out what's the average value of n minus n average to the power k and take its expectation value these are the central moments these are just the moments but these are the central moments the moments about the mean value so any systematic drift or shift is got rid of for example the second part of these follows when k is 2 is precisely the variance as you can see even this is not very useful when you have many random variables because the sum of the central moments of independent random variables is not the sum of the kth moment of the sum is not the sum of the kth moments central moments so we need to do exactly what we did for the variance get rid of those extra contributions and make it purely additive and that's where the cumulants comes in this is a little bit like if you are used to statistical mechanics it's a little bit like going from the partition function to the Hemmholtz free energy so what you have to do is to ask can I write can I write e to the power u n as equal to e to the power some function of u which is in a power series here in general okay so this like saying I take the partition function and then I take it's you know free energy is defined as a log of this guy this partition function can I do that trick or not and that k of u will be called the cumulant generating function in fact that is the 
basic idea in statistical mechanics: when you write down the partition function, you are actually writing a generating function for some distribution, and when you find the free energy, you are writing the corresponding cumulant generating function, and that is what is additive. So we will talk about this next time, okay.
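Both closing points, that differentiating f(z) yields factorial moments rather than ordinary moments, and that cumulants, unlike central moments, are additive for independent random variables, can be checked numerically. Here is a small Python sketch using Poisson distributions as the test case; all names and parameter values (mu = 1.5, mu1 = 2, mu2 = 3, the cutoff nmax) are my own choices, not from the lecture. For a Poisson distribution with mean mu the kth factorial moment is mu^k, the fourth central moment is mu + 3 mu^2 (not additive), and the fourth cumulant is mu itself (additive, since a sum of independent Poisson variables is again Poisson).

```python
import math

def poisson_pmf(mu, nmax=100):
    # P(n) = e^(-mu) mu^n / n!, built iteratively to avoid huge factorials
    pmf = [math.exp(-mu)]
    for n in range(1, nmax):
        pmf.append(pmf[-1] * mu / n)
    return pmf

def falling_factorial(n, k):
    # n (n - 1) ... (n - k + 1)
    out = 1
    for j in range(k):
        out *= n - j
    return out

# 1. Factorial moments versus ordinary moments for Poisson(mu).
mu = 1.5
pmf = poisson_pmf(mu)
for k in range(1, 4):
    fact_mom = sum(p * falling_factorial(n, k) for n, p in enumerate(pmf))
    raw_mom = sum(p * n**k for n, p in enumerate(pmf))
    print(k, fact_mom, raw_mom)  # factorial moment is mu^k; raw moment differs for k >= 2

# 2. Central moments versus cumulants for a sum of independent variables.
def central_moment(pmf, k):
    mean = sum(n * p for n, p in enumerate(pmf))
    return sum(p * (n - mean) ** k for n, p in enumerate(pmf))

mu1, mu2 = 2.0, 3.0
m4_1 = central_moment(poisson_pmf(mu1), 4)
m4_2 = central_moment(poisson_pmf(mu2), 4)
m4_sum = central_moment(poisson_pmf(mu1 + mu2), 4)  # sum of independent Poissons is Poisson(mu1 + mu2)
print(m4_1 + m4_2, m4_sum)  # the fourth central moments do not add

# Fourth cumulant: kappa_4 = m_4 - 3 * variance^2; for Poisson(mu) it equals mu.
k4_1 = m4_1 - 3 * mu1**2
k4_2 = m4_2 - 3 * mu2**2
k4_sum = m4_sum - 3 * (mu1 + mu2) ** 2
print(k4_1 + k4_2, k4_sum)  # the cumulants add; both equal mu1 + mu2
```

The case k = 2 is the variance, and the third central moment of independent variables happens to be additive as well; k = 4 is the first order at which cross terms spoil additivity, and the cumulant kappa_4 is exactly the combination that removes them.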