Okay, we saw last time that the problem of random walks, or random flights, led very naturally to a Gaussian distribution for the end-to-end distance, the displacement, and this looked like part of a very general result: you added up a whole lot of identically distributed random variables and, in a certain limit, you got a Gaussian. This is not an accident; it is part of the central limit theorem which, as I stated last time, essentially says that if you have independent, identically distributed random variables, then a linear combination of these variables, suitably rescaled and shifted, will in the limit as n goes to infinity have a Gaussian distribution, provided each of the random variables has a finite variance. That was the substance of the central limit theorem. Now, the Gaussian is part of a more general class of distributions called stable distributions, and I would like to talk about stable distributions to start with, and try to explain, at least qualitatively, what this stability refers to and what exactly it implies. So we start as follows. Suppose I have a set of independent, identically distributed random variables; call them x1, x2, ..., xn, iid random variables, and suppose the cumulative distribution function (CDF) of each of these variables is some F(x). What this means is that the probability that any given random variable xi is less than or equal to x equals F(x). Then we ask the following question: is there any special form, or forms, of this distribution function F(x) such that if I add up a whole lot of these iid random variables and rescale the sum in some suitable fashion, the distribution function of the resultant remains F(x), unchanged? If you can do that for every n greater than or equal to 2, then this F(x) is said to be a stable distribution.
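As a quick numerical illustration of the limit just described (a sketch, not part of the lecture's derivation; the sample sizes are arbitrary choices): summing many iid Uniform(0,1) variables and standardizing the sum produces something very close to a standard Gaussian.

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 100, 50_000            # arbitrary illustrative sizes

# Each row holds n iid Uniform(0,1) variables; sum across the row,
# then shift by the mean (n/2) and rescale by the std dev sqrt(n/12).
s = rng.random((trials, n)).sum(axis=1)
z = (s - n * 0.5) / np.sqrt(n / 12.0)

# For a standard Gaussian: mean 0, variance 1, and the third and fourth
# standardized moments are 0 and 3.
mean, var = z.mean(), z.var()
skew = np.mean(z**3)
kurt = np.mean(z**4)
```

The moments of the standardized sum match those of a standard Gaussian to sampling accuracy, which is the content of the theorem in miniature.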
So now let us formalize this and write the definition in precise terms. There are several equivalent ways of defining a stable distribution; I am going to quote a couple of them, without trying to prove their equivalence, but what we mean will become intuitively clear as we see the explicit forms possible for this F(x). Just to recall what this F(x) is for a Gaussian, for instance: for a Gaussian, F(x) is the integral from minus infinity up to x of dx' (1/√(2πσ²)) e^{-(x' - μ)²/(2σ²)}, and that, as we know, is an error function. Splitting the integral from minus infinity to x into minus infinity to 0 plus 0 to x, and shifting the origin by setting x' - μ equal to a new variable, the first piece is half the Gaussian, which gives one half, and the second gives the error function of the shifted, rescaled variable. So F(x) = (1/2)[1 + erf((x - μ)/√(2σ²))]. That is what the cumulative distribution function of a Gaussian looks like. For each case you can write down the cumulative distribution function, a non-decreasing function of x, and then we ask: under what conditions is this F(x) stable? So here is Definition 1. If for every n greater than or equal to 2 there exist constants a_n > 0 and b_n real such that the random variable (Σ_{i=1}^{n} x_i - b_n)/a_n, that is, the sum of these identically distributed random variables, shifted by some n-dependent amount and then rescaled by 1/a_n, has the same cumulative distribution function F as each of the components x_i, then I say that F is a stable distribution.
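The Gaussian CDF just written down is easy to implement directly from the error function (a minimal sketch; the function name is my own):

```python
import math

def gaussian_cdf(x, mu=0.0, sigma=1.0):
    # F(x) = (1/2) * [1 + erf((x - mu) / sqrt(2 sigma^2))]
    return 0.5 * (1.0 + math.erf((x - mu) / math.sqrt(2.0 * sigma * sigma)))

# F(mu) = 1/2: half the probability mass lies below the mean, and F is a
# non-decreasing function running from 0 to 1, as a CDF must be.
```

For example, `gaussian_cdf(1.96)` reproduces the familiar 97.5% point of the standard normal.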
So that is a precise definition, but you are still left with the task of checking whether it works for a given F(x): you have to find suitable constants a_n and b_n for each n greater than or equal to 2, and if that is possible, then it is a stable distribution. We will see examples; we will in some sense write down all the stable distributions, and we will see where this gets us. That is the first definition; it is in fact what I said earlier in words. Yes, you could ask about requiring this only for n up to some finite m; we will talk about divisibility and so on, but the requirement here is that this should be true for every n. Then and only then is it a stable distribution. So this is a necessary and sufficient condition, but it is operationally not very useful, as you can see: although it is a formal definition, it does not tell us how to go about finding such a stable distribution. Here is a second definition, equivalent to the first. If x1 and x2, just two of them, are independent, identically distributed random variables with CDF F(x), and if for any given a1, a2 > 0 we can find a > 0 and a real b such that the random variable (a1 x1 + a2 x2 - b)/a has distribution F, then F is stable.
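The first definition can be spot-checked numerically for the Gaussian (a sketch; sample sizes and test points are arbitrary): with n standard Gaussians, the choice a_n = √n, b_n = 0 works, and the rescaled sum has the same CDF as a single component.

```python
import numpy as np

rng = np.random.default_rng(1)
n, trials = 50, 200_000            # illustrative sizes

# Sum of n iid standard Gaussians, rescaled with a_n = sqrt(n), b_n = 0:
s = rng.normal(size=(trials, n)).sum(axis=1) / np.sqrt(n)

# Compare its empirical CDF with that of a single Gaussian at a few points;
# for a stable law these must coincide for every n.
x = rng.normal(size=trials)
checks = [(q, np.mean(s <= q), np.mean(x <= q)) for q in (-1.0, 0.0, 1.5)]
```

The two empirical CDFs agree to Monte Carlo accuracy, which is exactly the defining property.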
So this says: forget about adding n of these variables and checking it for all n; just take two of them. If for every pair of given positive constants a1 and a2 you can find a positive constant a and a real constant b such that this linear combination, with b subtracted and the whole thing rescaled by a, has the same distribution function F(x), then F(x) is stable. This too is a necessary and sufficient condition, and with a little work one can show that these definitions are equivalent. But you see, neither of these says anything about F itself; each says to take this or that random variable and test what its distribution function is. We need a condition on the distribution F itself, and that is the third definition. It goes as follows: if for given positive a1 and a2 there exist a > 0 and b such that the convolution of F(x/a1) with F(x/a2) equals F((x - b)/a), then F is stable. Now we are getting somewhere, because this is directly a condition on the distribution function itself. What does it say? You rescale one copy, you rescale the other, you take their convolution, and if for any given positive a1, a2 you can find a shift constant b and a rescaling constant a such that this holds, then F is a stable distribution. Now, in every one of these definitions, if you do not need the shift, that is, if b_n or b is zero, then the distribution is said to be strictly stable; otherwise it is simply stable. So a strictly stable distribution is a special case of the more general definition of a stable distribution.
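The third definition can be checked numerically for the Gaussian, working with densities rather than CDFs, which is equivalent (a sketch; the grid parameters are arbitrary choices): convolving the densities of N(0, 1) and N(0, 4) reproduces the Gaussian density of variance 5, the same shape rescaled with a = √5.

```python
import numpy as np

dx = 0.01
x = np.arange(-20.0, 20.0 + dx / 2, dx)     # symmetric grid, odd length

def gauss(x, s):
    # zero-mean Gaussian density with standard deviation s
    return np.exp(-x**2 / (2.0 * s**2)) / np.sqrt(2.0 * np.pi * s**2)

# Discrete convolution of the two densities on the grid; with a symmetric,
# odd-length grid, mode="same" keeps the result aligned with x.
conv = np.convolve(gauss(x, 1.0), gauss(x, 2.0), mode="same") * dx

# The result should be the density of N(0, 5): the same Gaussian shape,
# rescaled with a = sqrt(1 + 4).
err = np.max(np.abs(conv - gauss(x, np.sqrt(5.0))))
```

The pointwise error is at the level of the quadrature accuracy, so the convolution of two rescaled copies is again a rescaled copy, which is the stability condition.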
Now, this third definition immediately suggests the following. If two things appear in convolution, their Fourier transforms multiply, so it is telling us that the Fourier transform must have a certain factorization property; only then is this possible at all. If you go back to Definition 1, you need to add n of these variables, so you need a characteristic function, the Fourier transform of the probability density function, which in some sense factorizes, which means it must be an exponential of some form. What you want is the expectation value of e^{ikx}; that is the characteristic function, and if x is a sum of terms, these exponentials multiply each other provided the terms are independently distributed. We already saw this at work in the random walk problem, although I did not write it out explicitly there. We started with a vector r which was the sum r = r1 + r2 + ... + rn, and the characteristic function of this random variable was the expectation value of e^{ik·r}, that is, the expectation value of e^{ik·Σ ri}. The next step was to write this out as a product, e^{ik·r1} e^{ik·r2} ... e^{ik·rn}, and then came the crucial observation that the steps are all independent, so the expectation value of the product is the product of the expectation values. It immediately became the expectation value of e^{ik·r1} raised to the power n, the Fourier transform of the single-step distribution p1(r), namely p1~(k), raised to the power n; and if you recall, this was (sin kl / kl)^n, and then there were the integrations over k and so on. So this suggests what is probably happening in general for a stable distribution, and indeed it is so. It turns out that all the stable distributions can be classified completely, and they are labeled by four parameters. I am not going to write the most general form down; any text on statistics will tell you the most general form of the cumulative distribution function of a stable distribution. But what we need to understand is the following. It turns out that the coefficient a_n appearing in the sum in the first definition must necessarily be of the form n^{1/α}, where α is a positive constant with 0 < α ≤ 2; I will explain shortly why it is restricted to this range. It also turns out that all stable distributions are unimodal: there is a single peak for every one of them, and they are labeled primarily by this index, the exponent or index α. Unfortunately, you cannot in general write down an explicit expression for the probability density function of a stable distribution. But because the characteristic functions must in some sense be multiplicative, exponentials that get multiplied together, it turns out that the characteristic function p~(k) must, apart from phase factors, be of the form e^{-c|k|^α} for some constant c. I am merely stating these results, not proving them. You can see that for a Gaussian this went like e^{-k²}, apart from a phase factor, and for a Cauchy distribution like e^{-|k|}, |k| to the power 1, and so on; so it looks like those are going to be stable distributions. Now, the restriction on α is understood, at least heuristically, as follows. Suppose α were negative: then the characteristic function is e^{-1/|k|} to some positive power, and as |k| tends to infinity this tends to e^0 = 1. So if α < 0, p~(k) → 1 as |k| → ±∞, and that cannot be integrated: you cannot find a Fourier transform that gives you a normalizable probability density function. So it is easy to understand why the restriction appears on that side. On the other hand, if α is greater than 2, it is a little more subtle to show why this cannot be a characteristic function: it turns out the inverse Fourier transform cannot be shown to be non-negative, whereas you know that the pdf p(x) must be non-negative, being a probability density. So that is what puts the restriction on this side: the first argument rules out α < 0, and the non-negativity of the pdf implies α ≤ 2. That is harder to prove; I have not come anywhere near proving it, and it is a statement you have to take on faith that for α > 2 you cannot establish the non-negativity of the inverse Fourier transform. So the stable distributions are characterized by this index, and the formal name for them is Lévy alpha-stable distributions; for short I will just call them stable distributions. There is a little bit of confusion in terminology here, because it turns out that one particular member of the family is itself called the Lévy distribution, and that is not the general family being referred to here.
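The non-negativity obstruction for α > 2 can at least be seen numerically (a sketch, with arbitrary grid sizes; α = 4 is a convenient test value): approximating the inverse Fourier transform of e^{-|k|^α} by a Riemann sum, α = 2 gives a genuine non-negative density, while the α = 4 "density" dips below zero.

```python
import numpy as np

dk, k_max = 0.01, 12.0
k = np.arange(0.0, k_max, dk)
xs = np.arange(0.0, 8.0, 0.05)

def inverse_ft(alpha):
    # p(x) = (1/pi) * Int_0^inf cos(kx) exp(-k^alpha) dk  (symmetric case),
    # approximated by a Riemann sum on a truncated k grid.
    phi = np.exp(-k**alpha)
    return np.array([(np.cos(k * x) * phi).sum() * dk / np.pi for x in xs])

p_gauss = inverse_ft(2.0)   # alpha = 2: inverse transform is a Gaussian, >= 0
p_bad = inverse_ft(4.0)     # alpha = 4: the inverse transform goes negative
```

The α = 2 curve matches (1/(2√π)) e^{-x²/4} and stays non-negative; the α = 4 curve oscillates and takes clearly negative values, so e^{-k⁴} cannot be the characteristic function of any probability density.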
So we will just call these stable distributions. Now, what are the other properties of these distributions? Well, the Gaussian is certainly a stable distribution, as we will see, and the most famous cases are the following three important ones; they occur in practice very often, especially the Gaussian. The first is the Gaussian, with α = 2, and we know what the density function looks like: p(x) = (1/√(2πσ²)) e^{-(x - μ)²/(2σ²)}. We also know the characteristic function p~(k). Incidentally, if the distribution has a density p(x), the characteristic function is its Fourier transform, so p~(0) must equal 1, because that is the integral of p(x) from minus infinity to infinity. For the Gaussian, p~(k) = e^{-iμk - (1/2)σ²k²}. Remember that the cumulant generating function was just a quadratic of this kind, μu + (1/2)σ²u², and the characteristic function is the moment generating function evaluated at u = -ik. As you can see, p~(0) = 1, so it is normalized correctly, and that is the Gaussian expression. What is the variance of the Gaussian?
σ² is the variance, a finite variance. The second case that is very important is the Cauchy distribution, with α = 1. Actually, the general Cauchy distribution need not be symmetric about its mean value; in general it is skewed, but we are looking at a special case where the parameters other than α have been set to special values. The most common form is p(x) = (λ/π) / [(x - μ)² + λ²], where λ/π is the normalization constant. What is p~(k) in this case? It has got to involve e^{-|k|}, because I said the Cauchy distribution corresponds to α = 1. For the symmetric Cauchy distribution, p~(k) = e^{-iμk - λ|k|}. Again p~(0) = 1, so it is normalized, and it has exponent α = 1. The mean value is μ, and that is the peak of the distribution: it is unimodal, peaked about μ. What is the variance of this distribution, what do you think it does? Well, you have to multiply by x² and integrate from minus infinity to infinity, and the denominator goes like x², so it diverges. Yes, the variance is infinite. What is the mean value? It is μ, but only barely so: if you did naive power counting, you put an x in the numerator, the denominator goes like x², so the whole integrand goes like 1/x, which would diverge logarithmically. But because the density is symmetric about the midpoint, if you shift to x - μ the answer turns out to be 0, giving a finite mean μ. But it is barely so, and the variance is certainly infinite for this distribution.

Notice that this distribution has a tail: for large |x| it goes like 1/x², unlike the Gaussian, whose e^{-x²} goes to 0 faster than any negative power of x as x → ±∞. And this is going to be a general feature. It turns out that as soon as you have a stable distribution with α < 2, p(x) goes asymptotically, as |x| → ∞, like 1/|x|^{α+1}; indeed, when α = 1 you see it goes like 1/x² out there. What does this imply? If α < 2 and the tail goes like 1/|x|^{α+1}, it implies infinite variance. So the entire family of stable distributions, except for the Gaussian, has a huge amount of scatter; the variance is formally infinite, and the Gaussian is the only stable distribution with a finite variance. In fact, this also tells you that if α is between 0 and 1, even the mean value is infinite; the first moment does not exist for those distributions. This is a very crucial observation, and all these distributions are called heavy-tailed distributions. Essentially it says that large values of these random variables are possible and carry significant probability mass, unlike the Gaussian, where the density gets cut off faster than any inverse power of x.

The third of these special cases, and we will come back to this, is called the Lévy distribution, and it corresponds to α = 1/2. It looks like this: the distribution is characterized by a positive constant C, with p(x) = [C/(2πx³)]^{1/2} e^{-C/(2x)}, where 0 ≤ x < ∞. The random variable lives in a semi-infinite range, and I have shifted the beginning of that range to 0 by a suitable translation. It is not hard to check that this is normalized to unity. You could ask what the characteristic function is: it turns out, not surprisingly, that p~(k) is e^{-(C|k|)^{1/2}}, as promised, |k| to the power one half, multiplied by a phase factor, which in this case puts a factor (1 + i sgn k) in the exponent. It is called the Lévy distribution. What does its shape look like? For the Lorentzian and Gaussian we have seen the shapes; for this one, plot p(x) against x. As x → +∞ the exponential factor tends to unity, and p(x) goes like 1/x^{3/2}; that by itself tells you that the variance has got to be infinite, because the tail goes like 1/x^{α+1} = 1/x^{3/2}. And what does it do near the origin, at x = 0? Is it zero, infinite, or finite? It is zero, dead zero, because the x^{-3/2} factor in the denominator is swamped by the exponential factor e^{-C/(2x)}, which goes to zero very rapidly as x → 0. So this function is not only zero at the origin; all its derivatives of finite order are also zero there. It rises from zero, has a peak whose position is characterized by the scale C, and decays like 1/x^{3/2}.

You could ask where these distributions appear, where they occur; it turns out there is a very close connection between different stable distributions, in a very specific sense. Oh, by the way, before I go on, let me mention that although I have written down explicit forms for the probability density function in these three special cases, this is not in general possible for generic α between 0 and 2. In fact, it turns out that you cannot write p(x) in terms of elementary functions other than in these cases; for rational values of α, like 3/2 and so on, you can write p(x) in terms of hypergeometric functions, but in general all you can do is write down a specific form for the characteristic function, the Fourier transform. Already, though, that gives us all the information we need about these distributions.

Here are some physical examples of when these distributions are going to appear. The Gaussian, of course, appears everywhere, so let us go back to our same setting of random flights, of diffusion, and we will do this in some detail later on. If you have a particle freely diffusing on the x axis, its probability density function, if it starts from the origin at t = 0, is of the form p(x, t) = e^{-x²/(4Dt)} / √(4πDt), where D is called the diffusion constant. That is a Gaussian with 2σ² = 4Dt, or variance σ² = 2Dt; so the variance of the particle's position increases linearly with time. That is a Gaussian distribution. But now ask: what is the distribution of 1/x²? That turns out to be a Lévy distribution. It is not hard to see that if you have a Gaussian centered at the origin, p(x) = (1/√(2πσ²)) e^{-x²/(2σ²)}, and you ask for the probability density of the random variable ξ = 1/x², then ξ has a Lévy distribution: ρ(ξ) = [1/(2πσ²ξ³)]^{1/2} e^{-1/(2σ²ξ)}, which is precisely the Lévy form with constant C = 1/σ², a Lévy distribution with exponent one half. Earlier I gave the example of the Maxwell distribution of velocities, where the energy has the rather strange distribution proportional to ε^{-1/2} e^{-ε}; that was not a Lévy distribution, but here we are asking for the distribution of 1/x², and that is Lévy.

Well, in connection with the diffusion problem itself, there is another random variable with precisely this kind of distribution. Start a diffusing particle on the x axis at x = 0 at t = 0, and ask: what is the distribution of the time at which it first hits a given point, call it a, just so as not to confuse it with the random variable x? The particle does its zigzag motion, and I ask for the probability that between time t and t + dt it crosses the point a for the first time. The random variable here is a time. If I call the density q(t; a | 0), to cross the point a at time t having started at the point 0 at time 0, this quantity turns out to be q(t; a | 0) = [a/√(4πD t³)] e^{-a²/(4Dt)}. That is a distribution in time, with t ≥ 0, and ∫₀^∞ q(t; a | 0) dt = 1; that we know, because this is a Lévy distribution in t, with C = a²/(2D), which is already normalized to unity. (Yes, sorry, the prefactor carries a power one half; the t³ sits inside the square root, so it is t^{3/2} in the denominator.) It is called the first-passage-time distribution, and it is precisely a Lévy distribution. That is the simplest physical example I know of where the Lévy distribution appears.

In general, there is a connection between a random variable which has a stable distribution with index α, where α is between 1 and 2, and a random variable which is a function of this original random variable and has a stable distribution with index 1/α. For instance, if x has a stable distribution with exponent α, where 1 ≤ α ≤ 2, then 1/x^α has a stable distribution with exponent 1/α, and this 1/α therefore lies between one half and 1. That is what we used just now: the Gaussian has exponent α = 2, and 1/x² has a Lévy distribution with exponent one half.

That is precisely the point. Okay, similarly you could ask: does the Cauchy distribution appear in a natural way in the diffusion problem? Notice that everything with α < 2 has no variance; they are all heavy-tailed, no variance at all. There are lots of places where the Cauchy distribution, which is called a Lorentzian in physics, appears naturally, but here is a very simple instance. Again, go back to the diffusion problem, which is a physical problem, and look at a very simple function of a random variable. Suppose you have two particles, both of which start at the origin and diffuse on the x axis, such that the coordinate of one at any instant of time is x₁ and that of the other is x₂, and look at the random variable ξ = x₁/x₂ and ask what its distribution is, where each of x₁ and x₂ has the probability density function given by the solution of the diffusion equation.
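Before carrying out this calculation, here is a quick numerical check of the earlier claim that ξ = 1/x² of a zero-mean Gaussian is Lévy-distributed with C = 1/σ² (a sketch; the sample size and test points are arbitrary, and the Lévy CDF erfc(√(C/2t)) follows by integrating the Lévy density).

```python
import numpy as np
from math import erfc, sqrt

rng = np.random.default_rng(2)
sigma = 1.0
x = rng.normal(0.0, sigma, 200_000)
xi = 1.0 / x**2                     # the transformed random variable

# Levy CDF with C = 1/sigma^2:  F(t) = erfc( sqrt(C / (2 t)) )
C = 1.0 / sigma**2
checks = [(t, np.mean(xi <= t), erfc(sqrt(C / (2.0 * t))))
          for t in (0.5, 1.0, 5.0)]
```

The empirical CDF of ξ agrees with the Lévy CDF at each test point to Monte Carlo accuracy.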
So ρ(ξ) as a function of time is ρ(ξ, t) = ∫_{-∞}^{∞} dx₁ ∫_{-∞}^{∞} dx₂ [1/(4πDt)] e^{-(x₁² + x₂²)/(4Dt)} δ(ξ - x₁/x₂); let us suppose for simplicity that they have the same diffusion coefficient, which need not be the case. The delta function with the normalized joint density picks out the density of ξ. Now, what is the physical range of ξ? Each of x₁ and x₂ runs from minus infinity to infinity, so ξ also runs from minus infinity to infinity, and in that sense we are spared putting in extra conditions; all we have to do is do this integral. The obvious way is to use the delta function to get rid of the x₁ integral: write δ(ξ - x₁/x₂) = |x₂| δ(x₁ - x₂ξ), where the factor 1/x₂ has been removed and its modulus taken, and then replace x₁ by x₂ξ. The exponent becomes e^{-x₂²(1 + ξ²)/(4Dt)}, the x₁ integration goes away, and we are left with ρ(ξ) = ∫_{-∞}^{∞} dx₂ |x₂| [1/(4πDt)] e^{-x₂²(1 + ξ²)/(4Dt)}. This is straightforward to do: the integrand is even, so write it as twice the integral from 0 to infinity and drop the modulus. But 2x₂ dx₂ = d(x₂²), so change variables to u = x₂², and this becomes ∫₀^∞ [du/(4πDt)] e^{-u(1 + ξ²)/(4Dt)}. That is a trivial integral, ∫₀^∞ du e^{-au} = 1/a for a > 0, and it kills the 4Dt, giving ρ(ξ) = 1/[π(1 + ξ²)]. That is a Cauchy distribution with mean value μ = 0 and the λ parameter set equal to 1. What is interesting is this: I said this is the distribution at time t, so what really happened to t? It disappeared; it completely disappeared.
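The result just derived, that the ratio of the coordinates of two identically diffusing particles is a standard Cauchy variable at every time, can be spot-checked by sampling (a sketch; t, D, and the sample size are arbitrary choices, and t should indeed drop out).

```python
import numpy as np

rng = np.random.default_rng(3)
t, D, n = 2.5, 1.0, 200_000        # the value of t is arbitrary: it cancels

# Two independently diffusing particles at time t: x_i ~ N(0, 2 D t)
x1 = rng.normal(0.0, np.sqrt(2 * D * t), n)
x2 = rng.normal(0.0, np.sqrt(2 * D * t), n)
xi = x1 / x2

# Standard Cauchy CDF: F(x) = 1/2 + arctan(x)/pi
checks = [(q, np.mean(xi <= q), 0.5 + np.arctan(q) / np.pi)
          for q in (-1.0, 0.0, 2.0)]
```

The empirical CDF of the ratio matches the standard Cauchy CDF regardless of the value chosen for t.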
So this is true at all times; at any instant of time the ratio of the coordinates of the two diffusing particles has a distribution independent of time, and it is a Cauchy distribution. What would have happened with diffusion coefficients D₁ and D₂? I leave that to you as an exercise: if the first particle has diffusion coefficient D₁ and the second has D₂, show that the result is still a Cauchy distribution, except that the λ parameter now involves the ratio D₁/D₂; it equals 1 in this special case. So this is one more place where the Cauchy distribution appears naturally, and there are lots and lots of such examples. Now, we will say a little more about the Lévy distribution and these long-tailed distributions later, when we do anomalous diffusion and talk about anomalous transport. But the take-home lesson is that you have this family of very special distributions called stable distributions, characterized primarily by the exponent α, which runs from 0 up to 2. The value α = 2 is the extreme case of the Gaussian, which is a very respectable distribution, with moments of all orders including a finite variance, and all the others are heavy-tailed and do not have variances.
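Returning for a moment to the first-passage-time distribution mentioned earlier, it too can be checked by simulation (a sketch, not from the lecture; D, a, the time step, and the path count are arbitrary choices, and since the discrete-time walk can step over the barrier between samples, agreement is only expected to a couple of percent).

```python
import numpy as np
from math import erfc, sqrt

rng = np.random.default_rng(4)
D, a = 0.5, 1.0                     # illustrative diffusion constant, target
dt, t_max, n_paths = 1e-3, 2.0, 10_000

x = np.zeros(n_paths)
alive = np.ones(n_paths, dtype=bool)   # paths that have not yet reached a
crossed = 0
for _ in range(int(t_max / dt)):
    # each surviving path takes a Gaussian step of variance 2 D dt
    x[alive] += rng.normal(0.0, sqrt(2.0 * D * dt), alive.sum())
    newly = alive & (x >= a)
    crossed += int(newly.sum())
    alive &= ~newly

frac = crossed / n_paths
# Integrating the first-passage density gives P(T <= t) = erfc(a / sqrt(4 D t))
theory = erfc(a / sqrt(4.0 * D * t_max))
```

The fraction of simulated paths that have hit a by t_max matches the integrated Lévy first-passage law to within the expected discretization and sampling error.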
On the other hand you could ask is there a central limit theorem for them because we already said there is a central limit theorem for the Gaussian so is there a generalized central limit theorem for all the stable distributions the answer is yes so if you started with identical very distributed variables and you said that they did not have variances but for instance if you have a P of x going like 1 over x to the power alpha plus 1 and you ask the variance does not exist because alpha is less than 2 on the other hand does some what is the maximum moment that exists for this distribution so you could ask a thing like what does what kind of dx if I say x to some beta divided by x to the power alpha plus 1 at infinity when would this exist and I put a P of x P of x has a tail which goes like this so if I put beta equal to 2 I am in trouble if alpha is less than 2 but what is the maximum value of beta that you can have for which this converges so it is clear that this denominator must go to 0 the whole thing must go to 0 faster than 1 over x so you must have alpha plus 1 minus beta to be greater than 1 right or beta must be less than alpha so if this alpha for example is 3 halves then although the second moment of this distribution does not exist the beta moment would exist even if beta is a fraction as long as beta is less than 3 halves okay and as alpha gets closer and closer to 2 you get the variance would be would diverge formally but beta would exist where beta gets closer and closer to 2 okay so such moments would certainly exist then this generalized central moment central limit theorem says that if you have a whole lot of IID random variables such that the beta moment exists where beta is just less than alpha out here then the sum of those fellows in a suitable limit as n tends to infinity would tend to one of these the appropriate stable distribution in this case so this is a generalization of the central limit theorem it simply says that each of these stable 
distributions is the attractor for a whole family of distributions all of which have a certain moments up to a certain order and then the maximal one among those moments will decide the alpha value for the stable distribution to which these distributions get attend in the limit okay so this is what the appropriate generalization is and there are further generalizations of this I will mention this a little bit more when we do fractional Brownian motion when we talk about Brownian motion which is not the usual kind okay. So so much for stable distributions they have a lot of other interesting properties we can discuss subsequently but now I would like to ask a reverse question a different kind of question I would like to ask the following given not a set of IID RVs but given a random variable X with certain properties specified distribution function and so on and so forth when can I write this random variable as a sum of two identically distributed random variables if I can write it as a sum of two random variables IID RVs then I would say this variable is two divisible if I can write it as a sum of three IID RVs I would say it is three divisible and so on and in general n of them n divisible then I can ask are there random variables for which I can write the random variable as a sum of n IID RVs for all n greater than equal to 2 no matter how large if I can then I say this divisible this random variable is infinitely divisible so I would like to introduce the concept of infinite so this is when X can be written so when this can be done then I say X is an infinitely divisible random variable for every n I will call the X sub I is the components of X because you add them all up you get X and it is a very special property you can see immediately it is not going to happen most of the time but when it does you have an infinitely divisible random variable and what makes things interesting is that the distribution of every one of these X's need not be the final 
You just want the components to be identically distributed, with a common distribution function which could be different in functional form from the distribution of the sum itself. It is immediately clear that stable distributions are infinitely divisible; not only that, in the case of stable distributions the distribution of each of the X i's, for every n, is exactly the same as the distribution of X itself, which is a very special property. Is the converse true? There is no reason at all why it should be, and indeed it is not; we are going to give counterexamples. This idea of divisibility is a little subtle, and you have to be a little cautious here. Let me give you an example. Suppose you have random variables which take the value 0 or 1, Bernoulli trials: X1 takes values in the set {0, 1}, and so do X2 and X3. What is the sample space of the random variable X = X1 + X2 + X3? It is {0, 1, 2, 3}. If these are fair coin tosses, each outcome is just heads or tails and you are asking what happens to the sum of the scores; the sum has a binomial distribution. So this random variable is clearly 3-divisible. Is it 2-divisible? That is, take a random variable with values in {0, 1, 2, 3} and ask whether it can be written as a sum of 2 iid random variables. If each component takes values 0 and 1, it is clear the sum will not reach 3, so that is ruled out. If each component takes values 0, 1 and 2, then 4 is in the sample space of the sum, which is not allowed. What is left?
You could try components taking the values 0 and 3/2: the sum then reaches 3, but since 0 is in the component's sample space, 3/2 would have to be in the sample space of the sum, which it is not. So there is no way to make this variable 2-divisible. Here, then, is a random variable which is 3-divisible but not 2-divisible; divisibility is not such a trivial concept, and it requires a little bit of understanding. Not everything is divisible, and infinite divisibility is a much stronger demand: the variable must be n-divisible for every n. That puts a lot of constraints on the possible distributions. What do you think is the primary property implied? Since X must be a sum of n iid random variables, its characteristic function must be a product of n identical characteristic functions. So X must have a characteristic function p̃(k) which is the n-th power of some other characteristic function; let me write it as [p̃_n(k)]^n, with a subscript n to show that these are in general different functions for different n. Only then is this random variable going to be infinitely divisible. So now the matter looks simple: all we have to do is to look for all those characteristic functions which have this property. We have not answered questions like whether the factorization is unique for a given n, and there is no a priori reason why it should be. But you might object: if I take an arbitrary characteristic function p̃(k), can I not always write it as the n-th power of its own (1/n)-th power? Suppose that were true. It would imply that the (1/n)-th power is itself an honest characteristic function.
That means its value at k = 0 is 1 and its inverse Fourier transform is non-negative. But if you raise p̃(k) to the power 1/n and call the result Φ_n(k), you are asserting that this too is a characteristic function, that it too has an inverse Fourier transform which is non-negative, and that is not true in general. So divisibility is not a trivial concept at all; it happens only in special cases. Now let us look at the Bernoulli trials that we talked about. With N Bernoulli trials, the distribution of the resultant was the binomial distribution, C(N, n) p^n (1 − p)^(N − n). What was the generating function f(z) in this case? It was very straightforward: f(z) = (pz + q)^N. The characteristic function p̃(k) is obtained by replacing z with e^{−ik}, giving (q + p e^{−ik})^N. Is that N-divisible? You can see it is a product of N identical factors, so you immediately say it is N-divisible, provided q + p e^{−ik} is itself the characteristic function of something or other. And it is: it is the characteristic function of a Bernoulli trial, a random variable which takes the value 1 with probability p and 0 with probability q. So, trivially, the binomial distribution with parameter N is N-divisible, not into N binomial random variables at all, but into N Bernoulli trials. What about the geometric distribution? And what about the negative binomial distribution?
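Both facts about the three-trial example can be checked through the generating function. A small sketch (my own, not from the lecture; the names and parameter choices are illustrative): convolving three fair Bernoulli pmfs reproduces the binomial(3, 1/2) pmf, while the putative square root of the corresponding pgf, ((1 + z)/2)^(3/2), has a negative series coefficient at z^3 and so cannot be a pgf, consistent with the sample-space argument that the variable is not 2-divisible.

```python
from fractions import Fraction

p, q, N = Fraction(1, 2), Fraction(1, 2), 3   # three fair Bernoulli trials

def convolve(a, b):
    """Convolve two pmfs given as lists of probabilities by value."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

bernoulli = [q, p]            # P(X_i = 0) = q, P(X_i = 1) = p
pmf = [Fraction(1)]           # pmf of the empty sum (the constant 0)
for _ in range(N):
    pmf = convolve(pmf, bernoulli)
# pmf is now the binomial(3, 1/2) pmf: 1/8, 3/8, 3/8, 1/8

def gen_binom(r, k):
    """Generalized binomial coefficient C(r, k) for real r."""
    c = 1.0
    for i in range(k):
        c *= (r - i) / (i + 1)
    return c

# The series for ((1 + z)/2)^(3/2) carries the coefficient C(3/2, 3) at z^3,
# and C(3/2, 3) = -1/16 < 0, so the square root of the pgf is not a pgf.
print(pmf, gen_binom(1.5, 3))
```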
Recall that the negative binomial distribution looked like C(N + n − 1, n) p^N q^n, where little n is a random variable taking all the non-negative integers, 0 to infinity, in its sample space, and capital N is some given positive integer. This had a generating function f(z) = [p/(1 − qz)]^N, which is why it was called the negative binomial distribution. Is that divisible? It is manifestly the N-th power of something: the characteristic function is p̃(k) = [p/(1 − q e^{−ik})]^N. So if p/(1 − qz) is itself the generating function of a probability distribution, then the negative binomial distribution with parameter N is N-divisible into N such random variables. Is it a generating function? Yes: p/(1 − qz) is the generating function of a geometric distribution, with probability distribution p q^n. So the negative binomial is immediately N-divisible into N geometrically distributed random variables. What about the Poisson distribution? Is that n-divisible? Let us write the distribution down: for a Poisson, P(n) = e^{−μ} μ^n / n!, and the generating function was e^{μ(z − 1)}; putting z = e^{−ik} gives the characteristic function p̃(k) = e^{μ(e^{−ik} − 1)}. Can this be written as the n-th power of something? Yes, trivially: it is the n-th power of e^{(μ/n)(e^{−ik} − 1)}. And what sort of random variable has a characteristic function like that? A Poisson with mean value μ/n, for every positive integer n.
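The divisibility of the negative binomial into geometrics can be verified term by term. A sketch (my own illustration; the parameter values p = 1/3, N = 4 and the truncation order K are arbitrary choices): convolving N geometric pmfs, truncated at order K, reproduces the first K terms of the negative binomial pmf exactly, in rational arithmetic.

```python
from fractions import Fraction
from math import comb

p, q, N = Fraction(1, 3), Fraction(2, 3), 4    # illustrative parameter values
K = 8                                          # truncation order

geom = [p * q**n for n in range(K)]            # geometric pmf: p q^n

def convolve(a, b, K):
    """Convolve two pmfs, keeping only probabilities for values below K."""
    out = [Fraction(0)] * K
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < K:
                out[i + j] += ai * bj
    return out

pmf = [Fraction(1)] + [Fraction(0)] * (K - 1)  # pmf of the empty sum
for _ in range(N):
    pmf = convolve(pmf, geom, K)

# Negative binomial pmf C(N + n - 1, n) p^N q^n, term by term
neg_binom = [comb(N + n - 1, n) * p**N * q**n for n in range(K)]
print(pmf == neg_binom)
```

The truncation is harmless here: every coefficient at an index below K receives contributions only from indices below K, so the comparison is exact.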
So the Poisson is n-divisible for every positive integer n. Is it stable? No, it does not fall within the family of stable distributions; for one thing, its sample space is discrete. You might regard it as a discrete analogue of a stable distribution, but what it certainly is, is infinitely divisible: for every n it is n-divisible into n Poisson random variables with the appropriate means. Is the Gaussian infinitely divisible? Yes indeed; it is a stable distribution, so this follows at once. You can also see it directly, because the characteristic function is p̃(k) = exp(−iμk − σ²k²/2), and you can certainly write this as the n-th power of exp(−i(μ/n)k − (σ/√n)²k²/2). So it is n-divisible into Gaussians with mean μ/n and standard deviation σ/√n; being a stable distribution, it is automatically infinitely divisible as well. What about the Skellam distribution, the difference of two Poisson random variables? Is it n-divisible, infinitely divisible? You would expect it to be so, because it is just the difference of two Poisson random variables. Recall that its characteristic function was exp[μ₁(e^{−ik} − 1) + μ₂(e^{ik} − 1)]. It is now a trivial matter to replace μ₁ and μ₂ by μ₁/n and μ₂/n and raise the result to the power n, so yes, it is also infinitely divisible. So the set of infinitely divisible distributions is a bigger set than the set of stable distributions, with the stable distributions as a very special subset. In general, for infinitely divisible distributions, the components do not have the same distribution as the original distribution; for the stable ones they do, and for the Poisson the components at least remain Poisson.
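The Poisson case can be verified numerically as well. A brief sketch (my own; μ = 3 and n = 6 are arbitrary illustrative values): convolving n Poisson(μ/n) pmfs, truncated at order K, recovers the Poisson(μ) pmf to floating-point accuracy.

```python
from math import exp, factorial

mu, n, K = 3.0, 6, 25          # split Poisson(3) into 6 components, truncate at K

def poisson_pmf(lam, K):
    return [exp(-lam) * lam**k / factorial(k) for k in range(K)]

def convolve(a, b, K):
    """Convolve two pmfs, keeping only probabilities for values below K."""
    out = [0.0] * K
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < K:
                out[i + j] += ai * bj
    return out

component = poisson_pmf(mu / n, K)   # Poisson with mean mu/n
pmf = [1.0] + [0.0] * (K - 1)        # pmf of the empty sum
for _ in range(n):
    pmf = convolve(pmf, component, K)

err = max(abs(a - b) for a, b in zip(pmf, poisson_pmf(mu, K)))
print(err)
```

As in the geometric example, coefficients below the truncation order are computed exactly, so the residual err is pure floating-point rounding.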
So it becomes an interesting question to classify all such infinitely divisible distributions. That takes us deeper into statistics than I want to go here; these simple examples are enough to convey the notion of divisibility and the sort of role it plays. We will try to get back to this in various other examples. So let me stop here today.