Welcome back. Today we will discuss the sum of a random number of random variables. In the last lecture we studied sums of random variables: given two random variables X and Y, or given n random variables X_1 through X_n, we know how to derive the CDF of X_1 + X_2 + ... + X_n from the joint CDF of these random variables. Today, to begin with, we will discuss the case where the number of random variables you are summing is itself random. You are not summing some fixed number n of random variables; the number of terms is itself a positive-integer-valued random variable. So let us get started. Let X_i, i >= 1, be a sequence of independent random variables. Here we will only consider the case where the X_i's are independent; the dependent case is more complicated. Let N be a positive-integer-valued random variable with PMF p_N(n) = P(N = n). The CDF F_{X_i} of each X_i is also known, and we assume that N is independent of the X_i's; what I really mean is that N, X_1, X_2, ... are all independent random variables. Now consider the sum S_N = sum_{i=1}^{N} X_i. Let me try to explain what this actually means in terms of the sample space. You have a probability space (Omega, F, P), and the random variables X_i are defined on this sample space. N is also a positive-integer-valued random variable defined on the same (Omega, F, P); they all live on the same space. So there is just one underlying source of randomness: a single little omega realizes, and once it does, the X_i(omega) realize as a sequence of real numbers and N(omega) realizes as some positive integer.
So when omega realizes, you sum i = 1 through N(omega) — which is now a positive integer — of the X_i(omega). That is, when little omega realizes, S_N(omega) is simply sum_{i=1}^{N(omega)} X_i(omega). The number of terms you are summing thus also depends on the realization omega; it is not a fixed number. But there is only one source of randomness: omega determines the realization of N as well as of the X_i's. Now, we assume that N and the X_i's are independent random variables, that the marginal CDF of each X_i is known, and that the PMF of N is known. The first task is to make an argument that S_N is in fact a random variable — we know that a sum of a fixed number of random variables is a random variable, but you have to make sure that S_N (with capital N) is one too — and then to determine the CDF of S_N. We proceed as follows. We want P(S_N <= x), the CDF of S_N; this is what we want to find. The key observation is that the realization of N partitions the sample space, so I am going to look at that partition: for one set of omegas N(omega) = 1, for another N(omega) = 2, and so on, since N takes only positive integer values. If N takes only finitely many values, the sample space is partitioned into finitely many pieces.
If N takes infinitely many values, there is a countable infinity of such pieces. Once I have a partition of the sample space, I can invoke the law of total probability — that is why I am looking at this. Is the picture clear? All the omegas in one piece correspond to N(omega) = 1, and the X_i(omega) there can be anything; I am just looking at the partition induced by N. Hence we can write P(S_N <= x) = sum_{k=1}^{infinity} P(S_N <= x | N = k) P(N = k), by the law of total probability (I am not writing the omegas everywhere; by now you understand that). Now, P(N = k) is something we know. The conditional probability is also something we can figure out, because given N = k, S_N is simply S_k, which is a deterministic sum — it is not a sum of a random number of random variables. So let me write this properly: it equals sum_{k=1}^{infinity} P(S_k <= x | N = k) P(N = k); I write S_k because that is exactly what the conditioning N = k gives. Can we deal with this now? Yes: because N is independent of all the X_i's, N is independent of S_k — after all, S_k is just X_1 + X_2 + ... + X_k, and N is independent of the X_i's. So the conditional probability is nothing but the unconditional one.
By this independence of N and the X_i's, we can write P(S_N <= x) = sum_{k=1}^{infinity} P(S_k <= x) P(N = k). Is that step clear? It follows from independence. Now, P(N = k) is known to us, and P(S_k <= x) is the CDF of S_k, where k is a deterministic number in each term. S_1, for example, is simply X_1, and we know its CDF; S_2 is X_1 + X_2, and you know how to find its CDF — if the X_i's have densities you can convolve, and if they do not, you can still find it (the X_i's may well be discrete; there is no guarantee they are continuous). So each term is something you know how to find from the previous lecture: for each k you find it, plug it in, and do the summation. Conceptually, this does not simplify in full generality, but given a specific example you will be able to solve it. Any questions on this? Just as in the case of a deterministic number of terms — where I said that instead of convolving you can use transforms, which makes life easier — here too, when you use transforms, this simplifies a bit further; we will come to that later. For now, all you can do is find each CDF the hard way, plug it in, and sum. Let me do a very famous example: the case where the X_i's are i.i.d. exponential with parameter lambda.
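The total-probability formula above can be checked numerically. Here is a minimal sketch, assuming a hypothetical concrete case: X_i i.i.d. Exponential(lambda), so that S_k is Erlang-k with a known closed-form CDF, and N uniform on {1, 2, 3}. The formula's answer is compared against a direct Monte Carlo simulation of S_N.

```python
import math
import random

def erlang_cdf(x, k, lam):
    # CDF of S_k = X_1 + ... + X_k for i.i.d. Exp(lam):
    # F(x) = 1 - sum_{j=0}^{k-1} e^{-lam x} (lam x)^j / j!
    return 1.0 - sum(math.exp(-lam * x) * (lam * x) ** j / math.factorial(j)
                     for j in range(k))

lam = 1.0
pmf_N = {1: 1 / 3, 2: 1 / 3, 3: 1 / 3}   # hypothetical PMF of N
x = 2.0

# P(S_N <= x) = sum_k P(S_k <= x) P(N = k)
cdf_formula = sum(erlang_cdf(x, k, lam) * p for k, p in pmf_N.items())

# Monte Carlo: realize N, then sum that many exponentials
random.seed(0)
trials = 200_000
hits = 0
for _ in range(trials):
    n = random.choice([1, 2, 3])
    s = sum(random.expovariate(lam) for _ in range(n))
    hits += (s <= x)

print(cdf_formula, hits / trials)   # the two numbers should be close
```

The formula value and the empirical frequency agree to Monte Carlo accuracy, which is exactly the content of the law-of-total-probability step.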
So each X_i is Exponential(lambda), and they are independent. There is nothing more you can do with the general formula itself, but each term is something you know how to find from the previous lecture; you just plug it in and sum. (When you use transforms, this formula simplifies a bit; we will get to that later.) So the X_i's are i.i.d. exponential, and N is independent of the X_i's and geometric with parameter p. Now, you know that when you sum i.i.d. exponentials you do not get an exponential — you get an Erlang. We discussed this; in fact, I think we derived the Erlang-2 formula explicitly, and if you sum n exponentials you get the n-th order Erlang. This is a very well-known example where you sum not a fixed number of exponentials but a geometric number of them. The reason I am doing it is that it gives a nice answer: it starts out looking messy, but the result is clean. So here is where we are. S_N = sum_{i=1}^{N} X_i, where you can think of the X_i's as being generated as i.i.d. exponentials and N as an independent geometric taking values in the positive integers. You are summing a geometric number of i.i.d. exponentials; what is the distribution of this sum? You can write P(S_N <= x) = sum_{k=1}^{infinity} P(S_k <= x) P(N = k). Now, P(N = k) we know: it is (1 - p)^(k-1) p. And P(S_k <= x) is the Erlang CDF; I do not remember the CDF off the top of my head, but the PDF of the Erlang-k is f_{S_k}(x) = lambda^k x^(k-1) e^(-lambda x) / (k - 1)!.
So that is how the PDF looks, and the CDF is its integral. You put the CDF in and do the summation — you see what I mean. It looks messy to begin with, but you can show that it gives a very nice answer: S_N is exponential with parameter lambda p. So if you sum a geometric number of exponentials, you get another exponential whose parameter is lambda times p, and the way you get that is simply to do the summation by brute force at this point. It is nice: summing a fixed number of exponentials gives an Erlang, which is somewhat messy — and note that what I wrote above is the density f_{S_k}, not the CDF; from the density you still have to get the CDF — but summing a geometric number of exponentials gives another exponential. We just showed that by brute force; now, there is a very nice interpretation of this result. I mentioned that exponentials model the time between two successive radioactive emissions. So suppose a radioactive source is emitting particles, and the inter-emission times are i.i.d. exponentials with parameter lambda. If you look at the timeline — first emission, second, third, fourth, and so on — the gaps between them are all i.i.d. exponentials. Now, what I am doing is summing a geometric number of these, which means I am tossing a coin.
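Rather than doing the brute-force summation on the board, the claimed result can be sanity-checked by simulation. This is a sketch under the stated assumptions (hypothetical parameter values lambda = 2, p = 0.25): sum a Geometric(p) number of i.i.d. Exp(lambda) variables and verify the result behaves like an Exp(lambda p) variable, whose mean is 1/(lambda p) and whose CDF at x is 1 - e^(-lambda p x).

```python
import random

random.seed(1)
lam, p, trials = 2.0, 0.25, 200_000   # hypothetical parameter values

samples = []
for _ in range(trials):
    # Geometric(p) on {1, 2, ...}: number of tosses until the first head
    n = 1
    while random.random() >= p:
        n += 1
    # sum n i.i.d. Exp(lam) variables
    samples.append(sum(random.expovariate(lam) for _ in range(n)))

mean = sum(samples) / trials                     # Exp(lam*p) mean: 1/(lam*p)
frac = sum(s <= 1.0 for s in samples) / trials   # Exp(lam*p) CDF at 1: 1 - e^{-0.5}
print(mean, frac)
```

With these parameters, lambda p = 0.5, so the empirical mean should be near 2.0 and the empirical CDF at 1 near 1 - e^(-0.5), about 0.393 — consistent with S_N being Exponential(lambda p).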
Every time there is an emission, I toss a coin whose probability of heads is p, say, and I look at the duration until the first head shows up. Say the first emission gives me tails, the next gives me tails, and the next gives me heads. The number of coin tosses I wait is clearly geometric — that is what a geometric random variable is, after all — and the total time I wait until I see my first head is S_N, because I am waiting X_1 + X_2 + ... + X_N. So if this is where I start looking at things, that is the first time I see heads together with an emission. You can think of this as splitting the alpha particles: as the particles come out, you have a splitter that sends each one this way with probability p and that way with probability 1 - p, and you put a detector on the probability-p branch — not at the source, but after the splitter. What we are saying is that the first time your detector after the splitter sees an emission is also exponentially distributed. So if this is your radioactive source throwing out a bunch of particles and you split them — think of heads as being sent up and tails as being sent down — and instead of putting a detector at the source and detecting all emissions you put one on the heads branch, you will only see those emissions that correspond to getting a head. According to our result, from that detector's point of view the process looks exponential with parameter lambda p, whereas the original process is exponential with parameter lambda. Is that clear?
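The splitter story can itself be simulated directly, as a sketch: generate a stream of emissions with i.i.d. Exp(lambda) gaps, keep each one independently with probability p (the coin toss at the splitter), and measure the gaps between kept emissions. The parameter values are hypothetical.

```python
import random

random.seed(2)
lam, p = 1.0, 0.3   # hypothetical: emission rate and splitter probability

gaps = []           # gaps between emissions the detector actually sees
t = 0.0             # current time
t_last_kept = 0.0   # time of the last emission that went to the detector
for _ in range(300_000):
    t += random.expovariate(lam)   # next emission on the original stream
    if random.random() < p:        # coin shows heads: detector sees it
        gaps.append(t - t_last_kept)
        t_last_kept = t

mean_gap = sum(gaps) / len(gaps)
print(mean_gap)   # should be near 1 / (lam * p)
```

The gaps seen at the detector have mean close to 1/(lambda p) — the mean of an Exponential(lambda p) — matching the result derived by summing a geometric number of exponentials.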
You can build any other story you want out of this. Suppose you are waiting for a taxi, the inter-arrival times of taxis are exponential, and with probability p each taxi is full so you cannot take it. Then the distribution of how long you wait at the taxi stand is exponential with parameter lambda times (1 - p), because each taxi is available only with probability 1 - p. This again has to do with what is known as a Poisson process, and what this result essentially says is that if you i.i.d.-Bernoulli-split a Poisson process, you get another Poisson process — the Poisson process here being just the process of emissions of alpha particles. Is that clear? This is a very well-known example. Now, in this course we will only discuss the case where N is independent of the X_i's — the number of terms you sum. Remember the step I made: it is only possible if N is independent of all the X_i's and therefore of S_k. But it is also possible to study the case where N depends on the X_i's, and that also has applications. You may go to a casino and say: I will play an i.i.d. win-or-loss game some number of times, and I will stop playing when my profit reaches 100 dollars or my loss reaches 100 dollars, or something like that. In that case the index at which you stop — the number N of games — is a function of your previous winnings and losses. That is a case we will not cover in this course, but it is covered in a course next semester. In that situation you cannot make the step we made — you cannot drop the conditioning — because N depends on the previous X_i's.
But it turns out you can still handle that case with a more sophisticated technique, using stopping rules; you can work it out even when N depends on the X_i's in this way, but that is not something we will cover this semester. Any questions? For this example too: right now you have a rather dirty way of doing the computation; once we have transforms, you will have a much cleaner way. The reason I did not carry it out here is that it would take me twenty minutes and I would probably make a mistake somewhere — write n! instead of (n-1)!, say — and I did not want to spend class time on that. But I suggest you try it; it is not all that difficult, probably ten to fifteen minutes, and when you use transforms you will see how much quicker that is. Any questions? Now we move on to more general transformations. So far we have only looked at the maximum, the minimum, sums of random variables, and sums of a random number of random variables. Now we will look at transformations that are not simply maxima, minima, or summations: any function f of a random variable X, or some function of n random variables X_1 through X_n. Just to remind you of the picture: you have (Omega, F, P), you have a random variable X, and you have another function f which maps, say, R to R to begin with. If you look at some particular omega, it maps to X(omega), and that point maps to f(X(omega)). So the overall mapping is the composition (f o X)(omega). We said earlier that f(X), that is f o X, is a random variable as long as f is a Borel-measurable function, meaning that the pre-images of Borel sets under f must themselves be Borel sets. And of course, we know that pre-images of Borel sets under X are always F-measurable, because X is a random variable.
So the problem at hand is this: we are given Y = f(X). When I write f(X), it should be understood as a composition, because X is a function — Y(omega) = (f o X)(omega). That is the notation I will use; not the best notation, but I will use it. You are given Y = f(X) together with the probability law, or CDF, of X, and you want the probability law, or CDF, of Y. In principle — conceptually, at least — this problem is quite easy. Say I want F_Y(y), the probability that Y <= y. I am simply considering the semi-infinite interval (-infinity, y] and asking for the probability that my omega maps into it. Now, I am not going to trace the inverse map all the way back to Omega; because I know the probability law of X, I am just going to look at the inverse image of the semi-infinite interval on the X line. So consider f^{-1}((-infinity, y]): the set of all x in R such that f(x) <= y. You look at the pre-image of that set — all those x's which map into (-infinity, y]. Is that clear? Then F_Y(y) is simply P_X of that set. This pre-image is a Borel subset of the X line precisely because (-infinity, y] is a Borel set on the Y line and f is a Borel function, so its pre-image must be some Borel set; and for that Borel set we know the probability law of X. So you can write it this way. Is that clear?
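The pre-image recipe F_Y(y) = P_X({x : f(x) <= y}) can be made concrete with a tiny example. This is a sketch with a hypothetical discrete X, uniform on {-2, -1, 0, 1, 2}, and f(x) = x^2; the CDF of Y = f(X) is computed by summing the PMF of X over the pre-image.

```python
# Hypothetical discrete probability law of X: uniform on {-2, -1, 0, 1, 2}
pmf_X = {-2: 0.2, -1: 0.2, 0: 0.2, 1: 0.2, 2: 0.2}

def cdf_Y(y, f=lambda x: x * x):
    # F_Y(y) = P_X({x : f(x) <= y}): sum the PMF over the pre-image
    return sum(p for x, p in pmf_X.items() if f(x) <= y)

# pre-image of (-inf, 0] is {0}; of (-inf, 1] is {-1, 0, 1}; of (-inf, 4] is all
print(cdf_Y(0), cdf_Y(1), cdf_Y(4))   # 0.2, 0.6, 1.0 (up to float rounding)
```

The same recipe works verbatim for any Borel f; only the computation of P_X over the pre-image gets harder in the continuous case.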
Note that this is not P, the measure on the sample space; it is P_X, the measure on the real line — the probability law of X — and it is known to us. So I can write the CDF of Y in terms of the probability law of X: F_Y(y) = P_X({x in R : f(x) <= y}). Conceptually there is no difficulty; is this clear? In practice, depending on how complicated f is, this may or may not be easy to do. But conceptually there is no problem for any Borel-measurable function f, and that is it: if you are looking for a general formula, this is it. There may be special cases where you can derive something more explicit. So let me do an example or two. Let X be a standard Gaussian and let Y = X^2. X is a standard Gaussian; what is the distribution of the square of X? You know that X takes values in (-infinity, infinity); Y takes values in [0, infinity), because it is a non-negative random variable. We proceed as follows. We want F_Y(y) = P(Y <= y) = P(X^2 <= y). Can you figure that out? We know the CDF of X. (Perhaps I should use g rather than f for the transformation here, since I am using F for the CDF — put g everywhere if you like.) The Gaussian density is the familiar bell curve, and you are looking at the probability that X^2 <= y, which means X must lie between -sqrt(y) and sqrt(y) — agreed? — which is simply the area under the bell curve from -sqrt(y) to sqrt(y). And now you can simplify: by symmetry, you can write it as twice the probability of the right half. So that is my CDF.
That is, F_Y(y) = 2 * (1/sqrt(2 pi)) * integral from 0 to sqrt(y) of e^(-x^2/2) dx. Fine, that is the CDF. If I want the PDF: this is clearly a differentiable function, because it is written as an integral, so you can differentiate it to get the density, and f_Y(y) = (1/sqrt(2 pi y)) e^(-y/2), for y >= 0 (the CDF above is also for y >= 0). This comes simply from differentiating: substituting x^2 = y gives the e^(-y/2), but you must also differentiate sqrt(y) with respect to y, which is where the 1/(2 sqrt(y)) comes from — combined with the 2 out front you get the 1/sqrt(2 pi y). It is just algebra. This is called the chi-squared distribution — the chi-squared random variable, the chi-squared density. If you square a standard Gaussian, you get what is known as a chi-squared distribution; that is its PDF. It looks like an exponential, but not quite, because of the 1/sqrt(2 pi y) factor: if that were a constant this would be an exponential, but there is a y in it, so it is not quite an exponential. I did this from first principles — I did not invoke anything; I figured out the probability that X^2 <= y, with f(x) = x^2, which is exactly the general recipe above. Example 2: let us keep our Gaussian and say Y = e^X. In this case F_Y(y) = P(e^X <= y); and since e^x is a monotonically increasing function, I can write this as P(X <= log y), and that is something I know, because I know the CDF of X.
So F_Y(y) = integral from -infinity to log y of (1/sqrt(2 pi)) e^(-x^2/2) dx. This is clearly also a continuous random variable, so I can differentiate the CDF to get the PDF. What will I get? Instead of x I get log y, and a 1/y appears because I have to differentiate log y as well: f_Y(y) = (1/(y sqrt(2 pi))) e^(-(log y)^2 / 2), valid for y > 0. This is called the log-normal PDF, or log-normal distribution. Any questions on this? So if you are given some reasonably nice function like this, you simply look at those x's for which f(x) <= y, write the probability in terms of the CDF of X, and leave it there — or, if Y has a density, you differentiate. That is about it. So far I have not derived any formula; we have gone about it from first principles. But in certain cases, where the function f is differentiable and monotonic, you can actually get an explicit formula for the distribution of Y in terms of the distribution of X. That is something I will probably do next class; it is called the Jacobian formula. It gives you — in the case where X is a continuous random variable and f is a differentiable, well, actually an invertible function, I believe — an explicit formula for the density of Y in terms of the density of X. I will derive that for completeness, but I feel that doing it from first principles, as we did here, is a very error-proof way of working. We will continue next class.
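Both first-principles examples can be checked numerically before we see the Jacobian formula. The sketch below uses the closed forms implied by the derivations — P(X^2 <= y) = 2*Phi(sqrt(y)) - 1, which equals erf(sqrt(y/2)), and P(e^X <= y) = Phi(log y) — and compares each against simulation of a standard Gaussian (the evaluation points y = 1 and y = 2 are arbitrary choices).

```python
import math
import random

def chi2_cdf(y):
    # P(X^2 <= y) for standard Gaussian X: 2*Phi(sqrt(y)) - 1 = erf(sqrt(y/2))
    return math.erf(math.sqrt(y / 2.0))

def lognormal_cdf(y):
    # P(e^X <= y) for standard Gaussian X: Phi(log y)
    return 0.5 * (1.0 + math.erf(math.log(y) / math.sqrt(2.0)))

random.seed(3)
trials = 200_000
xs = [random.gauss(0.0, 1.0) for _ in range(trials)]

frac_sq = sum(x * x <= 1.0 for x in xs) / trials          # vs chi2_cdf(1.0)
frac_exp = sum(math.exp(x) <= 2.0 for x in xs) / trials   # vs lognormal_cdf(2.0)
print(chi2_cdf(1.0), frac_sq)        # both close to 0.683
print(lognormal_cdf(2.0), frac_exp)  # both close to 0.756
```

The empirical frequencies match the derived CDFs to Monte Carlo accuracy, confirming the chi-squared and log-normal results.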