So, after obtaining the expression for E[X + Y] — in fact, we showed that E[X + Y] = E[X] + E[Y], and, under independence of X and Y, that Var(X + Y) = Var(X) + Var(Y) — we can generalize these results to any finite number of random variables. So, if X_1, X_2, ..., X_n are n jointly distributed random variables, discrete or continuous, then

E[X_1 + X_2 + ... + X_n] = E[X_1] + E[X_2] + ... + E[X_n].

That is, you can always take the expectation of the sum as the sum of the expectations. Now, if the X_i's are also independent, then we can extend the other result as well:

Var(X_1 + X_2 + ... + X_n) = Var(X_1) + Var(X_2) + ... + Var(X_n).

Because, as we saw in the two-variable case, the product terms vanish. Here also, when you square up the sum minus the expectation of the sum, you get cross terms taking two variables at a time, of the kind E[(X_i − E[X_i])(X_j − E[X_j])]. Under independence, the expectation goes inside the product and each such term vanishes, so only the square terms are left, and you get the sum of the variances. So, under independence, you get that result. The formula for the variance of a sum when the random variables are not independent will be discussed later; we will give you a general formula for that case. So, this is one thing, and we can use it for computing various expectations and so on.
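To see these two identities numerically, here is a small Monte Carlo sketch in Python; the three distributions, the seed and the sample size are my own illustrative choices, not from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 1_000_000

# Independent draws from three different distributions; linearity of
# expectation needs neither independence nor identical distributions,
# but the variance additivity checked below does use independence.
x1 = rng.exponential(scale=2.0, size=n_samples)       # E = 2,  Var = 4
x2 = rng.normal(loc=-1.0, scale=3.0, size=n_samples)  # E = -1, Var = 9
x3 = rng.uniform(0.0, 6.0, size=n_samples)            # E = 3,  Var = 3

s = x1 + x2 + x3
print("E[X1+X2+X3]   ~", s.mean())  # theory: 2 - 1 + 3 = 4
print("Var(X1+X2+X3) ~", s.var())   # theory: 4 + 9 + 3 = 16
```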
Now, let me discuss an interesting result which is called Boole's inequality, and let me first describe what we want to say here. Suppose A_1, A_2, ..., A_n are n events. Corresponding to these n events, I define the indicator variables: X_i = 1 if A_i occurs, i varying from 1 to n, and X_i = 0 otherwise. Then I let X be the sum of these n indicator variables. In words, X denotes the number of the events A_i that occur, because X_i is 1 if A_i occurs; for example, if X = 5, then 5 of the A_i's have occurred, each occurring event contributing a 1 to the sum. Define another variable Y which is 1 if X ≥ 1 and 0 otherwise. So, if X is 1 or greater than 1 then Y is 1, but if X is 0 then Y is 0. Either an event occurs or it does not: if at least one of the events occurs then X is at least 1, and if none of the events occur then X is 0, in which case Y is also 0. Therefore X ≥ Y, because whenever X takes the value 1 or 2 or 3 and so on, Y is 1, and whenever X is 0, Y is 0.

So, it is clear that X is always greater than or equal to Y, and this implies that E[X] ≥ E[Y], because X − Y is non-negative. Writing the general expression, in the continuous case for example,

E[X − Y] = ∫∫ (x − y) f_{X,Y}(x, y) dx dy, over the whole plane,

and the integrand is non-negative, so the integral is non-negative; it follows that E[X] must be greater than or equal to E[Y]. That is the key step through which we will finally derive Boole's inequality.

Now look at E[X]. Expectation being linear, I can exchange the expectation and summation signs: E[X] = Σ_{i=1}^{n} E[X_i]. But what is E[X_i]? Since X_i takes only the values 1 and 0, E[X_i] = 1 · P(X_i = 1) + 0 · P(X_i = 0) = P(X_i = 1), and P(X_i = 1) is the probability of occurrence of A_i. So E[X_i] = P(A_i), and the sum of the expectations of the X_i is Σ P(A_i). In fact, each X_i is a Bernoulli random variable, taking the values 1 and 0 with probability of success P(A_i).

Now, what is E[Y]? Y = 1 exactly when X ≥ 1, and X ≥ 1 exactly when at least one of the events A_1, A_2, ..., A_n occurs, that is, the event ∪_{i=1}^{n} A_i. So E[Y] = 1 · P(∪_{i=1}^{n} A_i) + 0 · P((∪_{i=1}^{n} A_i)^c) = P(∪_{i=1}^{n} A_i). And hence we obtain Boole's inequality, which says that

Σ_{i=1}^{n} P(A_i) ≥ P(∪_{i=1}^{n} A_i).

In words, the probability of occurrence of at least one event out of n given events is no greater than the sum of the probabilities of occurrence of the individual events. This sounds very reasonable and may seem simple to accept, but we had to go through this process to be able to derive the inequality.
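As a quick numerical illustration of Boole's inequality, here is a sketch with three overlapping events built on one uniform draw; the particular events are my own illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n_trials = 500_000

# Three overlapping events on one experiment: U uniform on [0, 1).
u = rng.random(n_trials)
events = [u < 0.3, (u > 0.2) & (u < 0.6), u > 0.8]  # A1, A2, A3

sum_of_probs = sum(a.mean() for a in events)      # estimates sum of P(A_i)
p_union = np.logical_or.reduce(events).mean()     # estimates P(union of A_i)
print(f"sum P(A_i) = {sum_of_probs:.3f} >= P(union) = {p_union:.3f}")
# Exact values: 0.3 + 0.4 + 0.2 = 0.9 >= 0.8
```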
Now let us continue using what we have developed about adding up random variables and computing their expectations. Look at the chi-square random variable. This random variable is defined as χ²(n) = Σ_{i=1}^{n} Z_i², where each Z_i, i varying from 1 to n, is standard normal; so E[Z_i] = 0 and Var(Z_i) = 1. First of all, we want to compute the CDF of Z_i². Let me make it clear:

F_{Z_i²}(y) = P(Z_i² ≤ y),

and I have already discussed this kind of computation in one of the earlier lectures. Z_i² ≤ y means that Z_i should lie between −√y and √y, so that the square does not exceed y. So this probability can be written as

F_{Z_i²}(y) = F_{Z_i}(√y) − F_{Z_i}(−√y).

When I differentiate with respect to y, I get the pdf of Z_i²; the derivative of √y is 1/(2√y), so

f_{Z_i²}(y) = (1/(2√y)) [f_{Z_i}(√y) + f_{Z_i}(−√y)], for y > 0.

For the standard normal, f_{Z_i}(z) = (1/√(2π)) e^{−z²/2}, since σ = 1, and when you square up −√y and √y they both give e^{−y/2}. So the bracket is twice (1/√(2π)) e^{−y/2}, the 2's cancel, and I am left with

f_{Z_i²}(y) = (1/√(2π)) (1/√y) e^{−y/2}.

Now I rewrite this. The factor 1/√2 I write as (1/2) · (1/2)^{−1/2}, because (1/2)(1/2)^{−1/2} = (1/2)^{1/2} = 1/√2. So the whole thing becomes

f_{Z_i²}(y) = (1/√π) · (1/2) e^{−y/2} (y/2)^{−1/2}.

Now, if you remember your gamma distribution, this looks exactly like λ e^{−λy} (λy)^{α−1} with λ = 1/2 and α = 1/2, because (y/2)^{−1/2} is (λy)^{α−1}. Of course, the gamma pdf has to be divided by Γ(α), so what I do is divide and multiply by Γ(1/2):

f_{Z_i²}(y) = (Γ(1/2)/√π) · [(1/2) e^{−y/2} (y/2)^{−1/2} / Γ(1/2)],

and the bracketed expression is a gamma pdf with parameters α = 1/2 and λ = 1/2. Since the left-hand side is a pdf and the bracketed factor is a pdf, the constant Γ(1/2)/√π must equal 1. In an earlier lecture, when I introduced the gamma distribution, I told you we would take Γ(1/2) = √π on faith; but now you have an immediate justification that Γ(1/2) must be equal to √π.
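If you want to check this identification numerically, here is a short sketch using SciPy (recall that scipy.stats parameterizes the gamma with shape a and scale = 1/λ, so gamma(α = 1/2, λ = 1/2) is gamma(a=0.5, scale=2), which is also the chi-square with 1 degree of freedom):

```python
import math
import numpy as np
from scipy import stats

# Gamma(1/2) = sqrt(pi): an immediate numerical check
print(math.gamma(0.5), math.sqrt(math.pi))  # both ~1.7724538509

# The derived pdf of Z^2 equals the gamma(1/2, 1/2) pdf, i.e. chi2(1)
y = np.linspace(0.1, 5.0, 5)
pdf_derived = np.exp(-y / 2) / np.sqrt(2 * np.pi * y)
print(np.allclose(pdf_derived, stats.gamma.pdf(y, a=0.5, scale=2.0)))  # True
print(np.allclose(pdf_derived, stats.chi2.pdf(y, df=1)))               # True
```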
And of course, as I said, for other fractional values of the gamma function the tables are there, and for positive integers we have already seen that Γ(α) = (α − 1)!. So, to continue with this discussion: each Z_i² has a gamma(1/2, 1/2) distribution. Now, χ²(n) is Z_1² + Z_2² + ... + Z_n², and the Z_i's are independent. So, applying the MGF results for sums of independent gamma random variables, the shape parameters add up: each Z_i² is gamma(1/2, 1/2), there are n of them, and they are independent, so the sum is gamma(n/2, 1/2) — the parameter λ stays 1/2 and α becomes n/2. So you see how we use all the results obtained so far; I thought this was a good way to show you how these tools that we are generating get used. And you can see the break-up in the other direction too: by adding up these independent gamma variables you get a χ²(n).

This gives another interesting computation. We have seen that if n is a positive integer then Γ(n) is simply (n − 1)!. So if n is even, then n/2 is an integer and Γ(n/2) = (n/2 − 1)!. But if n is odd, then you will be left with Γ(1/2). For example, if n is 7, then Γ(7/2) = (5/2) Γ(5/2) = (5/2)(3/2) Γ(3/2) = (5/2)(3/2)(1/2) Γ(1/2), and Γ(1/2) is √π. So this is how you can compute it, and therefore you can now find Γ(α) for all values of α, integer or non-integer.

So this was an application: first you square up independent standard normal variates and sum them up; you get a chi-square distribution, and what we have shown is that for n of them this is chi-square with n degrees of freedom, obtained by adding up gamma(1/2, 1/2) variables.

I should also mention the importance of the chi-square distribution. In χ²(n), the n stands for the number of standard normal variables that you are squaring and adding up. You can think of each term as ((X_i − μ_i)/σ_i)², which is your Z_i², so the sum can be looked upon as an attempt to estimate the error involved when one attempts to hit a target in n-dimensional space, when the coordinate errors are taken to be independent standard normal random variables. So whenever you want to talk about this kind of squared deviation from the mean and its distribution, chi-square random variables come in very handy; in fact, it is one of the most widely used distributions in statistical analysis.
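A small sketch confirming both facts from this passage — the half-integer recursion for Γ(7/2), and that a sum of n squared standard normals behaves like gamma(n/2, 1/2), i.e. χ²(n) with mean n and variance 2n. The sample size and seed are arbitrary choices:

```python
import math
import numpy as np

# Gamma(7/2) = (5/2)(3/2)(1/2) * Gamma(1/2) = (15/8) * sqrt(pi)
print(math.gamma(3.5), (5/2) * (3/2) * (1/2) * math.sqrt(math.pi))

# Sum of n squared independent standard normals ~ chi2(n) = gamma(n/2, 1/2)
rng = np.random.default_rng(2)
n = 7
samples = (rng.standard_normal((200_000, n)) ** 2).sum(axis=1)
print(samples.mean(), "~", n)      # chi2(n) has mean n
print(samples.var(), "~", 2 * n)   # and variance 2n
```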
So, now let me get back to sums of independent normal variables. If the X_i, i varying from 1 to n, are independent normal random variables with respective parameters (μ_i, σ_i²) — that means for the i-th normal random variable the mean is μ_i and the variance is σ_i² — then Σ X_i is again normally distributed, with mean Σ μ_i and variance equal to the sum of the individual variances, because the random variables are independent. Now, we already know that the mean of Σ X_i is Σ μ_i, and that under independence the variance is Σ σ_i²; those results we have already done. The important thing is to show that the sum will again be normally distributed.

Here again I am going to use MGFs. The thing is that I have been using moment generating functions to handle sums of independent random variables, but texts sometimes introduce the concept of the moment generating function much later. They then work through the joint density function — because these are independent random variables, the joint density is the product of the individual densities — and manipulate the terms to arrive at the result. Maybe you should also do that, to get a better feeling for it, but I find the treatment through MGFs very convenient.

The MGF of X_i is e^{μ_i t + (1/2) σ_i² t²}; this is the MGF of the normal (μ_i, σ_i²). Since the X_i's are independent, the MGF of Σ X_i is the product of the individual MGFs — this we have done already. So the MGF of Σ X_i is e^{μ_1 t + (1/2) σ_1² t²} · e^{μ_2 t + (1/2) σ_2² t²} and so on up to n, and since these are the exponents, you can add them up:

M_{ΣX_i}(t) = e^{(Σ μ_i) t + (1/2)(Σ σ_i²) t²}.

By the uniqueness of the MGF, given this MGF I can immediately conclude that the corresponding distribution is normal with mean Σ_{i=1}^{n} μ_i and variance Σ_{i=1}^{n} σ_i². So, using the MGF you get these results much quicker; the other route is also not difficult, it is just that you have to write out long expressions to show that the sum of independent normal random variables is again a normal random variable with these parameters.
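Here is a simulation sketch of this closure of normals under independent sums; the particular μ_i and σ_i values are made up for illustration, and the Kolmogorov-Smirnov test at the end is just one convenient way to check that the simulated sum is consistent with the claimed normal law:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
mus, sigmas = [1.0, -2.0, 0.5], [1.0, 2.0, 0.5]

# Draw each X_i ~ N(mu_i, sigma_i^2) independently and sum them.
xs = sum(rng.normal(m, s, size=100_000) for m, s in zip(mus, sigmas))

mu_sum = sum(mus)                      # -0.5
var_sum = sum(s**2 for s in sigmas)    # 5.25
print(xs.mean(), "~", mu_sum)
print(xs.var(), "~", var_sum)
# KS test against N(mu_sum, var_sum): a large p-value means "consistent"
print(stats.kstest(xs, "norm", args=(mu_sum, np.sqrt(var_sum))).pvalue)
```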
To give you an example of how to make use of the fact that a sum of independent normal random variables is also normally distributed, let us look at this example from Sheldon Ross. A football club team will play a 44-game season: 26 of these games will be against class A teams and the remaining 18 against class B teams. The probability of winning a match against an A team is 0.4, because the A teams are better than the B teams, and the probability of winning a match against a B team is 0.7. The results of different games are independent; we are not assuming any sort of dependence between winning a game against one team and another. We want to approximate, first, the probability that the team wins 25 or more of the 44 games played, and second, the probability that the team wins more games against class A teams than against class B teams.

Let us start by defining the random variable X_A as the number of matches won against class A teams and X_B as the number of matches won against class B teams. Of course, X_A and X_B are binomial random variables, because a win is a success: out of the 26 games played against class A teams, P(X_A = r) is a binomial probability with success probability 0.4, and similarly X_B is binomial. Now, E[X_A] = np = 26 × 0.4 = 10.4, and the variance is npq = 26 × 0.4 × 0.6 = 6.24. Similarly, X_B being binomial with parameters 18 and 0.7, E[X_B] = 12.6 and Var(X_B) = npq = 18 × 0.7 × 0.3 = 3.78.

Now we start approximating. Remember, when I told you about the approximation of the binomial distribution by the normal distribution, the rule of thumb was that if npq ≥ 10 then the approximation is considered good. Here that condition is not satisfied, because npq is 6.24 in one case and 3.78 in the other; but we are still going ahead with the approximation, just to get an idea, because I want to show you the application of adding up normal variables.

The required probability is that X_A + X_B, the total number of matches won against class A and class B teams together, is 25 or more. Even though I am approximating, X_A and X_B are discrete random variables, so the continuity correction factor must be used here. Since the event is X_A + X_B ≥ 25, the bar over 25 starts at 24.5, and you want to include that whole area; so we compute P(X_A + X_B ≥ 24.5). Now I standardize by subtracting the mean, which is 23 (10.4 and 12.6 add up to 23); the two variances add up to 10.02 (6.24 plus 3.78). So this becomes a standard normal variate, and the probability is P(Z ≥ 1.5/√10.02) = P(Z ≥ 0.4739). The required probability is therefore 1 − Φ(0.4739), which from the standard normal table comes out to 0.3178.
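Since the lecture notes that npq is too small here for the approximation to be reliable, it is instructive to compare the approximation against the exact answer, which we can get by convolving the two binomial pmfs; a sketch:

```python
import numpy as np
from scipy import stats

# Football example: X_A ~ Bin(26, 0.4), X_B ~ Bin(18, 0.7), independent.
nA, pA, nB, pB = 26, 0.4, 18, 0.7
mu = nA * pA + nB * pB                                  # 23.0
var = nA * pA * (1 - pA) + nB * pB * (1 - pB)           # 6.24 + 3.78 = 10.02

# Normal approximation with continuity correction: P(X_A + X_B >= 25)
approx = 1 - stats.norm.cdf((24.5 - mu) / np.sqrt(var))
print(approx)  # ~0.3178

# Exact value: convolve the two binomial pmfs to get the pmf of the sum
pmf_sum = np.convolve(stats.binom.pmf(np.arange(nA + 1), nA, pA),
                      stats.binom.pmf(np.arange(nB + 1), nB, pB))
print(pmf_sum[25:].sum())  # exact P(X_A + X_B >= 25)
```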
Now, X_A − X_B is also approximately normal, with mean 10.4 − 12.6 = −2.2 and variance 10.02. We want P(X_A − X_B ≥ 1), because we want the probability that the number of matches won against class A teams exceeds the number won against class B teams, so the difference must be at least 1; it can be more. Here again we standardize, and here again the continuity correction factor is used: P(X_A − X_B ≥ 1) becomes P(X_A − X_B ≥ 0.5). Subtracting the mean — the minus of −2.2 becomes plus 2.2 — and dividing by √10.02, the bound is (0.5 + 2.2)/√10.02 = 2.7/√10.02 ≈ 0.85, so the probability is P(Z ≥ 0.85) = 1 − Φ(0.85) = 0.1968. The fact that this probability is low is natural, because the probability of winning a match against an A team is much lower than the probability of winning a match against a B team. As we go on, there will be more and more examples of all these concepts that we are talking about.

Now, let us come back to sums of independent Poisson random variables. Suppose X is Poisson(λ_1), Y is Poisson(λ_2), and X and Y are given to be independent; let us look at the distribution of X + Y. Since X and Y are independent, the MGF of X + Y is the product of the MGFs of X and Y. The MGF of a Poisson(λ_1) is e^{λ_1(e^t − 1)}, and the MGF of Y is e^{λ_2(e^t − 1)}. So the exponents add up to give e^{(λ_1 + λ_2)(e^t − 1)}, and this is the MGF of a Poisson(λ_1 + λ_2). So you immediately get the result, and as I told you earlier, you might also try to do it directly: obtain the distribution function of X + Y and compute from there.

Having learnt the trick of using MGFs for finding the distributions of sums of random variables, let me also write it down for the binomial. Here X is binomial(n, p), Y is binomial(m, p), and X and Y are independent; we want the distribution of the sum. Again, the MGF of X + Y is the product of the individual MGFs: for X it is (p e^t + 1 − p)^n, and for the random variable Y it is (p e^t + 1 − p)^m. When you multiply, the powers add up, giving (p e^t + 1 − p)^{n+m}, and therefore it immediately follows that X + Y is binomial(n + m, p). So, if the probability of success is the same, and you are looking at two independent binomial random variables where in one case the number of trials is n and in the other case it is m, then the sum again represents a binomial random variable: the numbers of trials add up and the probability of success remains the same. Having seen this for some of these distributions, whenever you come across something new you can read it up and understand what is going on.
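Both MGF results can be confirmed numerically by convolving pmfs, as in this sketch (the parameter values are arbitrary):

```python
import numpy as np
from scipy import stats

# Poisson(l1) + Poisson(l2), independent  =>  Poisson(l1 + l2)
l1, l2 = 2.0, 3.5
k = np.arange(40)
conv = np.convolve(stats.poisson.pmf(k, l1), stats.poisson.pmf(k, l2))[:len(k)]
print(np.allclose(conv, stats.poisson.pmf(k, l1 + l2)))  # True

# Bin(n, p) + Bin(m, p), independent  =>  Bin(n + m, p)
n, m, p = 10, 15, 0.3
conv = np.convolve(stats.binom.pmf(np.arange(n + 1), n, p),
                   stats.binom.pmf(np.arange(m + 1), m, p))
print(np.allclose(conv, stats.binom.pmf(np.arange(n + m + 1), n + m, p)))  # True
```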
Now, let us look at conditional distributions. We have looked at conditional probabilities, at Bayes' formula and so on; now let us look at conditional distributions. Remember that when E and F are two events, we defined the conditional probability of event E, given that event F has occurred, as P(E | F) = P(E ∩ F)/P(F) — the probability that both events occur, divided by the probability of occurrence of F. So that was for events.

Now suppose X and Y are two discrete random variables and you want to write down the conditional probability of X given Y, that is, P(X = x | Y = y) for values small x and small y. Just borrowing from the definition for events, it is

p_{X|Y}(x | y) = P(X = x, Y = y)/P(Y = y) = p(x, y)/p_Y(y),

where p(x, y) is the joint pmf and p_Y is the marginal pmf of Y. Sometimes I may write the suffix and sometimes I may not; it does not matter, you understand from the context which is for the single variable and which is for the joint pmf. This conditional probability is defined for all y such that p_Y(y) > 0: remember, I am dividing by a number which I must ensure is non-zero, and since probabilities cannot be negative, that number must be positive. So, for all possible values of y which have positive probability, this defines the conditional pmf of X given Y = y.

The conditional cumulative distribution function of X given Y = y is

F_{X|Y}(x | y) = P(X ≤ x | Y = y) = Σ_{a ≤ x} p_{X|Y}(a | y),

where you sum the conditional probabilities over all a less than or equal to x. Now, if X is independent of Y, then the joint pmf is the product of the two marginals, and after dividing by P(Y = y) the conditional probability reduces to simply P(X = x). So whatever we did for events carries over to random variables and their corresponding distributions.

Now look at this example. You are given p(0, 0) = 0.3 — that means X taking the value 0 and Y taking the value 0. Here X takes the values 0 and 1 and Y takes the values 0 and 1, so there are 4 possibilities: p(0, 1) = 0.3, p(1, 0) = 0.2 and p(1, 1) = 0.2, and they all add up to 1. You want to calculate the conditional pmf of X given that Y = 1. First of all you need P(Y = 1) in the denominator: this is p(0, 1) + p(1, 1), the marginal of Y at 1, which is 0.5. Hence, since the possible values of X are 0 and 1, you find both conditional probabilities:

P(X = 0 | Y = 1) = p(0, 1)/P(Y = 1) = 0.3/0.5 = 3/5,
P(X = 1 | Y = 1) = p(1, 1)/P(Y = 1) = 0.2/0.5 = 2/5.

Similarly, you can fix the value of X and compute the conditional pmf of Y.
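The same computation as a tiny sketch, with the joint pmf from the example stored as a 2×2 array:

```python
import numpy as np

# Joint pmf: rows index x in {0, 1}, columns index y in {0, 1}
joint = np.array([[0.3, 0.3],    # p(0,0), p(0,1)
                  [0.2, 0.2]])   # p(1,0), p(1,1)

p_y = joint.sum(axis=0)                  # marginal of Y: [0.5, 0.5]
cond_x_given_y1 = joint[:, 1] / p_y[1]   # p_{X|Y}(x | y = 1)
print(cond_x_given_y1)                   # [0.6 0.4], i.e. 3/5 and 2/5
```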
Now there is another interesting example. It says: if X and Y are independent Poisson random variables with respective parameters λ_1 and λ_2, calculate the conditional distribution of X given that X + Y = n. So now the condition is on the sum X + Y = n, and you want to find the conditional distribution of X. First let us compute P(X + Y = n). As I showed you just a few minutes ago, if X and Y are independent Poisson random variables their sum is also Poisson, with the parameters added up, so

P(X + Y = n) = e^{−(λ_1 + λ_2)} (λ_1 + λ_2)^n / n!.

This is easy because we have already seen the distribution of X + Y. Next we want P(X = k | X + Y = n). If X is k and X + Y is n, then Y must be n − k: the intersection of the events {X = k} and {X + Y = n} is equivalent to the event {X = k, Y = n − k}. So

P(X = k | X + Y = n) = P(X = k, Y = n − k)/P(X + Y = n),

and since X and Y are independent, the numerator is the product of the individual probabilities. Both being Poisson, this gives

P(X = k | X + Y = n) = [e^{−λ_1} λ_1^k / k!] · [e^{−λ_2} λ_2^{n−k} / (n − k)!] · [n! e^{λ_1 + λ_2} / (λ_1 + λ_2)^n],

where the last factor is 1/P(X + Y = n), with the n! going to the numerator. Now collect the terms: n!/(k!(n − k)!) is the binomial coefficient, the exponentials e^{−λ_1} e^{−λ_2} e^{λ_1 + λ_2} cancel, and the (λ_1 + λ_2)^n in the denominator I break up as (λ_1 + λ_2)^k (λ_1 + λ_2)^{n−k}. So I get the terms (λ_1/(λ_1 + λ_2))^k and (λ_2/(λ_1 + λ_2))^{n−k}:

P(X = k | X + Y = n) = C(n, k) [λ_1/(λ_1 + λ_2)]^k [λ_2/(λ_1 + λ_2)]^{n−k}.

And you see that these two numbers, λ_1/(λ_1 + λ_2) and λ_2/(λ_1 + λ_2), add up to 1; so if I denote the first by p, then the second is 1 − p, and this is exactly binomial. That means the conditional distribution of X given X + Y = n is binomial with parameters n and success probability λ_1/(λ_1 + λ_2): for the different values of k you get exactly the binomial probabilities. So, throughout this course I have been trying to show you that even though you have these different random variables, through processes of addition, conditioning and so on, you can see the connections between the various distributions; that makes things more interesting and, of course, very useful also.
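Finally, a numerical check of this conditional binomial law (the values of λ_1, λ_2 and n are arbitrary):

```python
import numpy as np
from scipy import stats

l1, l2, n = 2.0, 3.5, 8
k = np.arange(n + 1)

# Direct conditional probability: P(X = k, Y = n - k) / P(X + Y = n)
num = stats.poisson.pmf(k, l1) * stats.poisson.pmf(n - k, l2)
cond = num / stats.poisson.pmf(n, l1 + l2)

# Claimed law: Binomial(n, p) with p = l1 / (l1 + l2)
print(np.allclose(cond, stats.binom.pmf(k, n, l1 / (l1 + l2))))  # True
```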