Welcome back. We were discussing the conditional PMF. If X and Y are discrete random variables, the conditional PMF of X given Y is defined as p_{X|Y}(x|y) = p_{X,Y}(x, y) / p_Y(y). This is just P(X = x, Y = y) / P(Y = y), and the whole definition assumes that the denominator is positive. X and Y are discrete, so they take values in some countable set. Essentially, you fix a particular value y and ask: conditioned on Y = y, what is the conditional PMF of X? You are just rescaling the joint PMF at that y by the probability that Y = y. And if you are given the joint PMF, you can find the marginal by summing over all x: p_Y(y) = sum over x of p_{X,Y}(x, y), which is the denominator. So from the joint PMF you can find the conditional PMF, and the conditional PMF is defined only when p_Y(y) is positive. We discussed this last class; are there any questions on this?

I will now state a theorem. Let X and Y be discrete random variables. Then the following are equivalent: (1) X and Y are independent; (2) p_{X,Y}(x, y) = p_X(x) p_Y(y) for all x, y in R; (3) for all x, y in R, the events {X = x} and {Y = y} are independent; (4) for all x, y in R such that p_Y(y) > 0, we have p_{X|Y}(x|y) = p_X(x). This is actually a fairly simple theorem to prove; it gives conditions equivalent to the discrete random variables X and Y being independent. We have already defined independence of random variables X and Y as the independence of their generated sigma-algebras.
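The conditional-PMF recipe above (sum out x to get the marginal of Y, then divide the joint by it) can be sketched numerically. This is my own illustration, not from the lecture, and the joint PMF values are invented for the example:

```python
# Joint PMF of (X, Y) stored as {(x, y): p}; the numbers are invented for
# this illustration and sum to 1.
joint = {(0, 0): 0.10, (0, 1): 0.30,
         (1, 0): 0.15, (1, 1): 0.45}

def marginal_y(joint, y):
    # p_Y(y) = sum over all x of p_{X,Y}(x, y)
    return sum(p for (x, yy), p in joint.items() if yy == y)

def conditional_x_given_y(joint, x, y):
    # p_{X|Y}(x | y) = p_{X,Y}(x, y) / p_Y(y), defined only when p_Y(y) > 0
    py = marginal_y(joint, y)
    if py == 0:
        raise ValueError("conditional PMF undefined: p_Y(y) = 0")
    return joint.get((x, y), 0.0) / py

print(marginal_y(joint, 1))               # p_Y(1) = 0.30 + 0.45, close to 0.75
print(conditional_x_given_y(joint, 0, 1)) # 0.30 / 0.75, close to 0.4
```

Note how the function refuses to divide when p_Y(y) = 0, matching the condition in the definition.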
So we know what that means, and for the specific case when X and Y are both discrete, this theorem gives equivalent conditions that are often easier to verify: rather than go hunting for sigma(X) and sigma(Y) and prove that they are independent sigma-algebras, you can verify one of the equivalent conditions, which is enough to say that X and Y are independent. "The following are equivalent" means that the four statements stand or fall together. First of all, what does it mean to say two statements are equivalent? If I say statements A and B are equivalent, it means A implies B and B implies A, so I have to prove two implications, one in each direction. Here I am saying four statements are equivalent, so in principle how many implications must I prove? I can pick a pair of statements in 4 choose 2 = 6 ways, and each pair needs both directions, so there are 12 implications in all. That is what logic demands: statement 1 implies statement 2, statement 2 implies statement 1, and so on, exhausting all 12. But often you do not have to do that. You can instead prove a cycle, 1 implies 2 implies 3 implies 4 implies 1, and that kind of thing is also fine; often you can get away with verifying a much smaller subset of the 12 implications. In this case some of the implications are trivial. The equivalence of 2 and 3 is immediate: saying that the events {X = x} and {Y = y} are independent is obviously the same as the joint PMF factorizing.
It follows almost straight from the definitions. In fact, the equivalence of 2, 3 and 4 is very easy, so I will not do it; you can verify it yourself. Proof: the equivalence of 2, 3 and 4 is easy and follows essentially from the definitions. Now, if you buy that 2, 3 and 4 are equivalent, what remains to be proved? It is enough to prove that 1 implies any one of them, and conversely that any one of 2, 3, 4 implies 1. Any questions? So let us prove that 1 implies 2. You are assuming that X and Y are independent, and you want to show that the joint PMF factorizes. If X and Y are independent, the sigma-algebras generated by X and Y are independent, which means that for any Borel sets B1 and B2 in R we have P(X in B1, Y in B2) = P(X in B1) P(Y in B2). That is exactly what independence of the sigma-algebras means. Now what do you do next? Take B1 to be the singleton {x} and B2 to be the singleton {y}. That will be the end of it, because then P(X = x, Y = y) factors into P(X = x) P(Y = y), which is statement 2 (and equivalently, the events {X = x} and {Y = y} are independent). Recall that when I write a comma I mean intersection; I mentioned that already. And that is it, the proof of this direction is over. Now I have to prove the converse: assuming any one of 2, 3 or 4, we get 1. So assume that 2 holds; then we have to prove 1.
We proceed as follows. Again let B1 and B2 be Borel sets in R, and consider P(X in B1, Y in B2). This can be written as a sum, can it not? It equals the sum over x in B1 and y in B2 of P(X = x, Y = y): you are summing over all the point masses sitting inside B1 and B2 respectively. This is just saying that the joint PMF determines the joint law, at least on sets that are Cartesian products B1 x B2 of two Borel sets on R. Now I use assumption 2: each term P(X = x, Y = y) equals P(X = x) P(Y = y), so the double sum becomes (sum over x in B1 of P(X = x)) times (sum over y in B2 of P(Y = y)). Why does this work? Because once the terms factorize, the x-sum hits only the first factor and the y-sum hits only the second, and you have what you want. Since X is a discrete random variable, the first factor is P(X in B1), and similarly the second is P(Y in B2). And this is true for any Borel sets B1 and B2, which means sigma(X) and sigma(Y) are independent sigma-algebras, which means X and Y are independent. Are there any questions on this? So what we have said is: if you want to establish independence of discrete random variables X and Y, it is good enough to verify that the joint PMF factorizes into the marginals.
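The key step of this proof, the double sum over B1 x B2 factorizing into a product of two sums, can be checked numerically. A small sketch of my own (the marginal PMF values are invented for the example):

```python
import math

# Marginal PMFs (numbers invented for illustration); the joint PMF is built
# as their product, so statement 2 of the theorem holds by construction.
px = {0: 0.2, 1: 0.5, 2: 0.3}
py = {0: 0.6, 1: 0.4}
joint = {(x, y): px[x] * py[y] for x in px for y in py}

def prob(B1, B2):
    # P(X in B1, Y in B2): add up the point masses sitting inside B1 x B2
    return sum(p for (x, y), p in joint.items() if x in B1 and y in B2)

B1, B2 = {0, 2}, {1}
lhs = prob(B1, B2)
rhs = sum(px[x] for x in B1) * sum(py[y] for y in B2)
# The double sum over B1 x B2 factorizes into a product of two sums,
# i.e. P(X in B1, Y in B2) = P(X in B1) * P(Y in B2).
assert math.isclose(lhs, rhs)
print(lhs, rhs)
```

Trying other choices of B1 and B2 gives the same agreement, mirroring the "for any Borel sets" conclusion (restricted here to finite sets, of course).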
Checking that factorization is equivalent to saying that X and Y are independent random variables. The conditional PMF is something you can calculate given only the joint: from the joint you get the marginal, and dividing the one by the other you get the conditional PMF. And condition 4 simply says that the conditional PMF is the same as the unconditional PMF, which is again equivalent to X and Y being independent. Very easy stuff; this is as elementary as it gets. Discrete random variables are easy.

Now let us move on to continuous random variables. In the discrete case we said that if X is discrete, it takes only countably many values on R with probability 1; similarly Y takes countably many values on R with probability 1; and then the pair (X, Y) necessarily takes only countably many values on R^2, because the Cartesian product of two countable sets is countable. In the continuous case, however, things are not so simple. If X is a continuous random variable and Y is a continuous random variable, it need not be true that X and Y are jointly continuous. First of all, I have not yet told you what jointly continuous means; I will, but you can guess what it should be. Remember we said that X is a continuous random variable if the measure P_X is absolutely continuous with respect to Lebesgue measure on R; in other words, every set of length 0 has probability 0. So if you have random variables (X, Y) mapping Omega to R^2, what should the corresponding definition of jointly continuous be? Exactly: you have two measures on R^2, namely the joint law P_{X,Y} induced by the pair (X, Y), and the Lebesgue measure, which is area on R^2. Now suppose that every set of zero Lebesgue measure (zero area) on R^2 also has joint law 0.
Then you say (X, Y) are jointly continuous random variables. But what I am saying is that if X is continuous and Y is continuous, it is not necessarily true that (X, Y) is jointly continuous; by contrast, in the discrete case, if X and Y are separately discrete then they are jointly discrete. So let me define this. Just to remind you, you have a probability space (Omega, F, P), and you have (X, Y) mapping Omega to R^2. We said that if X and Y are random variables, then every Borel set on R^2 has an F-measurable pre-image; that will actually be a homework, you have to prove that the pre-images of Borel sets on R^2 are necessarily F-measurable. Now, there are two measures here: one is the measure pushed forward by the random variables, which is P_{X,Y}; the other is Lebesgue measure, call it lambda for want of a better name. So we want the following definition: X and Y are jointly continuous if the joint law P_{X,Y} is absolutely continuous with respect to the Lebesgue measure on R^2 (strictly speaking, the Lebesgue measure on the Borel sigma-algebra of R^2). Absolutely continuous means that if you take any Borel set on R^2 which has zero area, zero Lebesgue measure, then the corresponding probability under P_{X,Y} must also be 0. So if you give me any Borel set N such that lambda(N) = 0, then I must have P_{X,Y}(N) = 0. Is that clear? Any questions? This is a repetition of what we said in the one-dimensional case. Caution: if X and Y are continuous random variables, X and Y need not be jointly continuous. If X and Y are separately discrete, it necessarily follows that the pair (X, Y) also takes only countably many values on R^2.
So X and Y are jointly discrete. But in the continuous case, if X and Y are separately continuous, it does not follow that (X, Y) is jointly continuous. Can you see why, roughly? X and Y being separately continuous means that their marginal laws are absolutely continuous with respect to Lebesgue measure on R, but that does not imply that the joint law is absolutely continuous with respect to Lebesgue measure on R^2. You can build a simple example. Say X is Gaussian with mu = 0 and sigma = 1, denoted N(0, 1); remember this example. (You can take anything you want here, a standard exponential, any continuous random variable; this is just one example.) Let Y = 2X. Then Y is also a continuous random variable; in fact it is Gaussian with parameters 0 and 4, so Y is distributed N(0, 4), as you can show and as we will see later. But notice where the measure lives. The sample space is Omega, and every omega necessarily maps to the line y = 2x, because Y(omega) is by definition twice X(omega). So all the measure sits on that line. And this line has zero Lebesgue measure, zero area; you can show that rigorously by covering the line with a countable collection of small rectangles of arbitrarily small total area. So you have all the probability mass of 1 sitting on a set of Lebesgue measure 0 in R^2.
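A quick simulation sketch of this counterexample (the sampling code is my own, not from the lecture): draw X from N(0, 1), set Y = 2X, and observe that every sample of the pair lies exactly on the zero-area line y = 2x, while Y's empirical variance comes out near 4:

```python
import random

random.seed(0)
xs = [random.gauss(0.0, 1.0) for _ in range(100_000)]  # X ~ N(0, 1)
ys = [2.0 * x for x in xs]                             # Y = 2X, so Y ~ N(0, 4)

# Every sample of (X, Y) lands exactly on the line y = 2x, a set of zero
# Lebesgue measure (zero area) in R^2, so (X, Y) is not jointly continuous
# even though each coordinate separately is a continuous random variable.
assert all(y == 2.0 * x for x, y in zip(xs, ys))

# The mean of Y is 0, so this estimates Var(Y); it comes out near 4.
var_y = sum(y * y for y in ys) / len(ys)
print(var_y)
```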
So this is an example showing that even if X and Y are separately continuous random variables, they need not be jointly continuous according to our definition. Saying that X and Y are jointly continuous is a strictly stronger condition than saying X and Y are marginally continuous: what we will show is that if X and Y are jointly continuous, they have to be marginally continuous as well, and that is something we will show very soon. Is that clear? Any questions? (To repeat the example: X is Gaussian, and Y = 2X is also Gaussian, as you can show. The point is that every omega in the sample space maps to the line y = 2x, so all the measure has to sit on that line, which has Lebesgue measure 0; hence the pair is not jointly continuous.)

Now suppose that X and Y are in fact jointly continuous random variables, which means that P_{X,Y} is absolutely continuous with respect to lambda. What can you invoke now? The Radon-Nikodym theorem. The Radon-Nikodym theorem holds for quite arbitrary measure spaces with sigma-finite measures; it is a very general theorem, that is what I mean to say. So P_{X,Y} can be written as the integral of some non-negative measurable function against lambda on R^2. That is what the Radon-Nikodym theorem will imply.
So if (X, Y) are jointly continuous random variables, the Radon-Nikodym theorem implies that there exists a non-negative measurable function f_{X,Y} mapping R^2 to [0, infinity) such that for any Borel set B on R^2 we have the joint law P_{X,Y}(B) = integral over B of f_{X,Y} d lambda, where lambda is Lebesgue measure on the plane R^2. Actually, if you prefer, you can just write this as dx dy; there is nothing wrong with that. Again, this is a Lebesgue integral, which you will not fully understand yet. For now you can just think of B as being some box, some rectangle: the probability of (X, Y) landing in a box is the integral of f_{X,Y} over that box, and the theorem says there exists such a non-negative function. In particular, what do you do now? You take the generating class. What is the generating class for the Borel sigma-algebra on R^2? Semi-infinite rectangles. Taking B = (-infinity, x] x (-infinity, y], P_{X,Y} of a set like that is nothing but the joint CDF: F_{X,Y}(x, y) = P(X <= x, Y <= y) = integral from -infinity to x, integral from -infinity to y, of f_{X,Y}(s, t) dt ds. I should use fresh variables, say s and t, inside the integral rather than x and y, because x and y are sitting outside as the limits of integration; I cannot integrate with respect to the same variables.
So the Radon-Nikodym theorem affirms that the CDF can be written as the integral, over semi-infinite rectangles, of some non-negative function. Just as in the one-dimensional case, this function is called the probability density function, except that now it is a joint PDF. So this f_{X,Y} is called the joint probability density function of X and Y. It is some non-negative measurable function; it has no interpretation as a probability whatsoever. Only when you integrate it over a Borel set do you get a probability. And also, just as in the one-dimensional case, this PDF is only uniquely specified up to a set of Lebesgue measure 0. Are there any questions on this? This is a point you will appreciate later. The probability law can be written as the integral of some non-negative measurable function with respect to Lebesgue measure, and all I am saying is that this joint PDF is unique only up to a set of Lebesgue measure 0: if you change the function at one or two points, or countably many points, or any set of measure zero, it does not change the integral, as we will see later. So it is uniquely specified up to a set of measure 0. When I say f_{X,Y} is measurable, I mean that pre-images of Borel sets on R must be Borel sets on R^2; that is all. You will understand this properly later: when we do integration theory, we will integrate a measurable function with respect to a measure over a measurable set. That is why you can put a question mark against the Lebesgue integral for now; we will get back to it in about two weeks. But the iterated integral you do understand: it is just an ordinary improper Riemann integral.
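To make the relation P_{X,Y}(B) = integral over B of f_{X,Y} concrete, here is a sketch of my own (not from the lecture), using as the joint density the product of two standard normal densities. A crude midpoint Riemann sum over a box B matches the exact probability obtained from the one-dimensional normal CDF Phi:

```python
import math

def f_xy(x, y):
    # Example joint density (my choice): two independent standard normals,
    # f(x, y) = exp(-(x^2 + y^2) / 2) / (2 * pi).
    return math.exp(-(x * x + y * y) / 2.0) / (2.0 * math.pi)

def box_prob(a, b, c, d, n=400):
    # Crude midpoint Riemann sum of f_xy over the box [a, b] x [c, d],
    # standing in for the integral of the density over B.
    hx, hy = (b - a) / n, (d - c) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * hx
        for j in range(n):
            total += f_xy(x, c + (j + 0.5) * hy)
    return total * hx * hy

# Standard normal CDF via the error function.
Phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

approx = box_prob(-1.0, 1.0, 0.0, 2.0)
exact = (Phi(1.0) - Phi(-1.0)) * (Phi(2.0) - Phi(0.0))
print(approx, exact)  # the two values agree closely
```

The exact value uses the factorization of this particular density; the Riemann sum knows nothing about that and still lands on the same number.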
In fact, because the integrand is non-negative, it can be shown that the order in which you integrate does not matter: you can jolly well write ds dt and integrate in the opposite order. Now, if I give you the joint PDF, you can integrate it and find the joint CDF; and given the joint CDF you can always find the probability law. This means that specifying the joint PDF is a complete characterization of two jointly continuous random variables. Are there any questions on this?

Now, here is something else you can show using that relation. Suppose I am only interested in P(X <= x), the marginal CDF of X, where X and Y are jointly continuous. I know the joint CDF, and we have a theorem about joint CDFs: if you send y to infinity, you get the marginal CDF of the other variable, lim as y goes to infinity of F_{X,Y}(x, y) = F_X(x). This we have shown already; it is one of our properties of joint CDFs and has nothing to do with continuous random variables. But now I am going to invoke the fact that X and Y are jointly continuous and use the integral expression for the joint CDF. As I said, you can integrate a non-negative function in any order you want; I will not prove that, but just take it from me that if you have a non-negative integrand, you can change the order of integration and nothing happens. If the integrand can be positive as well as negative, then you have to be very careful about interchanging the order of integration.
You cannot always integrate in any order you wish, but this integrand is non-negative, and there is a theorem known as Fubini's theorem, which we will not do in this class, that allows you to integrate in any order. So, sending y to infinity, F_X(x) = integral from -infinity to x of (integral from -infinity to infinity of f_{X,Y}(s, t) dt) ds. I am being a little loose here: I am pushing the limit inside the outer integral, into the inner one. All of this you can do because f is non-negative; normally you cannot do all this. Now what happens? My x-variable is s, and the inner integral hits t, with t going from minus infinity to infinity. If I consider that t-integral alone, what remains is a function of s: I am integrating a non-negative measurable function, and if you integrate a non-negative measurable function you get another non-negative measurable function. Call it g(s); so g(s) is non-negative and measurable, and F_X(x) = integral from -infinity to x of g(s) ds. So I can write the CDF of X as the integral of a non-negative measurable function, which means that X is a continuous random variable. This is because the Radon-Nikodym theorem is in fact an if-and-only-if statement: nu is absolutely continuous with respect to mu if and only if nu(B) can be written as the integral over B of f d mu.
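The inner-integral computation, the function g(s), can also be checked numerically. A sketch using an example density of my own choosing (again the product of two standard normal densities): integrating out t at a fixed s should recover the one-dimensional N(0, 1) density at s:

```python
import math

def f_xy(s, t):
    # Example joint density (my choice): product of two standard normal
    # densities, so the true marginal of X is N(0, 1).
    return math.exp(-(s * s + t * t) / 2.0) / (2.0 * math.pi)

def marginal_x(s, lo=-8.0, hi=8.0, n=4000):
    # g(s) = integral over t of f_{X,Y}(s, t), approximated by a midpoint
    # sum; the tails beyond |t| = 8 are negligible for this density.
    h = (hi - lo) / n
    return h * sum(f_xy(s, lo + (i + 0.5) * h) for i in range(n))

std_normal = lambda s: math.exp(-s * s / 2.0) / math.sqrt(2.0 * math.pi)
print(marginal_x(0.7), std_normal(0.7))  # the two values agree closely
```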
So this implies that X is a continuous random variable, because its CDF can be written as the integral of a non-negative measurable function. And since g must then be the PDF of X, I can simply call it f_X(s). So if I am given the joint probability density function, I can get the marginal probability density function by integrating out the other variable from minus infinity to infinity. Maybe I should state it here: f_X(s) = integral from -infinity to infinity of f_{X,Y}(s, t) dt. That is what I am calling f_X(s), and because the CDF of X is expressible as the integral of f_X, this must automatically be the marginal probability density function of X. So if X and Y are jointly continuous random variables, it is the case that they are marginally continuous; we have just proved it. But of course, if X and Y are marginally continuous, they need not be jointly continuous; for that we have a counterexample. So being jointly continuous is a strictly stronger condition than the random variables being marginally continuous. Is that clear? (Student: So another way of saying it is that jointly continuous implies marginally continuous, but marginally continuous need not imply jointly continuous? Yes, that is exactly correct.) Any questions so far? How much time do I have, five more minutes? So next we will discuss the independence of two jointly continuous random variables. We know that independence of any two random variables is equivalent to the joint CDF factorizing; that we have established already. So if X and Y are jointly continuous and independent, we must have F_{X,Y}(x, y) = F_X(x) F_Y(y).
But the joint CDF itself can be written as an integral, and the marginal CDFs can also be written as integrals. So what we can show is that if X and Y are jointly continuous and independent, the joint PDF must necessarily factorize; and conversely, if the joint PDF factorizes, then X and Y are independent. That seems perfectly reasonable. So that is something we will start in the next class; I do not want to start it now. I will stop here. Thanks.