Now we are going to talk about how to handle a set of random variables. So far we have taken a single random variable X, defined its CDF, its PDF if it is continuous, and discussed how to compute its expectation, variance and so on. But often it is not just one variable you have to deal with; you have to deal with multiple random variables. A simple case we have already discussed: if you throw two dice and represent the first outcome by X1 and the second by X2, you already have two random variables. It may happen that one of the dice is biased, so it is not necessary that it takes all outcomes uniformly: say it takes the value 1 with probability 0.9 and the remaining probability 0.1 is shared among the other outcomes, while the other die is fair and takes all possible outcomes equally likely. Now, when you throw both of them simultaneously and want to describe the joint outcome, is it enough to characterize the distribution of X1 separately and the distribution of X2 separately? If I just characterize them separately, will that tell me how the joint outcome looks? Not necessarily: the joint behavior need not be determined by the individual distributions alone. We will make this more clear. So, in any experiment where we have multiple random variables, we need to define a joint distribution, and that is what we are going to study. Let X1, ..., Xm be m random variables, all defined on the same probability space; we are going to define their joint cumulative distribution function.
So now I am not looking at one random variable but at m random variables, and their outcomes may be dependent on each other; that is why I need a joint distribution. The joint cumulative distribution function at the point (x1, x2, ..., xm) is defined as

F(x1, x2, ..., xm) = P(X1 ≤ x1, X2 ≤ x2, ..., Xm ≤ xm).

This is what I call the joint CDF. To visualize it, suppose I have the simple case of two random variables X1 and X2, both taking real values, with the x-axis representing X1 and the y-axis representing X2. In this two-dimensional space, I want to ask: what is the probability that (X1, X2) takes a value in some region R? That region R could simply be a rectangle, say the rectangle (a, a'] x (b, b'], not including the left and bottom boundaries, with corners (a, b), (a', b), (a, b') and (a', b'): the horizontal side is the interval from a to a' and the vertical side is the interval from b to b'.
What is the probability that they will fall in this region? How are you going to find that? One possibility: to find P((X1, X2) ∈ R), I can take everything below and to the left of the top-right corner, subtract the strip below the rectangle and the strip to its left, and, since the region below-left of (a, b) has then been subtracted twice, add it back. So let us write this:

P((X1, X2) ∈ R) = F(a', b') − F(a', b) − F(a, b') + F(a, b).

First I take F(a', b'), everything up to the corner (a', b'); then I subtract F(a', b), which goes all the way to a' in the first coordinate with height b, and F(a, b'), which goes up to a in the first coordinate with height b'; finally I add back F(a, b). What remains is exactly the area I want, and this characterizes the probability of the rectangle. The point here is: when I have a set of random variables and I ask jointly what the probability is that they fall in some region, I need to know the joint CDF. I have taken a rectangle as a special case just for visualization; R can be any shape, and to compute such probabilities you still need the joint CDF. That is why, to characterize a set of random variables, we need to define the joint CDF. Now another thing: for two random variables, write F(x1, x2) = P(X1 ≤ x1, X2 ≤ x2), and suppose I let x2 go to infinity.
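The four-corner formula above can be checked numerically. Here is a minimal sketch, assuming (for the sake of a closed-form CDF) that X1 and X2 are independent Uniform(0,1) variables, so F(x1, x2) = x1 * x2 on the unit square; the function names are my own, not from the lecture.

```python
# Assumption for illustration: X1, X2 independent Uniform(0,1),
# so the joint CDF is F(x1, x2) = x1 * x2 on the unit square.

def joint_cdf(x1, x2):
    """Joint CDF of two independent Uniform(0,1) random variables."""
    clamp = lambda v: max(0.0, min(1.0, v))
    return clamp(x1) * clamp(x2)

def rect_prob(a, a_p, b, b_p):
    """P(a < X1 <= a', b < X2 <= b') via the four-corner CDF formula."""
    return (joint_cdf(a_p, b_p) - joint_cdf(a_p, b)
            - joint_cdf(a, b_p) + joint_cdf(a, b))

# For this distribution the answer is just the rectangle's area:
# (0.7 - 0.2) * (0.9 - 0.3) = 0.3.
print(round(rect_prob(0.2, 0.7, 0.3, 0.9), 10))
```

For the uniform case the formula reduces to the area of the rectangle, which is a quick sanity check that the signs in the inclusion-exclusion are right.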
In that case the condition X2 ≤ x2 eventually always holds: the event that X2 spans all its values is already covered, so the probability simply turns out to be P(X1 ≤ x1). That is why we write

lim_{x2 → ∞} F(x1, x2) = F_{X1}(x1).

By letting one of the variables go to infinity, what have we recovered? F_{X1}, the marginal CDF of X1, from the joint CDF; this was the simple case with two random variables. Similarly, for any given x2 you can let x1 go to infinity, and then you recover F_{X2}(x2). Now, in this definition I said the random variables are all on the same space and the joint CDF is as above, but I did not specify whether they are discrete or continuous; I only said they are jointly distributed. We say the random variables X1, ..., Xm are jointly continuous if there exists a function f, called the joint probability density function, such that the joint CDF can be expressed as an integral of f — not an integral over one variable, but over multiple variables:

F(x1, x2, ..., xm) = ∫ from −∞ to x1 ... ∫ from −∞ to xm of f(u1, ..., um) dum ... du1.

So this is just a generalization of what we mean by a continuous random variable to jointly continuous random variables. Recall how we defined a continuous random variable: X is continuous if its associated CDF can be expressed as the integral of some function f.
That is, we said X is continuous if there exists a function f such that F_X(x) = ∫ from −∞ to x of f_X(u) du; we are now just extending that notion to m random variables. The function here is multivariate because it takes m variables instead of one, and if I can express the joint CDF in this fashion, then I call the set of random variables jointly continuous. And from the joint PDF we can again recover the marginal PDFs, as we did earlier with CDFs. Take the simple case of two random variables X1 and X2; we had said F_{X1}(x1) = F(x1, +∞). Suppose X1 and X2 are jointly continuous; then according to the definition there exists f_{X1,X2} satisfying the integral condition, so

F_{X1}(x1) = ∫ from −∞ to x1 [ ∫ from −∞ to +∞ of f_{X1,X2}(u1, u2) du2 ] du1.

Now look at the quantity in the square bracket: I have integrated over all possible u2, so is it a function of u2 anymore? No; it is a function of u1 alone. Call it f_{X1}(u1). Then the expression simply becomes F_{X1}(x1) = ∫ from −∞ to x1 of f_{X1}(u1) du1. So if X1 and X2 are jointly continuous, we have recovered the marginal PDF of X1 from the joint PDF:

f_{X1}(u1) = ∫ from −∞ to +∞ of f_{X1,X2}(u1, u2) du2.
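This integrating-out step can be sketched numerically. Below is a minimal example, assuming (my choice, not from the lecture) the textbook joint density f(u1, u2) = u1 + u2 on the unit square, whose marginal is known analytically to be f_{X1}(u1) = u1 + 1/2.

```python
# Assumption for illustration: joint density f(u1, u2) = u1 + u2 on
# [0,1]^2. Its marginal in u1 is u1 + 1/2, which we recover by
# numerically integrating out u2 with a midpoint rule.

def joint_pdf(u1, u2):
    return u1 + u2 if 0 <= u1 <= 1 and 0 <= u2 <= 1 else 0.0

def marginal_pdf_x1(u1, n=10_000):
    """f_X1(u1) = integral over u2 of f(u1, u2), midpoint rule with n cells."""
    h = 1.0 / n
    return sum(joint_pdf(u1, (k + 0.5) * h) for k in range(n)) * h

# Analytically f_X1(0.3) = 0.3 + 0.5 = 0.8.
print(round(marginal_pdf_x1(0.3), 6))
```

The midpoint rule is exact here because the integrand is linear in u2; for a general density you would see a small discretization error shrinking like 1/n^2.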
What does this imply? It implies that my random variable X1 is itself a continuous random variable: joint continuity recovers our earlier definition of a continuous random variable for a single random variable. In this case f_{X1} is called the marginal PDF of X1, while f_{X1,X2} is the joint PDF. Similarly you can recover f_{X2}; both marginals can be recovered from the joint PDF. And just to be clear about the meaning of F(x1, +∞): it is P(X1 ≤ x1 and X2 ≤ +∞), that is, we let X2 span all its values. If instead you set x2 = −∞, nothing is left and the probability is 0. Now let us turn our focus to discrete random variables. We already discussed that for discrete random variables our interest is in the probability mass function: a discrete random variable takes at most countably many values, and if I know the probability at each of these points, that is enough. So if a set of random variables is jointly discrete, maybe I should have a similar probability mass function definition here. We say that if X1, X2, ..., Xm are each discrete random variables, their joint PMF is defined as

p(x1, x2, ..., xm) = P(X1 = x1, X2 = x2, ..., Xm = xm).
So you are basically asking: if X1, X2, ..., Xm are all discrete random variables, what is the probability that X1 takes exactly the value x1, X2 takes exactly the value x2, and so on up to Xm taking exactly the value xm? If we have such a function, it defines the joint probability mass function of the set of discrete random variables X1, X2, ..., Xm. And just as we recovered the marginal PDFs in the continuous case, we can also recover the marginal PMFs here. I will do it for the simple case of two random variables: suppose X1 and X2 are discrete and jointly distributed; then

p_{X1}(u1) = sum over u2 of p_{X1,X2}(u1, u2),

where the sum runs over the whole range of X2, all its possible outcomes. Once I have summed over all possibilities of X2, what remains is just the probability of u1 for the random variable X1. So is it clear what I mean by joint PDF and joint probability mass function, and what I mean by marginal PDF and marginal probability mass function? When I say joint, you have m random variables, and the joint distribution characterizes their behavior all together; when I say marginal, I have recovered the distribution of an individual random variable from that joint distribution. So let us give an example for a joint PMF. You might have already seen these urn-type examples. Say in one urn there are 10 balls and in another urn there are only 5 balls, of different colors. You are going to pick some balls from each, and you want to come up with the joint distribution of what you pick.
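The marginalization sum above can be made concrete with the two dice from the start of the lecture. A minimal sketch, where I assume (consistent with the earlier numbers) that the biased die has P(1) = 0.9 with the remaining 0.1 spread evenly over 2..6, the other die is fair, and the throws are independent so the joint PMF is the product:

```python
# Assumptions for illustration: biased die with P(1) = 0.9 and the
# leftover 0.1 split evenly over 2..6; fair die; independent throws,
# so the joint PMF is the product of the two marginals.

biased = {1: 0.9, **{k: 0.1 / 5 for k in range(2, 7)}}
fair = {k: 1 / 6 for k in range(1, 7)}

# Joint PMF: p(x1, x2) = P(X1 = x1, X2 = x2).
joint = {(x1, x2): biased[x1] * fair[x2] for x1 in biased for x2 in fair}

# Marginal of X1: sum the joint PMF over all values of X2.
marginal_x1 = {x1: sum(joint[(x1, x2)] for x2 in fair) for x1 in biased}
print(round(marginal_x1[1], 10))  # recovers the 0.9 we started from
```

Summing the joint PMF over the second coordinate hands back exactly the biased die's distribution, which is the discrete analogue of integrating out u2 in the continuous case.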
So you pick some number of balls from each urn. The first urn has only 10 balls and the second only 5, so X2 can take values 1 to 5 and X1 can take values 1 to 10. And maybe there is a constraint: it may happen that when you have picked 5 from the first urn, you are allowed to pick only 2 from the second, and when you pick 3 from the first, you are allowed only 1 from the second. Such a condition forces a genuinely joint structure, so there it is useful to work with the joint distribution. And once you have the joint distribution, we have already said how to recover the marginals: just sum over all the possibilities of the other variable. So joint distributions basically come into play when the first outcome puts a constraint on the second, or the second on the first. For example, suppose X1 is the temperature and X2 is some other weather property, like cloudy or windy; you can map these outcomes to numbers so that you have an associated real-valued random variable. Then you would naturally expect that if the second observation is cloudy, the temperature may not be more than about 30 degrees, or something of that sort. The second random variable could also take the value rainy, and if X2 is already rainy, the temperature might be only about 25 or 26 degrees.
So one outcome already constrains the possible outcomes of the other experiment. In such cases the joint distribution is what completely determines, completely characterizes, how the outcomes look together, when we are interested in both values and not just the individual ones. Now we will formally define what we mean by independence of random variables; that is our next topic. We know what independence of events means: two events A and B are independent if the probability of them happening together is the product of their individual probabilities. But a random variable defines many possible events, not just one, so we have to define independence for random variables appropriately. We are going to say that m random variables X1, X2, ..., Xm are independent if for any sets A1, A2, ..., Am, the events {X1 ∈ A1}, {X2 ∈ A2}, ..., {Xm ∈ Am} are independent — and I know what independence of a collection of events means. The crucial thing here is "for any subsets": it is not just one event, like "if it is rainy, the temperature is constrained to some range"; that is only one possible event.
What the independence definition asks is: for every possible choice of events, the resulting events must be independent; then I call X1, X2, ..., Xm independent random variables. Is that clear? You give me any m subsets A1, ..., Am, I construct the events {X1 ∈ A1}, ..., {Xm ∈ Am}, and I know how to verify whether these m events are independent or not — we have already discussed how many conditions need to be checked. If that holds for every choice, then I call X1, X2, ..., Xm independent. So independence is not for a specific event: if I look at different choices of the sets Ai, I end up with different events, and all possible events arising through the sets A1, A2, ..., Am must be independent. Okay, fine. As you can see, verifying this directly is a horrendous, never-ending task. So what is the consequence of this definition? It so happens that there is a result, which you can take as a theorem. But at this point, let us first ponder some of the technicalities involved in these definitions.
So as of now we are allowing A1, A2, ... to be any subsets: closed, open, whatever. But an arbitrary subset of R need not be measurable; there is a theory that establishes this. On the other hand, from the very definition of a random variable, X1 must be measurable, so that the question "does X1 belong to A1?" is a valid one; if the sets are not measurable, we may not be able to assign probabilities to them, and there will be an issue. That takes us into measure-theoretic territory: the sets should properly be Borel subsets of R, and for a Borel subset I need not worry about whether I can measure it and assign a probability to it. For our class and our purposes, any subset will be fine; if you want to be very rigorous, you should actually say Borel subsets, which has a certain subtle meaning, but we do not need to diverge into that. This is just an aside, not essential for this class. Okay. So suppose I have random variables X1, X2, ..., Xm. Then we are going to say they are independent if and only if the joint CDF factors:

F(x1, x2, ..., xm) = F_{X1}(x1) F_{X2}(x2) ... F_{Xm}(xm),

where F_{X1} is the marginal CDF of X1, F_{X2} the marginal CDF of X2, and so on. We are saying that the joint CDF splits as the product of the individual CDFs.
So instead of checking all possible collections of events, if you can check this factorization condition, it means your set of random variables is independent. I am not going to prove this, but I think it is not hard to show at least one direction: if independence holds, you can straightaway write the joint CDF as the probability of the intersection of the events {Xi ≤ xi} and show that it splits. And conversely, if the factorization holds, one can show that the events are independent for any collection of sets A1, A2, ..., Am. Notice that this is an if-and-only-if condition. So if I say my random variables X1, X2, ..., Xm are independent, you just need to check that their joint distribution splits. Now suppose X1 is the throw of one die and X2 the throw of another, and they are independent. If I ask for the joint probability that X1 = 5 and X2 = 6, how are we going to compute it? We multiply P(X1 = 5) by P(X2 = 6); that multiplication is coming precisely from independence — you are just applying the factorization to this particular case. And if I instead ask about X1 = 3 and X2 = 2, what would you have done? Again multiply, because even though I changed the values, independence is not for one particular event; it holds for all possible events. That is why, when I say independence, you do not worry about which event it is: for any possible event you just take the probabilities of the individual events and multiply, and that is the joint probability. Now, we have to distinguish independence from pairwise independence; we have clearly said that the whole set of events has to be independent.
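The factorization criterion can be checked mechanically on a small example. A sketch, assuming two independent fair dice (so the check should pass at every grid point); the helper names are mine, not from the lecture:

```python
# Assumption for illustration: two fair dice thrown independently, so
# the joint PMF is the product of the marginals. We then verify the
# criterion F(x1, x2) = F_X1(x1) * F_X2(x2) on an integer grid.

fair = {k: 1 / 6 for k in range(1, 7)}
joint = {(a, b): fair[a] * fair[b] for a in fair for b in fair}

def cdf(pmf, x):
    """Marginal CDF: P(X <= x) for a discrete PMF given as a dict."""
    return sum(p for v, p in pmf.items() if v <= x)

def joint_cdf(x1, x2):
    """Joint CDF: P(X1 <= x1, X2 <= x2) from the joint PMF."""
    return sum(p for (a, b), p in joint.items() if a <= x1 and b <= x2)

factorizes = all(
    abs(joint_cdf(x1, x2) - cdf(fair, x1) * cdf(fair, x2)) < 1e-12
    for x1 in range(0, 8) for x2 in range(0, 8))
print(factorizes)  # the joint CDF splits, so X1 and X2 are independent
```

Checking the CDF at grid points suffices here because the dice are integer-valued: between integers the CDFs are constant, so nothing new can fail off the grid.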
If the events are merely pairwise independent, does that imply my random variables are independent? No, it does not: they have to be fully independent, not just pairwise independent. On the other hand, if your joint CDF splits into the product of the marginals, then by the if-and-only-if result, for any sets A1, A2, ..., Am the events {X1 ∈ A1}, ..., {Xm ∈ Am} are independent. They will be pairwise independent in particular, but full independence means much more than that. Take the case of two random variables: if F(x1, x2) = F_{X1}(x1) F_{X2}(x2) for all x1, x2 — and when I say this splits, I mean for all possible values of x1 and x2 — then by the previous result X1 and X2 are independent. The same holds if you take m random variables: if the joint CDF factors, the inference is that X1, X2, ..., Xm are independent, which by definition already implies that {X1 ∈ A1}, ..., {Xm ∈ Am} are independent for any subsets A1, A2, ..., Am of R. So what might bother you here is this: if the factorization is satisfied for all x1, ..., xm, we are saying independence is also guaranteed for all subsets of R — and indeed, that is exactly what the theorem delivers; if the factorization is satisfied, independence of these events automatically holds for every choice of A1, A2, ..., Am. Next, for conditional distributions, we are going to define f of X given Y.
Now, what we are interested in is: suppose we have multiple outcomes and we ask, given that this one has happened, what are the possible outcomes of the other experiment? We have already defined conditional probabilities; now we want to talk about conditional distributions, the extension of that idea. So we are going to define

f_{X|Y}(x | y) = f_{X,Y}(x, y) / f_Y(y),

provided, naturally, that f_Y(y) > 0, otherwise this is not useful. This is for continuous random variables; for probability mass functions we already know how to define conditional probabilities, so I am only defining it for the continuous case. One last thing now: how to define conditional expectation. One outcome has happened, something has been told to us, and I want the expected outcome of the second experiment. Say I have two experiments, with outcomes denoted X and Y. If I see that the first outcome happened to be some value, what is the expected outcome of the second? When I write E[X | Y = y], it means: you have told me the value of Y, and conditioned on this, what is the expected value of X? I have already defined the conditional PDF given Y = y, so you just go and apply the same definition of expectation to it:

E[X | Y = y] = ∫ from −∞ to ∞ of x f_{X|Y}(x | y) dx.

One more thing: this will often be written simply as E[X | Y].
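The ratio definition and the conditional-expectation integral can be sketched numerically. Assuming again (my choice of worked example, not the lecture's) the joint density f(x, y) = x + y on the unit square, for which E[X | Y = y] = (1/3 + y/2) / (y + 1/2) in closed form:

```python
# Assumption for illustration: joint density f(x, y) = x + y on [0,1]^2.
# We compute E[X | Y = y] as the integral of x * f(x, y) / f_Y(y) dx,
# approximating both integrals with a midpoint rule.

def joint_pdf(x, y):
    return x + y if 0 <= x <= 1 and 0 <= y <= 1 else 0.0

def marginal_y(y, n=20_000):
    """f_Y(y) = integral over x of f(x, y)."""
    h = 1.0 / n
    return sum(joint_pdf((k + 0.5) * h, y) for k in range(n)) * h

def cond_expectation(y, n=20_000):
    """E[X | Y = y] = integral of x * f_{X|Y}(x | y) dx."""
    h = 1.0 / n
    num = sum((k + 0.5) * h * joint_pdf((k + 0.5) * h, y)
              for k in range(n)) * h
    return num / marginal_y(y)

# Analytically E[X | Y = 0.5] = (1/3 + 0.25) / (0.5 + 0.5) = 7/12.
print(round(cond_expectation(0.5), 4))
```

Note that dividing by f_Y(y) is exactly the "provided f_Y(y) > 0" condition from the definition; for this density f_Y(y) = y + 1/2 is positive everywhere on [0, 1], so the division is always safe.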
If I write E[X | Y = y], then once you specify the value of Y, this is just a number; there is no randomness in it. But suppose I write E[X | Y] without specifying the value I am conditioning on. Y is a random variable; it could take many different values, and for different values of Y this expression takes different values. So in that case E[X | Y] is itself a random variable, because it depends on the value that Y takes. Another thing you can see: if you want to compute E[X], what you can do is

E[X] = sum over y of E[X | Y = y] P(Y = y),

summing over all possible values of Y (here for a discrete Y). Go and convince yourself how I am getting this: from conditional expectations you recover the expectation of X, by multiplying each conditional expectation by the corresponding probability. To give an example, suppose you are going to take one of five exams, and you do not know which exam you will be given. Let Y denote which exam you get, and let X denote your score given that exam. For different exams, you may score different values. Before you are given an exam, the exam itself is chosen with some probability; after it is chosen, E[X | Y = y] is the expected score on that exam.
But if I will not tell you beforehand which exam I am going to give you, and just ask for your expected score over whichever exam you take, then what you will do is average your expected score over all the possible exams. Say there are five exams, the first taken with probability 0.1, the second with 0.2, the third with 0.3, and so on: for each exam, multiply the probability of taking it by the expected score had you taken it, sum over all exams, and that is your expected score. Okay, fine. So this is what I wanted to talk about. There is a bit more to conditional probabilities, but related to that we will give assignments, and from them you will learn the things I have skipped. So let us stop here.
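The exam example is a direct application of the law of total expectation above. A minimal sketch; the lecture only mentions probabilities 0.1, 0.2, 0.3 "and so on", so the remaining probabilities and all the per-exam expected scores below are made-up numbers for illustration:

```python
# Law of total expectation: E[X] = sum over y of E[X | Y = y] * P(Y = y).
# Assumption: the exam probabilities after 0.1, 0.2, 0.3 and all the
# expected scores are hypothetical values chosen for this sketch.

p_exam = [0.1, 0.2, 0.3, 0.25, 0.15]      # P(Y = y): chance of each exam
score_given_exam = [80, 65, 70, 90, 55]   # E[X | Y = y]: expected score

assert abs(sum(p_exam) - 1.0) < 1e-12     # probabilities must sum to 1

expected_score = sum(p * s for p, s in zip(p_exam, score_given_exam))
print(expected_score)
```

Each term weights the expected score on one exam by the chance of being given that exam, which is exactly the averaging described above.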