Okay, now based on this joint distribution, we can recover the marginals. So far we were talking about one random variable at a time, right? So if I give you the joint behavior of two random variables, you should be able to recover the behavior of each individual random variable. So let us say I have two random variables and their joint probability distribution, the joint CDF, is given to me. Now I am only interested in the random variable X1, not in X2. How do I recover its behavior? One possibility is this: you let x2 go to infinity in this function. Letting x2 go to infinity means you are allowing X2 to be less than or equal to infinity, that is, you are allowing all possible values of X2. Then any effect that remains has to come from X1 alone, right? That is how we recover the effect of the random variable X1, and that is why we write it as F of x1, that is, F(x1) = F(x1, infinity). Similarly, if your focus is only on X2, you let x1 go to infinity in this function: you let X1 take all possible values, and whatever effect remains has to come from X2 alone. In this case, F of x1 and F of x2 are called marginal CDFs, ok?

Now, if I am talking about two random variables, which carries more information? Case 1, I provide you the joint CDF; case 2, I provide you the individual CDFs. Which one has more information? The joint, right? Because if I provide you the joint, you can always recover the individual marginals. But the converse does not always hold: if I provide only the individual behaviors, you may not be able to reconstruct the joint behavior. So from the joint you can get the marginals, but from the marginals alone you cannot in general get the joint, ok? That is why providing the joint information is the harder task; you need a good amount of information.

This next part is simple, so I will just go through it quickly. Whenever you deal with discrete random variables, the joint probability is simply p(x1, x2) = P(X1 = x1, X2 = x2). And obviously, if I sum this joint probability over all possible values of x1 and x2, it has to add up to 1. Earlier we had this for a single variable: if I sum over all possible values of x, it should be 1. Now it is a joint probability in x1 and x2, so I have to sum over all possible values of both, and the double summation over x1 and x2 of p(x1, x2) should equal 1. You understand this, ok?

Now let us quickly do this exercise. Let us say I have two random variables X1 and X2; X1 takes three values 1, 2, 3 and X2 takes the values 2, 4, 5, and their joint probability mass function is given in this table, ok. If I add up all the entries of this matrix, it should add up to 1; only then is it a valid joint probability mass function. Now from this, let us try to recover the marginals. The rows here correspond to the random variable X1 and the columns to X2. What is P(X1 = 1)? It asks for the probability that my random variable X1 takes the value 1, and in that case I do not care about the possible values of X2, right? So I just sum all the values in that row, and in the same way you get the next one and the last one, 0.3. And if you want to compute the marginal of X2, you have to sum up the entries in each of the columns. Ok, now this is for the discrete case. What about the continuous case?
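To make this concrete, here is a minimal sketch in Python. The lecture's actual table entries are not in the transcript, so the numbers below are hypothetical, chosen to sum to 1 and to make the last row sum equal 0.3 as mentioned; summing rows gives the marginal PMF of X1 and summing columns gives the marginal PMF of X2.

```python
import numpy as np

# Hypothetical joint PMF for X1 in {1, 2, 3} (rows) and X2 in {2, 4, 5}
# (columns). These particular entries are made up for illustration; any
# valid table must be nonnegative and sum to 1 overall.
joint = np.array([
    [0.10, 0.15, 0.05],   # row X1 = 1, row sum 0.30
    [0.20, 0.10, 0.10],   # row X1 = 2, row sum 0.40
    [0.05, 0.15, 0.10],   # row X1 = 3, row sum 0.30
])

assert np.isclose(joint.sum(), 1.0)   # a valid joint PMF sums to 1

marginal_x1 = joint.sum(axis=1)  # sum each row:    P(X1 = 1), P(X1 = 2), P(X1 = 3)
marginal_x2 = joint.sum(axis=0)  # sum each column: P(X2 = 2), P(X2 = 4), P(X2 = 5)

print(marginal_x1)   # [0.3 0.4 0.3]
print(marginal_x2)   # [0.35 0.4  0.25]
```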
For the continuous case we extend the definition analogously. We say that a given bunch of random variables is jointly continuous if there exists a multivariable function f such that this relation holds: the joint CDF can be written as the integral of this function, F(x1, ..., xn) = integral from minus infinity to x1 ... integral from minus infinity to xn of f(t1, ..., tn) dtn ... dt1. The lower limits here should all be minus infinity. And you see that this is a simple, straightforward extension of what we had for a single random variable: in the simple case we had F(x) = integral from minus infinity to x of f(t) dt, and now we are just making that x into x1 up to xn and putting one integration for each of them. Whenever such an f exists, we call it the joint PDF.

Examples again are pretty natural; let me look at one healthcare example. Suppose a certain health condition depends on two values: your blood sugar level and your body mass index. If you go to a doctor, he is going to ask you to perform two tests. Whatever your blood sugar level turns out to be, call that the random variable X1, and whatever your body mass index is, call that X2. Maybe they are dependent and one has influence over the other: if your body mass index is high, maybe you will also have high blood sugar, who knows. So there is some dependency there, and we have to look at them together. The doctor has asked for both reports because he wants to make a decision looking at them together, not just at your blood sugar level or just at your BMI. They together influence something, and that is where one has to deal with the joint PDF.

Just like for discrete random variables, there is a notion of a marginal PDF. The first basic property is that the joint PDF should integrate to 1 over all the variables we are interested in: for f(x1, x2), integrating over both x1 and x2 should give 1. Now if you are only interested in the random variable X1 and do not care about the influence of X2, what you need to do is integrate the joint PDF over all possible values of x2 (the dummy variable inside should be t2 here); what remains is the influence of X1 alone, and you get the effect of only the random variable X1. Similarly, you can integrate this function with respect to x1 and you will get only the influence of X2. These f of x1 and f of x2 are called the marginal PDFs of the random variables X1 and X2. So again, from the joint PDF we are able to recover the marginal PDFs.

I will leave this example for you to work out: a joint PDF is given involving some constant c. Can this c take any value here? No. How are we going to find the value of c? You integrate this function between 2 and 3 in x1 and between 1 and 2 in x2 and equate it to 1; that gives you one equation in terms of c, and solving that equation gives you the value of c. That will completely define your f(x1, x2): even though I did not specify c here, c cannot be arbitrary. c has to be the particular value for which this is a valid joint PDF. From that joint PDF you can go back and recover your marginal PDFs. Now let me quickly look into one more aspect, called independence of random variables.
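Since the transcript leaves the functional form of the joint PDF unspecified, here is a minimal numerical sketch assuming a hypothetical density f(x1, x2) = c * x1 * x2 on the stated support 2 <= x1 <= 3, 1 <= x2 <= 2: integrate the unnormalised function over the support, solve c * total = 1 for c, and then integrate out x2 to get the marginal of X1.

```python
from scipy import integrate

# Hypothetical unnormalised density on 2 <= x1 <= 3, 1 <= x2 <= 2.
# The lecture gives only the support, so x1 * x2 is an assumed form.
def unnormalised(x2, x1):      # dblquad integrates the first argument innermost
    return x1 * x2

# Integrate over the full support ...
total, _ = integrate.dblquad(unnormalised, 2, 3, lambda x1: 1, lambda x1: 2)

# ... and solve c * total = 1 for the normalising constant.
c = 1.0 / total
print("c =", c)                # for this assumed form: 1 / (15/4) = 4/15

# Marginal PDF of X1 at a point: integrate out x2.
def marginal_x1(x1):
    val, _ = integrate.quad(lambda x2: c * x1 * x2, 1, 2)
    return val

print("f_X1(2.5) =", marginal_x1(2.5))   # here (2/5) * 2.5 = 1.0
```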
Now, just as we introduced independence of events, we are going to define a bunch of random variables to be independent if their joint CDF can be expressed as the product of their marginals. Once you have the joint CDF, you can get each of these marginals; if the joint can be written as the product of those marginals, then these random variables are called independent. So notice that whenever they are independent, it is enough to provide the CDF of each of the random variables, and from that I already have their joint CDF. Earlier I said that if I give you the marginal CDFs, I may not be able to recover the joint CDF. However, if I additionally say they are independent, then providing the marginals is enough; you readily get your joint CDF, ok.

This translates simply: in the discrete case, if I have two random variables X1 and X2 and they are independent, then their joint probabilities can be expressed as the product of their marginal probabilities. And similarly, in the continuous case, their joint PDF can be expressed as the product of their marginals if they are independent; if they are not independent, this property may not hold. So if random variables are independent, it is enough to specify their probability mass functions or probability density functions. We do not need the joint PDF or PMF, because that can be recovered from their marginals, ok.

Let me just talk about this and then we will take a break, ok. Let us say there are n coins, each of them a Bernoulli random variable. I hope all of you recall what a Bernoulli random variable is: a Bernoulli comes with a parameter p, and I am denoting the ith random variable's parameter by p_i here. Now I am saying these X_i's are independent, ok. Suppose now I want to compute this probability: what is the probability that the first random variable takes the value x1, the second random variable takes the value x2, and so on. Because of their independence, I can write this as the product of the individual probabilities, right?

Now, as a special case, let us say all these p_i's are the same value p, and what I am going to do is add up all these random variables X_i. If each X_i is Bernoulli, what are the possible values a Bernoulli random variable can take? 0 and 1. So what values can the sum Y take? 0 up to n, ok. Now you can verify this: this Y is nothing but a binomial random variable with parameters n and p, ok. If you go back and compute all the probabilities, what is the probability that Y equals 0, that Y equals 1, up to Y equals n, you will get exactly what we defined for the binomial random variable. So what is the relation between binomial and Bernoulli random variables? A binomial is just a sum of Bernoulli random variables with a constant probability of success, and they have to be independent. So a binomial random variable with parameters n and p is nothing but the sum of n independent Bernoulli random variables with the same parameter p, ok. That is where independence often simplifies things, and you will see connections between different distributions.
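To make the Bernoulli-to-binomial connection concrete, here is a minimal simulation sketch (the values n = 10 and p = 0.3 are just hypothetical choices): draw n independent Bernoulli(p) variables many times, sum them, and compare the empirical distribution of Y against the Binomial(n, p) PMF.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, p, trials = 10, 0.3, 200_000

# Each column is one independent Bernoulli(p) variable X_i; each row is one trial.
bernoullis = rng.random((trials, n)) < p
y = bernoullis.sum(axis=1)                   # Y = X_1 + ... + X_n

# Compare the empirical distribution of Y with the Binomial(n, p) PMF.
empirical = np.bincount(y, minlength=n + 1) / trials
theoretical = stats.binom.pmf(np.arange(n + 1), n, p)

for k in range(n + 1):
    print(f"P(Y={k}): empirical {empirical[k]:.4f}  binomial {theoretical[k]:.4f}")
```

The two columns agree up to simulation noise, which is exactly the claim that a sum of n independent Bernoulli(p) variables is Binomial(n, p).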
A consequence of this independence shows up when you take the product of these random variables. Let us define a new random variable, call it Z, as the product Z = X1 * X2 * ... * Xn. Now if I want to take the expectation of this Z, I do not need to go and find out its PMF or PDF altogether. I can directly compute it as expectation of X1 times expectation of X2, all the way to expectation of Xn; that is, I just need to find the expectation of each individual random variable, and taking their product gives me the value of the expectation of Z. This is true only if they are independent; if they are not independent, this simplification does not work, ok. (A small simulation sketch of this product rule is given at the end.)

Now, one last definition. Suppose you have a bunch of random variables, all of them independent, and in addition each of them has the same distribution, ok. For example, in the case of the binomial we said there are n Bernoulli random variables and each of them has the same parameter, which means all of them have the same distribution. Like that, if we have independent random variables and all of them have the same distribution, then this bunch of random variables is called independent and identically distributed, IID, ok. One quick example: suppose I take X1 to be Bernoulli with parameter p1, X2 to be Bernoulli with parameter p2, and so on, with Xn Bernoulli with parameter pn. I tell you that they are independent, but can I say they are IID? No. Why? They need not have the same distribution: even though they are all Bernoulli, the parameters are different. If I say they are independent and further that p1 = p2 = ... = pn, then they are IID, ok. Let us stop here.
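As the closing illustration promised above, here is a minimal simulation sketch of the product-of-expectations rule, using three independent Bernoulli variables with different hypothetical parameters p_i (independent but not IID, as in the example just discussed): the empirical mean of Z = X1 * X2 * X3 should match the product of the individual means.

```python
import numpy as np

rng = np.random.default_rng(1)
trials = 500_000

# Three independent Bernoulli variables with different (assumed) parameters.
ps = [0.2, 0.5, 0.7]
xs = [rng.random(trials) < p for p in ps]

# Z = X1 * X2 * X3, computed per trial.
z = np.ones(trials)
for x in xs:
    z = z * x

# Under independence, E[Z] equals the product of the individual expectations.
print("E[Z] (empirical):", z.mean())
print("prod of E[X_i]:  ", np.prod(ps))   # 0.2 * 0.5 * 0.7 = 0.07
```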