Welcome back. So far we have discussed probability spaces. The sample space is the set of all possible outcomes of a random experiment; on it we constructed a σ-algebra of events, and we then defined probability measures on the resulting measurable space. It turned out that when Ω is countable, it is possible to assign probabilities to all subsets of Ω, whereas when Ω is uncountable, such as the interval [0, 1] or the infinite coin toss model, we had to settle for a smaller σ-algebra on which to assign probabilities. That completes the section on what probability spaces are and how probability measures are assigned to a measurable space. Now we begin a new section, on conditional probability, and I will start with the definition. We are given a probability space (Ω, F, P); now that we know what probability spaces are, everything we do will start with one: a sample space Ω, a σ-algebra F on it, and a probability measure P on (Ω, F). Let B be an event such that P(B) > 0. By "B is an event" I mean that B is an element of the σ-algebra F; so I fix some B ∈ F whose probability is strictly positive, a set whose probability measure is not 0, and let A be another event, i.e., another F-measurable set. The conditional probability of A given B, denoted P(A | B), is then defined as follows.
P(A | B) = P(A ∩ B) / P(B). This is by definition; there is no "why" — definitions are simply taken as given. Intuitively, you fix an event B, look at the probability that both A and B happen, and scale it by the probability of B. You can also see why the condition P(B) > 0 is important: otherwise the denominator would be 0. So you must condition only on F-measurable sets, not on arbitrary subsets of Ω, and only on events with strictly positive probability. A word of caution: we cannot condition on zero-probability events. A can be any F-measurable set, but whatever sits on the right of the vertical line must be an event of non-zero probability. It is very easy to see why you need this — the denominator being 0 is meaningless — but people sometimes miss it and get into trouble. For example, take Ω = [0, 1], F the Borel σ-algebra, and P the uniform (Lebesgue) measure. You cannot condition on the rationals, because the rationals have probability 0 under the Lebesgue measure. You cannot ask, "What is the probability that my number is between 0 and 1/2, given that it is rational?" If A is the event that the number lies in [0, 1/2] and B is the event that the number is rational, then under the Lebesgue measure that question is meaningless.
You should never ask such a question; there is no answer to it. You cannot condition on the Cantor set either, because, as you will show in your homework (if you have not already), the Cantor set has measure 0. So you should not ask, "What is the probability that my number is between 0 and 1/2, given that it lies in the Cantor set?" There is no such question; it is meaningless and cannot be answered. There is in fact a paradox, called the Borel–Kolmogorov paradox, which takes a sphere with the uniform measure on it and asks about the conditional distribution of the polar angle given that you are on one particular great circle. You get multiple answers, just like in Bertrand's paradox, which we saw earlier. Why? Because you are conditioning on a set of measure 0. You can look that up if you want. So if you are not careful about this, you can get into all sorts of inconsistent and paradoxical situations. Now a theorem: let B ∈ F with P(B) > 0. Then P(· | B), a mapping from F to [0, 1], is a probability measure on (Ω, F). In other words, if you fix some event B with positive probability and consider P(· | B) — by "·" I mean that it takes arguments from F — then this set function, given by the definition above, is a valid probability measure on (Ω, F). This is a theorem, so you have to prove it. How many things do you have to prove? Three, for a probability measure: if you input the empty set you should get 0, if you input the sample space you should get 1,
and finally, countable additivity over disjoint events. The first two are trivial: if A were the empty set, then ∅ ∩ B is empty, and 0 divided by something positive is 0; similarly, if A is the whole sample space, then Ω ∩ B = B, so you get P(B)/P(B) = 1. To verify countable additivity, consider A_i ∈ F, i = 1, 2, ..., which are disjoint; I have to take a countable collection of disjoint events and show that P(∪_{i=1}^∞ A_i | B) = Σ_{i=1}^∞ P(A_i | B). By definition, the left-hand side equals P((∪_{i=1}^∞ A_i) ∩ B) / P(B). Now there is a set-theoretic identity, (A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C), which extends to countably infinite unions. Using it, the numerator becomes P(∪_{i=1}^∞ (A_i ∩ B)). Since the A_i are disjoint, the sets C_i := A_i ∩ B are also disjoint. P is a probability measure on the original space, so it satisfies countable additivity; therefore the expression equals Σ_{i=1}^∞ P(A_i ∩ B) / P(B), which is exactly what we want: Σ_{i=1}^∞ P(A_i | B). Any questions? It is a very elementary proof.
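As a quick numerical sketch (not part of the lecture itself), the three defining properties can be checked mechanically on a small finite space; the fair-die setup below is a hypothetical illustration, using exact rational arithmetic.

```python
from fractions import Fraction

# Finite probability space: a fair die.
omega = {1, 2, 3, 4, 5, 6}

def P(event):
    """Uniform probability measure on omega."""
    return Fraction(len(event & omega), len(omega))

B = {2, 4, 6}  # conditioning event, P(B) = 1/2 > 0

def P_given(A, B):
    """Conditional probability P(A | B) = P(A ∩ B) / P(B)."""
    assert P(B) > 0, "cannot condition on a zero-probability event"
    return P(A & B) / P(B)

# The three properties of a probability measure:
assert P_given(set(), B) == 0    # the empty set gets 0
assert P_given(omega, B) == 1    # the sample space gets 1
# additivity over disjoint events:
assert P_given({1, 2}, B) + P_given({3, 4}, B) == P_given({1, 2, 3, 4}, B)

print(P_given({4}, B))  # 1/3
```

The same `P_given` also answers the die question that comes up shortly: the probability that the outcome is 4, given that it is even, is 1/3.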
It took just a minute. [A student asks whether the events can come from two different experiments, say a coin toss and a die throw.] You have to see that all of this happens in one probability space; you are not operating in two different probability spaces. There is only one probability space, (Ω, F, P). You are not talking about one experiment involving the toss of a coin and another involving the throw of a die; that is not the scenario under consideration. You are talking about a given random experiment, with its sample space, σ-algebra, and probability measure all fixed. Now you condition on a particular event B with positive probability, and I am saying that what I defined here is a valid probability measure, for any such B — that is all. If you want a concrete example, consider the throw of a die: Ω = {1, 2, 3, 4, 5, 6}, F = 2^Ω (since the sample space is discrete), and P the uniform measure, 1/6 on each outcome. You can take B to be the event that an even number shows up, and ask for the probability that the outcome is 4, given that it is even; you can calculate it. Next, let us do some properties of conditional probability. When we did properties of probability measures, I listed some 7 or 8 of them. Since P(· | B) is a valid probability measure on (Ω, F), every property satisfied by probability measures is satisfied by P(· | B) as well. So here I will put down a few properties that are more specific to conditioning; I have three. Property 1: let B_i, i = 1, 2, ..., be a partition of Ω. What does that mean?
It means ∪_i B_i = Ω and B_i ∩ B_j = ∅ for i ≠ j, with each B_i being F-measurable; and let A be some event. Suppose P(B_i) > 0 for all i. Then P(A) = Σ_{i=1}^∞ P(A | B_i) P(B_i). This law has a name: the law of total probability. We are looking at a partition of the sample space by F-measurable sets B_i, and A is any F-measurable set; then you can write P(A) as the sum, over the sets of the partition, of the probability of A conditioned on B_i times the probability of B_i. There is a particular version of this which is quite useful. If B is any event with positive probability, then B and B^c always partition the sample space — trivial to show. So in particular, if B ∈ F is such that 0 < P(B) < 1, then for any F-measurable A, P(A) = P(A | B) P(B) + P(A | B^c) P(B^c). This is a corollary of the total probability law, because B and B^c partition the sample space: I can condition on B because it has positive probability, and I can condition on B^c because P(B) is strictly less than 1, so P(B^c) is strictly positive. That is why I need both inequalities here. Now, how do you prove the law of total probability? Start from the right-hand side.
On the RHS, each term P(A | B_i) P(B_i) equals [P(A ∩ B_i) / P(B_i)] · P(B_i), so the P(B_i) cancels, and the RHS equals Σ_{i=1}^∞ P(A ∩ B_i). Now the B_i are disjoint, so the sets A ∩ B_i are also disjoint, and by countable additivity of P the sum equals P(∪_{i=1}^∞ (A ∩ B_i)). Again I use the set-theoretic identity from before: this equals P(A ∩ (∪_{i=1}^∞ B_i)). But ∪_i B_i = Ω, so I have P(A ∩ Ω) = P(A), which is what I wanted. Any questions? A typical example of the law of total probability is the balls-and-urns experiment. Say I have N urns, and urn i contains n_i red balls and m_i blue balls. You pick an urn at random, say with uniform probability, and then pick a ball from it uniformly at random. The probability of choosing urn i is 1/N in that case, and the probability of drawing a red ball given that you chose urn i is n_i / (n_i + m_i). So you can calculate the unconditional probability of picking a red ball — that is exactly what this law gives you. You first condition on which urn you pick, look at the conditional probability of picking a red ball, and then compute the total probability of picking a red ball in this experiment.
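The urn computation can be sketched in a few lines; the specific urn contents below are made up purely for illustration.

```python
from fractions import Fraction

# Hypothetical urn contents: (red, blue) balls in each urn.
urns = [(3, 1), (1, 3), (2, 2)]          # N = 3 urns
N = len(urns)
p_urn = Fraction(1, N)                   # pick an urn uniformly: P(B_i) = 1/N

# Conditional probabilities P(red | urn i) = n_i / (n_i + m_i)
p_red_given_urn = [Fraction(n, n + m) for n, m in urns]

# Law of total probability: P(red) = sum_i P(red | B_i) P(B_i)
p_red = sum(p * p_urn for p in p_red_given_urn)
print(p_red)  # (1/3) * (3/4 + 1/4 + 1/2) = 1/2
```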
That is where this is useful, and it does not even have to be a finite set of urns; it can be a countably infinite set of urns. Any questions on this property? Property 2: let A ∈ F with P(A) > 0, and let B_i, i = 1, 2, ..., be as in Property 1, i.e., a partition of the sample space with each B_i having positive probability. Then P(B_i | A) = P(A | B_i) P(B_i) / Σ_j P(A | B_j) P(B_j). Notice that I am no longer computing P(A | B_i); I am computing P(B_i | A). On the right side of the vertical line I now have A, so I must ensure that A has positive probability; elsewhere in the formula, the B_j sit on the right of the vertical line, so they should all have positive probability too. This relationship is called Bayes' rule. It is actually a very trivial identity: the denominator is just P(A), by total probability. Bring it to the other side and the left-hand side becomes P(B_i ∩ A); the numerator is also P(B_i ∩ A). So you are just writing P(A ∩ B_i) in two different ways. There is really very little to Bayes' rule; it is a very straightforward application of the definition of conditional probability. It only looks big because you apply total probability in the denominator. In the urn-and-ball example, you would ask: given that I got a red ball, what is the probability that it came from urn i?
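That a posteriori question can be computed directly. The sketch below reuses made-up urn contents; both the prior over urns and the likelihoods are hypothetical numbers for illustration.

```python
from fractions import Fraction

# Hypothetical urn contents: (red, blue) balls in each urn.
urns = [(3, 1), (1, 3), (2, 2)]
N = len(urns)
p_urn = [Fraction(1, N)] * N                          # prior P(B_i)
p_red_given = [Fraction(n, n + m) for n, m in urns]   # likelihood P(A | B_i)

# Bayes' rule: P(B_i | A) = P(A | B_i) P(B_i) / sum_j P(A | B_j) P(B_j)
denom = sum(l * p for l, p in zip(p_red_given, p_urn))  # = P(A), by total probability
posterior = [l * p / denom for l, p in zip(p_red_given, p_urn)]

print(posterior)       # posterior over urns, given a red ball
print(sum(posterior))  # the posteriors sum to 1
```

Note that the posterior is itself a probability distribution over the urns, as the theorem about P(· | A) guarantees.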
Given that you are picking from urn i, you know the probability of a red ball; but now, after the fact, you say, "I got a red ball — so A, the event that I got a red ball, occurred — what is the probability that it came from, say, urn 27?" Because you are asking this question post facto, P(B_i | A) is called the a posteriori probability; "a posteriori" means after the fact, just as "a priori" means beforehand. You can compute this in lots of examples, but mathematically it is a very simple application of the definition of conditional probability. Finally, I will put down Property 3: let A_i ∈ F, i = 1, 2, .... Then P(∩_{i=1}^∞ A_i) = P(A_1) · Π_{i=2}^∞ P(A_i | ∩_{j=1}^{i-1} A_j), as long as the conditional probabilities are defined. I am now considering the probability of a countable intersection of events, and writing it as a product of conditional probabilities. Why have I said "as long as the conditional probabilities are defined"? Because each event I condition on here, ∩_{j=1}^{i-1} A_j, must have positive probability. How do you prove this — induction? When can you use induction? When you are proving a statement about the natural numbers: induction is always used to prove a statement P(n) about natural numbers. Is this a statement about natural numbers? No.
So you cannot use induction; you have to use continuity of probabilities. What you can do is write the LHS, by continuity of probability, as lim_{n→∞} P(∩_{i=1}^n A_i). Now each of these is a finite intersection, so I can keep the limit and write it as lim_{n→∞} P(A_1) · Π_{i=2}^n P(A_i | ∩_{j=1}^{i-1} A_j), and then send n to infinity in the product, which gives the result. For n = 2 you have P(A_1 ∩ A_2) = P(A_1) P(A_2 | A_1); for n = 3, you have P(A_1) P(A_2 | A_1) P(A_3 | A_1 ∩ A_2); that is just the definition applied repeatedly — you can work this out. But the key step I want to emphasize is the continuity of probabilities; everything else is just the definition of conditional probability. So this says that if you know a bunch of finite conditional probabilities, each given a finite intersection, then you can compute the probability of the countably infinite intersection. Any questions? If there are no more questions on conditioning, I will go to the next topic; conditioning is a very short topic. The next topic is a very important one: independence. We are talking about independence of events. As usual, you are given a probability space. Independence of events is defined as follows; first we define independence of two events, and then we will go on to define independence of an arbitrary collection of events. Events A and B are said to be independent under P — or simply independent, when the measure P is unambiguous — if P(A ∩ B) = P(A) · P(B).
You are given two events, A and B, both elements of F. You say that A and B are independent under the probability measure P if P(A ∩ B) = P(A) P(B). The reason I have said "independent under P" is that if you have some other measure P′ on (Ω, F), the same events A and B may not be independent under it. Two events may be independent under one measure on this measurable space and fail to be independent under another, depending on what that measure is. So strictly speaking, you should say that A and B are independent under that particular probability measure: whether or not two events are independent depends on the probability measure. But, as I am clarifying here, we will simply say "independent" as long as it is clear which probability measure we are talking about; if there is only one measure in play, or it is completely clear, I will not keep saying "under this particular measure" — I will just say A and B are independent, that is it. So it is actually a very simple definition: if the probability of the intersection of two events factors into the product P(A) P(B), then A and B are said to be independent events. This is the definition, and it is the only definition, so from now on do not use any other notion of independence you may have. You may have some intuitive, meta-level understanding of independence — "the occurrence of A has nothing to do with the occurrence of B", or something like that. All of that is misleading; do not rely on any of it.
You should always keep just this definition: if P(A ∩ B) = P(A) P(B), then A and B are independent. Note that if A and B are independent and P(B) > 0, then P(A | B) = P(A). Why can we not take this as the definition of independence? Because P(A | B) is not even defined when P(B) = 0. So the product form is the thing to take as the definition; when P(B) > 0 the identity P(A | B) = P(A) is true, but it is a consequence of the definition, not the definition itself. A definition always acts as an "if and only if" — you can think of it that way — so there is no need to say "if and only if" in a definition; only in theorems do you say "if and only if", where you have two directions to prove. You do not yet know what independence means — you have not even heard of it so far, let us say — and the definition is what tells you: two events are independent if the probability of the intersection equals the product of the unconditioned probabilities. The reason I cautioned you to take this definition and only this definition is that many students carry some intuitive understanding of independence which can sometimes get you into trouble.
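Before the warning examples, here is a minimal numerical sketch of the definition itself; the events on a fair die are chosen purely for illustration.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}

def P(event):
    """Uniform measure on a fair die."""
    return Fraction(len(event & omega), len(omega))

A = {2, 4, 6}      # the outcome is even,      P(A) = 1/2
B = {1, 2, 3, 4}   # the outcome is at most 4, P(B) = 2/3
C = {1, 2, 3}      # the outcome is at most 3, P(C) = 1/2

# Independence under P is exactly the product rule P(A ∩ B) = P(A) P(B):
print(P(A & B) == P(A) * P(B))  # True:  1/3 == 1/2 * 2/3
print(P(A & C) == P(A) * P(C))  # False: 1/6 != 1/2 * 1/2
```

So A and B are independent under the uniform measure, while A and C are not; nothing but the product rule decides this.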
People often say things like "A and B are independent if the occurrence of A does not reveal anything about the occurrence of B." That is not correct, and I want to make that perfectly clear. That kind of intuition holds, at best, only when both events have positive probability. Let me give you an example. Work with our favorite setting: Ω = [0, 1], F the Borel σ-algebra, P the Lebesgue measure. Let A be the event that the number is rational and B the event that it is irrational. Are they independent? P(A ∩ B): A ∩ B is empty — there is no number which is both rational and irrational — so it is 0. And P(A) P(B) is also 0, because the probability of the rationals is 0. So if I throw a dart at the interval [0, 1] under the Lebesgue measure and ask whether the event that it lands on a rational is independent of the event that it lands on an irrational, the answer is yes, because that is what the definition says. But if you go by intuition — "if I know that the dart lands on a rational number, I know it is not irrational" — then the occurrence of A straight away rules out the occurrence of B; it tells you everything about B. Once A is known, B is completely determined, and yet they are independent. This is why I am saying you should not confuse independence with notions like "the occurrence of A says nothing about the occurrence of B"; all that is misleading and can get you into trouble.
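The rational/irrational example cannot be simulated directly, but the same phenomenon — disjoint events that are nevertheless independent — can be reproduced on a finite space by giving one outcome probability zero. The weights below are made up for this illustration.

```python
from fractions import Fraction

# A non-uniform measure on {1,...,6} in which outcome 1 has probability 0.
weights = {1: Fraction(0), 2: Fraction(1, 5), 3: Fraction(1, 5),
           4: Fraction(1, 5), 5: Fraction(1, 5), 6: Fraction(1, 5)}

def P(event):
    return sum((weights[x] for x in event), Fraction(0))

A = {1}              # analogue of "rational": a zero-probability event
B = {2, 3, 4, 5, 6}  # analogue of "irrational": its complement, probability 1

# A and B are disjoint -- the occurrence of A rules out B entirely --
# yet the definition declares them independent:
print(P(A & B) == P(A) * P(B))  # True: 0 == 0 * 1
```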
This is a serious matter; people give these kinds of meta-definitions very often, and I urge you not to do that anymore — this is the definition. That intuition is valid, by and large, only when both events have positive probability. So let me ask you a question, taking this argument to an extreme: can an event A be independent of itself? After all, the occurrence of A says everything about the occurrence of A. Do not guess — simply apply the definition. If A is independent of A, then P(A ∩ A) = P(A) · P(A), which means P(A) = P(A)^2, which means P(A) must be 0 or 1. Can you say anything more? You cannot say that A must be the empty set or the whole sample space — not true, because A can be the Cantor set under the Lebesgue measure, or A can be the irrationals, which have probability 1. So even under the Lebesgue measure, the event of getting an irrational is independent of itself; that is true, and there is nothing fishy going on. So I want to caution you against these kinds of folk definitions; they are not correct, and this is the correct definition.
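As a last sketch, the self-independence criterion P(A) ∈ {0, 1} can be checked mechanically; the finite measure below is again a made-up example, with one outcome carrying probability zero to mimic the Cantor-set situation.

```python
from fractions import Fraction

# A made-up measure on {1, 2, 3}: outcome 1 has probability 0.
weights = {1: Fraction(0), 2: Fraction(1, 2), 3: Fraction(1, 2)}

def P(event):
    return sum((weights[x] for x in event), Fraction(0))

def independent_of_itself(A):
    # A independent of A  <=>  P(A ∩ A) = P(A)^2  <=>  P(A) in {0, 1}
    return P(A) == P(A) ** 2

print(independent_of_itself({1}))        # True:  P = 0, yet {1} is not empty
print(independent_of_itself({1, 2, 3}))  # True:  P = 1
print(independent_of_itself({2}))        # False: P = 1/2
```

The first case mirrors the lecture's point: a non-empty event of probability 0 is independent of itself, even though it is not the empty set.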