Hi, I'm Zor. Welcome to Unizor Education. We continue talking about conditional probabilities. This lecture is part of the course Advanced Mathematics for Teenagers presented on Unizor.com, and that's where I recommend you watch this lecture, because there it has notes on the side. It would actually be very useful to read the notes first, then listen to the lecture, and then perhaps read the notes again to clarify certain things, because sometimes I say things slightly differently than I write them, and having the complete picture from both sides, oral and written, can be very helpful.

Okay, conditional probability. The previous lecture was dedicated to a mostly philosophical discussion of what conditional probability actually is. This lecture will be a little more mathematical, and I will end up with a definition of conditional probability.

I will use the same example as in the previous lecture. We have two dice, and we are rolling them. But in this particular case, let's consider two people: one rolls the dice and then tells the results to the other. The condition is that the person rolling completely ignores all rolls in which the sum of the two dice is not equal to six. He delivers the results to the second person, who is supposed to do something with these numbers, only when the sum of the numbers on the two dice equals six. Why am I doing this? Simply to impose a certain condition on the random experiment. The second person doesn't even know about what was ignored by the first person, so all the results of the experiment delivered to him are those in which the sum equals six.
Initially, if the second person doesn't know anything about this filtering of certain combinations, he would consider his sample space to be a six-by-six square, where the row represents the result of the first die and the column represents the result of the second die; a particular square means, for instance, four-five. So initially this second person, who is supposed to do something with these numbers, would assign equal probabilities to each pair of numbers from one to six, and each pair would have probability 1/36. That's the case when he doesn't know anything.

I said before that knowledge changes probability. The knowledge this person now has is that the sum of the two dice is always equal to six, which means that out of these 36 combinations, the only ones that actually occur are one-five, two-four, three-three, four-two, and five-one. The numbers the second person gets are always one of these. What does that mean from the probabilistic standpoint? It means the probabilities are no longer evenly distributed among 36 combinations. They are still evenly distributed, but only among these five, which means everything else has probability zero, and each of these five has probability one-fifth, so that the total still equals one, right? The sum of the probabilities of all elementary events must equal one. So the knowledge that the sum of the two numbers equals six forces the second person to consider that the probability is no longer 1/36 on every square as it was before: it is zero everywhere except these five squares, and on each of these five it equals one-fifth. The knowledge results in a shift in how the probabilities are distributed among the elementary events.

Now, what is the job of person number two? His task is to predict the result of the second of the two dice: is it even or odd? That's all.
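The redistribution just described can be sketched in a few lines of Python; the variable names are mine, and this is only an illustration of the idea, not part of the lecture's notation:

```python
from fractions import Fraction

# All 36 equally likely outcomes of rolling two dice:
# before any condition, each has probability 1/36.
outcomes = [(a, b) for a in range(1, 7) for b in range(1, 7)]
prior = {o: Fraction(1, 36) for o in outcomes}

# Knowledge that the sum equals six redistributes the total
# probability of 1 evenly over the five surviving outcomes.
survivors = [o for o in outcomes if sum(o) == 6]
posterior = {o: (Fraction(1, len(survivors)) if o in survivors else Fraction(0))
             for o in outcomes}

print(survivors)                # [(1, 5), (2, 4), (3, 3), (4, 2), (5, 1)]
print(sum(posterior.values()))  # 1
```

Note that the new distribution still sums to one: five squares at one-fifth each, everything else at zero.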
No more precise a prediction than that, just even or odd. So let's say his task is to find the probability of an even number, right? Because the probability of odd would be one minus that. All right. So, again, initially, what would be the probability of an even number on the second die? The events we are interested in in this particular case are all the squares where the second die, represented by the column, shows an even number: the three even columns of six squares each, 18 squares in total. So, again, if we don't know anything about the condition that the sum equals six, the probability would be 18 times 1/36, which is 18/36, right? That would be my initial probability of an even number on the second die.

But now I know that all those other squares have zero probability, and only the five elements on this line have non-zero probability. What does that mean? It means I have to do exactly the same thing as before: I add the probabilities of all the elementary events that comprise my event. But in one even column these are all zeros except one square, which is one-fifth; in another, all zeros except one square, which is another one-fifth; and the rest are zeros anyway. So the probability is only two-fifths, given that the condition of my random experiment is that the sum of the two numbers equals six, right?

So the unconditional probability, if I just predict the result of the two dice completely at random, is one-half. But as soon as I introduce the condition that the sum of the two numbers equals six, the probabilities are immediately reshuffled. Obviously, their sum must still equal one, so they are reshuffled in such a way that these five are one-fifth each and all others are zero.
And now the same event I am interested in has a different probability, because the probability of an event is composed of the probabilities of elementary events, and the elementary events have different probabilities after I acquire this knowledge about the process. So my prediction is that the event "the second die shows an even number" has probability two-fifths, which means the more times I run this experiment, the closer the observed frequency will be to two-fifths.

This is basically how conditional probability works. Now I would like to relate it to the initial probabilities of these particular events. So two-fifths is my conditional probability. But now let's notice this. Let's call the event I am interested in, "the second number is even," event A. The fact that the sum of the two numbers equals 6 is another event, the one that has occurred: call it event B. What I have just calculated is the probability of event A under the condition that B has occurred, that the numbers sum to 6, and it equals two-fifths.

But let's think about a different approach. In my original distribution of probabilities, where each square has 1/36, what is the probability of event A if I know nothing about event B? Well, as I said, it is 18/36, which is one-half. Now look at these two squares, this one and this one. These are the only significant elementary events. Why are they significant? Because on one hand they satisfy the condition that the sum equals 6, and on the other hand they belong to the event I am interested in. So there is the event that serves as a condition, and the event I am interested in. The event that is a condition is the combination of these elementary events.
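The claim that the observed frequency approaches two-fifths can be checked with a quick simulation of the two-person setup; this is only a sketch, and the sample size and seed are arbitrary choices of mine:

```python
import random

random.seed(0)  # fixed seed, for reproducibility

hits = total = 0
for _ in range(200_000):
    a, b = random.randint(1, 6), random.randint(1, 6)
    if a + b != 6:
        continue          # the first person ignores these rolls
    total += 1            # rolls delivered to the second person
    hits += (b % 2 == 0)  # second die shows an even number

print(hits / total)  # close to 0.4, i.e. two-fifths
```

Only about one roll in seven survives the filter, yet among the survivors the even-second-die frequency settles near 0.4 rather than the unconditional 0.5.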
The event I am interested in is the combination of these other elementary events. And what are the squares common to both? They are the intersection, speaking in the language of set theory. So, basically, this is A intersected with B. And what is the probability of this intersection? It consists of only two elementary events, and I am talking about the initial probability, so it is 2/36. Finally, consider event B itself, the event that has occurred as the condition of my experiment. What is its probability? Well, it is 5/36. So what do I notice? If I divide the probability of the intersection, 2/36, by the probability of the conditioning event, 5/36, I get exactly the same thing as before: two-fifths.

So, is it a law that the conditional probability of A under condition B is the same as the probability of their intersection divided by the probability of the conditioning event B? Well, actually, it is, and I am going to show it to you in a more general case. This example merely hints that this dependency should exist.

Now I will present a proof, though not exactly a universal one: it is a proof only in a relatively simple case. The simple case is when the elementary events have equal chances of occurring and my sample space contains n of them. So the elementary events are e1, e2, and so on up to en, and the probability of each event ei equals 1/n. You understand the notation: these are the elementary events that comprise my sample space, and each has probability 1/n. Now, let's talk about one event I am interested in. This is event A. It consists of a certain subset of these elementary events, right?
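This identity can be verified by exact counting over the original 36-square space; a small sketch, with names of my own choosing:

```python
from fractions import Fraction

outcomes = [(a, b) for a in range(1, 7) for b in range(1, 7)]

A = {o for o in outcomes if o[1] % 2 == 0}  # second die is even
B = {o for o in outcomes if sum(o) == 6}    # sum equals six

p = lambda event: Fraction(len(event), len(outcomes))  # uniform prior, 1/36 each

# Conditional probability via the formula P(A|B) = P(A ∩ B) / P(B)
p_a_given_b = p(A & B) / p(B)

print(p(A & B))     # 1/18, i.e. 2/36
print(p(B))         # 5/36
print(p_a_given_b)  # 2/5
```

The division of the two unconditional probabilities reproduces exactly the two-fifths we got earlier by redistributing the probabilities.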
So, let's consider that A consists of m elementary events, with indices m1, m2, and so on up to m with subscript capital M; lowercase m is the count, capital M just labels the last index. This is my subset of elementary events constituting the random event I am interested in. Going back to the earlier problem: n equals 36, and e1, e2, ..., en are all the different pairs of numbers from 1 to 6, that is, all 36 combinations of two numbers from 1 to 6, each with probability 1/36. The event I am interested in consists of the 18 elementary events in which the second number is even: (1,2), (2,2), (3,2), (4,2), (5,2), (6,2), (1,4), (2,4), (3,4), and so on, 18 of them. So m equals 18 in my case.

Now, what about the event that is the condition of my experiment? The condition of the experiment is the event that the sum of the two numbers on the dice equals 6. This is my conditioning event B, which we assume has occurred, and it contains, let's say, k elements, with indices k1, k2, and so on up to k with subscript capital K; again, lowercase k is the count, capital K labels the last index, just for convenience. So there are k of them. Now, finally, let's consider the intersection of the two. Which elementary events are included in both? Well, that's another subset: say its elements are l1, l2, and so on up to l with subscript capital L, l of them in total. That is the description of my random experiment.

Now, let's talk about conditional probability from the standpoint of redistribution of probabilities. Before, I had this uniform distribution of probabilities over every event; now I am saying that, you know what, event B always occurs. Back to my original example: the sum of the two numbers is always 6. What does that mean? Well, it means that my total probability of 1 is distributed not among all n initial elementary events, but only among these k, and the weight of each one of them is 1 over k.
So, that's my new distribution of probabilities: any elementary event that is part of B has probability 1 over k, and any elementary event that is not part of B has probability 0.

Now, let's just think: how can I calculate the probability of event A if I know that this is the real distribution of probabilities? Well, I have to sum up the probabilities of its elementary events. Each of them is either in the intersection, or not in B, in which case its probability is 0 and it can be completely ignored in the sum, right? So only the elementary events that belong to both A and B have probability 1 over k: only l out of the m events in A have probability 1 over k, and the rest have probability 0 because they are not part of B. Looking at the picture: this is A, this is B, and this is A intersected with B. I am adding up everything within A; the parts of A that are not in B have probability 0, so I ignore them, and all I have to sum are the events inside the shaded intersection. There are l of them, each with probability 1 over k, which means the probability of A under condition B is supposed to equal l over k, right?

But on the other hand, forget about the redistribution of the probabilities and use the old measure, the old probabilities, as if I just didn't know about the redistribution. What happens? Well, the probability of A intersection B is l over n: l elementary events, each with probability 1 over n. The probability of B is k over n: k elementary events. So dividing, l over n divided by k over n, the n cancels, and you get exactly l over k.
So, the formula is actually true in the relatively abstract case of a sample space containing any finite number of elementary events, all with equal chances of occurring before we introduce any kind of condition, right? No matter how we calculate the conditional probability, from the standpoint I would consider more philosophical, redistribution of the probabilities of elementary events, or using the plain formula with the old, unconditional probabilities, the same formula holds. So conditional probability is expressed through unconditional probabilities in this particular way.

Okay. Now, what's interesting is that I have derived this formula only in the relatively simple case of a finite sample space with an initially equal distribution of probabilities among elementary events. A really deep study of probability theory involves infinite sets of elementary events and, obviously, not necessarily an even distribution of probabilities. And by the way, if the number of elementary events is infinite, we cannot have an even distribution at all, because the probabilities must sum to one, right? We cannot have infinitely many elementary events with the same probability and still have their sum equal to one. So these are much more complicated cases, and we are not going to consider them in this course. However, what's interesting is that the concept of conditional probability does exist in those cases as well, and in those cases this formula is basically taken as the definition. So in my finite case with equal probabilities of elementary events, I more or less derived this formula, but generally speaking, this formula is simply taken as the definition of conditional probability.
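For any finite sample space with equally likely elementary events, the derivation above reduces to pure counting: the conditional probability is l over k, where l is the size of the intersection and k is the size of B, and n cancels out entirely. A minimal sketch of this as a function (the function name is mine):

```python
from fractions import Fraction

def conditional_probability(A, B):
    """P(A|B) over a finite, uniformly distributed sample space:
    |A intersect B| / |B|, which equals (l/n) / (k/n) for any n,
    since n cancels in the division."""
    A, B = set(A), set(B)
    return Fraction(len(A & B), len(B))

# The dice example: n = 36, k = 5, l = 2.
outcomes = [(a, b) for a in range(1, 7) for b in range(1, 7)]
A = [o for o in outcomes if o[1] % 2 == 0]  # second die even
B = [o for o in outcomes if sum(o) == 6]    # sum equals six
print(conditional_probability(A, B))        # 2/5
```

Notice the function never needs the total size n of the sample space, which is exactly the cancellation shown in the derivation.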
And it's really very easy to understand graphically. Here it is. Let this figure signify my sample space; the points inside it are elementary events. Now, this region is the event I'm interested in: all the elementary events whose total probability I would like to know. Consider the number of elementary events to be finite; I just don't draw all the dots, but imagine a certain finite number of them. This figure encompasses a certain number of dots, and each dot has probability 1 over n, where n is the number of dots. So this gives my unconditional probability.

Now, let's consider that I have a different event, call it B, and I know that this particular event always happens. What does that mean? It means my probability is no longer evenly distributed among the n dots of the whole space; it is concentrated only on the dots within the figure B. So what is the probability of event A in this case, and how can I calculate it? Well, I have to count only the dots in the intersection of the two areas, because every dot outside B has zero probability, and only the elementary events within B have non-zero probability. And I know the distribution is even among all the elementary events within B. So the probability of A under the condition that B always occurs is basically the ratio of the area of the intersection to the area of B, which represents the total probability of B.

This graphical explanation of conditional probability is probably the most obvious one. All you have to know is which part of A belongs to B, which is the intersection,
and what fraction of the entire B this intersection represents. The ratio between the area of the intersection and the area of B is the conditional probability of A under the condition that B always occurs. That is what this formula is all about, and that's what you always have to keep in mind. If you know this graphical picture, it will be very easy for you to understand what conditional probability actually is. The condition establishes a new field where the game actually takes place: everything outside it can be completely ignored, and only what happens inside is important. The part of the inside of B that actually belongs to A is their intersection.

By the way, what's obvious right now is that the conditional probability of A under condition B is not equal to the conditional probability of B under condition A, and this is very obvious graphically as well. If this is A and this is B, and this is A intersected with B, then the first probability equals the area of the intersection divided by the area of B, while the second equals the area of the intersection divided by the area of A, and obviously, generally speaking, they are different, right?

Basically, that's it. That's all I wanted to tell you about conditional probability. I exemplified it first and then generalized to all cases with a finite number of elementary events, each having equal probability. The whole purpose of this lecture was to derive this formula, the formula for conditional probability. And again, remember that my derivation covers only this particular case; generally speaking, this formula is the definition of conditional probability in the most general case possible. That's it. Thanks very much. If you can, try to read this lecture again in the notes on Unizor.com.
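To close, the point above that the conditional probability of A given B generally differs from that of B given A can be confirmed numerically in the same dice example; a short sketch with names of my own choosing:

```python
from fractions import Fraction

outcomes = [(a, b) for a in range(1, 7) for b in range(1, 7)]
A = {o for o in outcomes if o[1] % 2 == 0}  # second die even: 18 outcomes
B = {o for o in outcomes if sum(o) == 6}    # sum equals six: 5 outcomes

p_a_given_b = Fraction(len(A & B), len(B))  # intersection over B
p_b_given_a = Fraction(len(A & B), len(A))  # intersection over A

print(p_a_given_b, p_b_given_a)  # 2/5 1/9
```

The same two-outcome intersection divided by differently sized events gives two-fifths one way and one-ninth the other, just as the two area ratios in the picture differ.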
And as usual, I encourage you to register on the website. That would allow you to take the course in its entirety, including exams, where you can test how well you are absorbing this information. Thanks very much. Good luck.