Hello everyone, this is Alice Gao. In this video, I'm going to talk about performing inference using the joint probability distribution. The joint probability distribution is really nice to have, because if we have it, then we have all of the information we need to calculate any probability we're interested in, over any subset of the variables. Unfortunately, in practice we usually do not have the joint distribution, because it is very expensive to create: a joint distribution over n Boolean variables has 2^n entries, so its size grows exponentially with the number of variables. Sometimes we don't even know how to produce it. Still, it's useful to look at the joint distribution for a small example, so that I can discuss different ways of manipulating the probabilities. In this section, I'm going to talk about two rules that we can use to calculate probabilities: the sum rule and the product rule. If you've taken a statistics or probability course, this should be a review for you. Let's consider our home scenario and simplify it a bit, so that we only consider three random variables. The first one is whether the alarm is going off, the second one is whether Dr. Watson is calling, and the third one is whether Mrs. Gibbon is calling. We're ignoring all of the other variables. In this table, I am representing a joint distribution over these three variables. How do you interpret these numbers? For example, 0.054 here is the probability that the alarm is not going off, Mrs. Gibbon is calling, and Dr. Watson is not calling. You can interpret each of the eight numbers in a similar way: each one is the probability of one combination of values for the three variables. One good check to do whenever you're looking at a probability distribution is to make sure that all of the numbers sum up to 1. In this example, that should be the case if I constructed the example correctly. So given this joint distribution, let's think about two tasks that we can do. The first task is calculating the probability over a subset of the variables.
The second task is calculating a conditional probability, that is, the probability of one variable given that we know the value of another variable. Let's think about the first task first. We want to calculate the probability over a subset of the variables using the joint distribution, and the rule we can use is called the sum rule. The sum rule says that if we only care about a subset of the variables, then we can simply sum out every variable that we do not care about. What do I mean by summing out? Let me show you a couple of examples. Suppose that we have a joint distribution over three variables, a, b, and c, and we want to calculate the probability that a is true and b is true. Note that I'm using the shorthand notation where writing just a means a is true and writing just b means b is true; all of the variables we're considering are Boolean random variables. We can calculate this probability by summing out c. Here is the expression we can use: P(a, b) = P(a, b, c) + P(a, b, ¬c), where ¬c means c is false. That is, the probability of a and b is equal to the probability that a, b, and c are all true plus the probability that a and b are true and c is false. Notice that we are varying the value of c while fixing the values of a and b. We care about a and b both being true, so we fix those values, but we don't care about the value of c, so we vary it and add up the terms for all possible values of c. Let me show you another example. This time we only care about a: we want a to be true, and we don't care about b or c, so we want to sum out both b and c. How do we do this? We have two variables that we don't care about, and two Boolean variables have four possible combinations of values, so there are four terms to consider. In these four terms, we always fix a to be true, but we vary the values of b and c over all four possibilities.
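The summing-out procedure can be sketched in Python. This is a minimal illustration under stated assumptions, not code from the video: the table uses made-up numbers for three Boolean variables a, b, and c (it is not the alarm table), and the names `VARS`, `JOINT`, and `marginal` are hypothetical. The block also runs the sums-to-1 sanity check described at the start of the video.

```python
import math
from itertools import product

# Variable order used for every key of the table below.
VARS = ("a", "b", "c")

# Hypothetical joint distribution over Boolean variables a, b, c.
# The numbers are made up for illustration; this is NOT the alarm table.
# Key: a world (value of a, value of b, value of c); value: its probability.
JOINT = {
    (True, True, True): 0.10,
    (True, True, False): 0.05,
    (True, False, True): 0.15,
    (True, False, False): 0.10,
    (False, True, True): 0.20,
    (False, True, False): 0.05,
    (False, False, True): 0.25,
    (False, False, False): 0.10,
}

# Sanity check: all eight entries of a joint distribution must sum to 1.
assert math.isclose(sum(JOINT.values()), 1.0)


def marginal(joint, fixed):
    """Sum rule: fix the variables in `fixed` to the given values and add up
    the joint probability of every combination of values for the others."""
    free = [v for v in VARS if v not in fixed]
    total = 0.0
    # For free = ["b", "c"] this yields (T, T), (T, F), (F, T), (F, F).
    for values in product([True, False], repeat=len(free)):
        assignment = {**fixed, **dict(zip(free, values))}
        total += joint[tuple(assignment[v] for v in VARS)]
    return total


# P(a, b): sums out c, so two terms are added.
p_ab = marginal(JOINT, {"a": True, "b": True})
# P(a): sums out b and c, so four terms are added.
p_a = marginal(JOINT, {"a": True})
print(p_ab, p_a)  # approximately 0.15 and 0.40
```

`marginal` follows the description above directly: it fixes the variables we care about and enumerates every combination of values for the rest, so P(a, b) adds two terms and P(a) adds four. Results are compared with `math.isclose` because floating-point sums of decimal probabilities are not always exact.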
The four combinations are: b and c both true; b true and c false; b false and c true; and b and c both false. So, writing ¬ to mark a variable that is false, P(a) = P(a, b, c) + P(a, b, ¬c) + P(a, ¬b, c) + P(a, ¬b, ¬c). To summarize, how do we calculate the probability over a subset of the variables using the sum rule? We add up a number of probabilities from the joint distribution. In each of these probabilities, the variables that we care about are fixed to the values we're interested in, while the variables that we do not care about are varied, so that together the terms cover every possible combination of values for the variables we don't care about. For the next three slides, I have three quick questions to help you practice the sum rule. In this video, I'm only going to show you the correct answers, and in a separate video, I will show you the process I used to get them. The first question is: what is the probability that the alarm is not going off and Dr. Watson is calling? We want to calculate P(¬A, W), and the correct answer is 0.36, which is option A. In question number two, we want to calculate the probability that the alarm is going off and Mrs. Gibbon is not calling, that is, the probability that A is true and G is false. The correct answer here is 0.06, which is option B. In question number three, we want to calculate the probability that the alarm is not going off. This is the prior, or unconditional, probability that A is false, and the correct answer here is 0.9, which is option E. Notice that, at least in this model, the alarm is unlikely to go off: the probability that it is going off is only 1 - 0.9 = 0.1, so the alarm going off is a rare event. These are all the practice questions for the sum rule. Let's now move on to the product rule. One way we can use the product rule is to calculate a conditional probability.
When I talk about a conditional probability, I mean the probability of one variable, say A, conditioned on knowing the value of another variable, say B. To calculate this, we can use the product rule. Suppose that we again have a joint distribution over A, B, and C, and we want to calculate the probability of A given B. We can do this using a form of the product rule in which the probability of A given B is a fraction: P(A | B) = P(A, B) / P(B). The numerator of the fraction is the joint probability of A and B, and the denominator is just the probability of B. As a reminder, in this case we only care about A and B both being true, so the formula as I've written it applies only to that case. You can derive similar formulas for when one of them is false or both of them are false. You might look at this formula and think, this doesn't look like a product, so why is it called the product rule? That's because you can rewrite it in a form that looks more like a product. If we multiply both sides by the denominator, we get P(A, B) = P(A | B) P(B), so the joint probability of A and B is the product of P(A given B) and P(B). This looks much more like a product; we rearranged it into the fraction form only because we want the conditional probability. Also notice that to calculate the conditional probability, we need the joint probability of A and B, and we also need the prior, or unconditional, probability of B. Neither of these is given directly by the joint distribution over A, B, and C, but that's okay. We just learned the sum rule, and using the sum rule, we can calculate both the numerator and the denominator. So calculating a conditional probability is a bit more work than before: we first apply the sum rule to get the components we need, and then we calculate the fraction. So how can we interpret this fraction?
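Before answering that, the whole computation can be sketched in Python. This is a minimal illustration under stated assumptions, not code from the video: the table holds made-up numbers for Boolean variables a, b, and c (it is not the alarm table), and `VARS`, `JOINT`, `marginal`, and `conditional` are hypothetical names. Both the numerator and the denominator come from the sum rule, and the final assertion checks the product form P(a, b) = P(a | b) P(b).

```python
import math
from itertools import product

# Variable order used for every key of the table below.
VARS = ("a", "b", "c")

# Hypothetical joint distribution (made-up numbers, NOT the alarm table).
JOINT = {
    (True, True, True): 0.10,
    (True, True, False): 0.05,
    (True, False, True): 0.15,
    (True, False, False): 0.10,
    (False, True, True): 0.20,
    (False, True, False): 0.05,
    (False, False, True): 0.25,
    (False, False, False): 0.10,
}


def marginal(joint, fixed):
    """Sum rule: fix the variables in `fixed`, sum over all values of the rest."""
    free = [v for v in VARS if v not in fixed]
    total = 0.0
    for values in product([True, False], repeat=len(free)):
        assignment = {**fixed, **dict(zip(free, values))}
        total += joint[tuple(assignment[v] for v in VARS)]
    return total


def conditional(joint, query, evidence):
    """Product rule: P(query | evidence) = P(query, evidence) / P(evidence).
    Assumes `query` and `evidence` mention different variables."""
    if set(query) & set(evidence):
        raise ValueError("query and evidence must use different variables")
    numerator = marginal(joint, {**query, **evidence})  # joint probability, via the sum rule
    denominator = marginal(joint, evidence)  # prior of the evidence, via the sum rule
    if denominator == 0:
        raise ValueError("cannot condition on an event with probability 0")
    return numerator / denominator


p_a_given_b = conditional(JOINT, {"a": True}, {"b": True})
p_b = marginal(JOINT, {"b": True})
p_ab = marginal(JOINT, {"a": True, "b": True})

# Product form of the same rule: P(a, b) = P(a | b) * P(b).
assert math.isclose(p_a_given_b * p_b, p_ab)
```

Because the values in `query` and `evidence` may be False as well as True, the same function also covers the cases where A or B is false, for example `conditional(JOINT, {"a": False}, {"b": True})`.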
Well, the denominator means that to calculate this conditional probability, we only care about the worlds in which B is true; we don't care about the worlds in which B is false. So dividing by the denominator means we rule out all of the possible worlds in which B is false and focus only on the worlds in which B is true. Then the numerator: within all of the worlds in which B is true, we want the worlds in which A is also true. So the worlds we actually care about are those in which A and B are both true, which is why the numerator is the probability that A and B are both true. For the next two slides, I again have two practice questions for you to practice using the product rule. I will go through the process in a separate video; in this video, I will only tell you the solutions. Here is the first question. We want to calculate the probability that Dr. Watson is calling given that the alarm is not going off. I've given you the joint distribution again, and I've also given you some probabilities we calculated before, in case they're useful. We want to calculate P(W | ¬A), and the correct answer is 0.4, so B is the correct answer. This agrees with the earlier sum-rule answers, since P(¬A, W) / P(¬A) = 0.36 / 0.9 = 0.4. Here's the second question. We want to calculate the probability that Mrs. Gibbon is not calling given that the alarm is going off, that is, P(¬G | A). The correct answer here is 0.6, which is option C; again this matches the earlier answers, since P(A, ¬G) / P(A) = 0.06 / (1 - 0.9) = 0.6. That's everything for this video. Let me summarize. In this video, I talked about two ways of performing inference using the joint probability distribution. After watching this video, you should be able to do the following: calculate the probability over a subset of the variables using the sum rule, given a joint distribution; and calculate a conditional probability using the product rule, given a joint distribution. Thank you for watching. I will see you in the next video. Bye for now.