Hello everyone, this is Alice Gao. In this video, I'm going to talk about performing probabilistic inference in a Bayesian network. Specifically, I want to explain why it is more efficient to compute a probability using the variable elimination algorithm than using the joint distribution.

One reason for constructing a Bayesian network is to calculate probabilities, and there are many probabilities that we may want to calculate. For example, for the home scenario, we may want to calculate the prior probability that the alarm goes off, without knowing anything else. Or suppose that, one day, both Watson and Gibbon call Mr. Holmes. Given both calls, Mr. Holmes may want to calculate the probability that a burglary is happening. We already have one way of calculating any probability: we can recover the joint probability distribution from the Bayesian network, and then use the distribution to calculate any prior or conditional probability. In the next video, I will introduce the variable elimination algorithm, which we can use to calculate a probability more efficiently. My goal in this video is to show you why the variable elimination algorithm is more efficient than calculating a probability using the joint distribution.

Let's consider the Holmes Bayesian network again. Suppose that we want to answer this question: given that Dr. Watson and Mrs. Gibbon both call Mr. Holmes, what is the probability that a burglary is happening? This seems like a question that Mr. Holmes would want to answer. To answer it, we need to calculate the probability of burglary, given that Watson and Gibbon are both true. To write down the expression, I need to introduce some new notation. So far, we've used uppercase letters to represent both the random variable and its values: capital B means that B is true, and ¬B means that B is false. Let's use a new notation to distinguish the random variable from its values. In the new notation, capital B represents the random variable only.
Lowercase b and ¬b represent its values true and false, respectively. Sometimes we may also use lowercase b to represent an undetermined value of the random variable. Using this notation, we want to calculate P(B | w, g).

Let's divide the variables into three categories. First, the query variables are the variables we're interested in; they are the ones that appear before the bar in the probability. For our example, there is one query variable, B. Second, the evidence variables are the ones whose values we observe. For this example, W and G are the evidence variables. Third, the remaining variables are called the hidden variables, since they do not appear in the probability expression. For our example, the hidden variables are E, A, and R.

To calculate this probability, we need to rewrite it using the quantities from the Bayesian network. Whenever we want to calculate a conditional probability, it's always a good idea to convert it into an expression involving joint probabilities. Let's use the product rule in reverse and write P(b | w, g) as P(b, w, g) over P(w, g). Next, let's rewrite the denominator in a similar form to the numerator. We'll use the sum rule in reverse and introduce B into the expression: P(w, g) is equal to P(b, w, g) plus P(¬b, w, g). Given our derivation so far, calculating P(b | w, g) is equivalent to calculating the joint probability P(b, w, g). This joint probability doesn't contain all the variables, so we cannot calculate it directly from the Bayesian network yet. Let's introduce all the other variables into the expression. We'll use the sum rule in reverse and introduce the hidden variables E, A, and R. Here's the sum. Each term in the sum is a joint probability over all six variables. Now we're ready to use the Bayesian network to calculate the joint probability. The joint probability is a product of six terms.
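To make this concrete, here is a minimal Python sketch of the brute-force approach: recover each joint probability as a product of the six conditional probabilities, then sum out the hidden variables. Only the network structure comes from the video; the CPT numbers below are made up for illustration.

```python
# Hypothetical CPT numbers for the Holmes network (illustrative only).
P_B = {True: 0.01, False: 0.99}   # P(B): burglary
P_E = {True: 0.02, False: 0.98}   # P(E): earthquake
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}  # P(A=true | B, E)
P_W = {True: 0.90, False: 0.05}   # P(W=true | A): Watson calls
P_G = {True: 0.70, False: 0.01}   # P(G=true | A): Gibbon calls
P_R = {True: 0.80, False: 0.10}   # P(R=true | E): radio report

def joint(b, e, a, r, w, g):
    """Joint probability as the product of the six CPT entries."""
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    pr = P_R[e] if r else 1 - P_R[e]
    pw = P_W[a] if w else 1 - P_W[a]
    pg = P_G[a] if g else 1 - P_G[a]
    return P_B[b] * P_E[e] * pa * pr * pw * pg

def p_b_given_wg(b, w=True, g=True):
    """P(B=b | w, g): sum out the hidden variables E, A, R, then normalize."""
    def p_joint(bv):
        return sum(joint(bv, e, a, r, w, g)
                   for e in (True, False)
                   for a in (True, False)
                   for r in (True, False))
    return p_joint(b) / (p_joint(True) + p_joint(False))
```

The normalization in the last line is exactly the product rule in reverse: P(b | w, g) = P(b, w, g) / (P(b, w, g) + P(¬b, w, g)).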
Each term is a conditional probability for one variable in the network. This expression illustrates how we can calculate the probability using the joint distribution. First, we calculate all the joint probabilities using the Bayesian network. After that, we sum out the hidden variables to get the joint probability over B, W, and G only. Finally, we use the product rule in reverse to calculate the conditional probability.

Later on, I want to compare this approach to the variable elimination algorithm. To do this, I need to measure its efficiency. One way to measure efficiency is to count the number of additions and multiplications we need to perform to calculate this expression. Here's a question asking you to count the number of operations. The purpose of this question is not to get the number exactly right; that's why the options are ranges rather than individual numbers. Rather, the purpose is to get a rough sense of how many operations are required to calculate this expression. If the number of operations seems large, then the approach is likely not very efficient. Pause the video and calculate this number yourself. Then, keep watching.

The correct answer is D: we need to perform 47 operations. Let me talk through the process briefly. There are three nested summations over three binary variables, so we're summing eight terms. In each term, we have to perform five multiplications to combine the six factors. That's five multiplied by eight, which is 40 multiplications in total. After that, we need to add the eight terms using seven additions. So the total number of operations is 40 + 7 = 47.

Let me make some observations about this expression. In fact, we perform many duplicate computations when we calculate this expression. For example, P(b) appears in every term, but it doesn't involve any of the three hidden variables; you can think of P(b) as a constant. For each of the eight terms, we need to perform one multiplication for P(b). That's eight multiplications.
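As a quick sanity check on that tally, the counting argument can be spelled out in a few lines. This just reproduces the arithmetic from the video: three binary hidden variables and a product of six factors per term.

```python
# Operation count for the naive expression.
hidden = 3            # binary hidden variables: E, A, R
terms = 2 ** hidden   # 8 joint-probability terms in the triple sum
factors = 6           # one CPT factor per variable in the network
mults = terms * (factors - 1)  # 5 multiplications per 6-factor product = 40
adds = terms - 1               # 7 additions to sum the 8 terms
total = mults + adds
print(total)          # 40 + 7 = 47 operations
```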
If we take P(b) outside of all the summations, we only need to perform one multiplication for P(b). This simple change alone would have saved us seven operations. We can make a similar observation for E, A, and R. Take R as an example. Most terms do not contain R, yet these terms are inside the summation over R, so we have to calculate their product twice: once for when R is true and once for when R is false. If we move these terms outside of the summation over R, we only need to calculate their product once.

So far, we can calculate the probability using the joint distribution. Let's find a way to calculate the expression more efficiently. The key idea is that we only want to perform a summation over terms that involve the variable being summed over. If a term does not involve the variable we're summing over, we should move that term outside of the summation. For our expression, this idea is equivalent to moving each summation to the right as much as possible. These changes will reduce the number of operations required.

Let's try it out. The right-most summation is over R. The only term involving R is P(r | e). Let's move the summation all the way to the right until it covers only P(r | e). The next summation is over A. Let's gather all the terms involving A: P(a | b, e), P(w | a), and P(g | a). Put these terms together, as far to the right as possible, then move the summation to the right until it is over these three terms only. The final summation is over E. The remaining term involving E is P(e), so we move the summation to the right until it sits just to the left of P(e). This is equivalent to taking P(b) and moving it out of all the summations. There is one more simplification we can do. Take a look at the right-most term, the summation over r of P(r | e). What is the value of this term? Think about this for a few seconds, then keep watching.
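The rearranged expression above can be sketched in Python as follows. As before, the CPT numbers are made up for illustration; only the factorization itself, with each summation pushed to the right, comes from the video.

```python
# Hypothetical CPT numbers for the Holmes network (illustrative only).
P_B = 0.01   # P(B=true)
P_E = 0.02   # P(E=true)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}  # P(A=true | B, E)
P_W = {True: 0.90, False: 0.05}   # P(W=true | A)
P_G = {True: 0.70, False: 0.01}   # P(G=true | A)
P_R = {True: 0.80, False: 0.10}   # P(R=true | E)

def p_joint_bwg(b):
    """P(b, w=true, g=true), with each summation pushed to the right:
    P(b) * sum_e P(e) * [sum_a P(a|b,e) P(w|a) P(g|a)] * [sum_r P(r|e)]."""
    pb = P_B if b else 1 - P_B
    total = 0.0
    for e in (True, False):
        pe = P_E if e else 1 - P_E
        # summation over R, moved right so it covers only P(r | e)
        sum_r = sum(P_R[e] if r else 1 - P_R[e] for r in (True, False))
        # summation over A, covering only the three factors that mention A
        sum_a = sum(
            (P_A[(b, e)] if a else 1 - P_A[(b, e)]) * P_W[a] * P_G[a]
            for a in (True, False))
        total += pe * sum_a * sum_r
    return pb * total  # P(b) multiplied in once, outside all summations
```

Normalizing `p_joint_bwg(True)` by `p_joint_bwg(True) + p_joint_bwg(False)` then gives the conditional probability, exactly as in the brute-force approach but with far fewer repeated products.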
The right-most term is equal to 1, because a conditional probability distribution sums to 1 over all values of R, so we can remove it from the product. Take a look at our final expression. If we use the variable elimination algorithm, we end up calculating the probability using this expression.

Let me ask you the same question as before: what is the number of additions and multiplications required to calculate this expression? Again, the purpose is not to get the number exactly right; this number gives us a rough sense of the algorithm's efficiency. If the variable elimination algorithm is more efficient than the previous approach, we should get a smaller number. Pause the video and calculate this number yourself, then keep watching.

The correct answer is B: we need to perform 14 operations. Here's a brief explanation. For the inner sum over A, we need one addition plus four multiplications, so five operations in total. For the sum over E, each of the two terms requires one multiplication by P(e) plus the five operations for the inner sum, and joining the two terms takes one addition, so that's 2 × 6 + 1 = 13 operations in total. Finally, we need one more multiplication by P(b). The total is 14 operations. Again, the exact number is not the point here. Note that by moving the summations inward, to the right, we reduced the number of operations by a lot, from 47 to 14.

Let me summarize some important ideas. We want to calculate a probability. Our first approach is to calculate it using the joint probabilities. We begin by recovering the joint probabilities from the Bayesian network. Next, we sum out the hidden variables one by one. This approach works, but it is quite inefficient, since it performs a lot of duplicate computations. To make it more efficient, the idea is to perform each summation only over the terms that involve the variable being summed over. This is the high-level idea that leads to the variable elimination algorithm. We transform the expression using this idea. First, we chose an order of the hidden variables.
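The tally for the rearranged expression can be spelled out the same way; this just reproduces the counting argument from the video.

```python
# Operation count for the factored expression.
# Inner sum over A: two terms, each a product of three factors
# (2 multiplications), plus one addition to join them.
inner_sum_a = 2 * 2 + 1            # 5 operations
# Sum over E: each of the two terms multiplies P(e) by the inner sum
# (1 multiplication + the 5 inner operations), plus one addition.
sum_e = 2 * (1 + inner_sum_a) + 1  # 13 operations
# One final multiplication by P(b), outside all summations.
total = sum_e + 1
print(total)                       # 14 operations, down from 47
```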
For each hidden variable, we found all the terms involving that variable and multiplied them together. Then, we summed the hidden variable out of the product. This is what we are doing when we move the summation over the hidden variable to the right as much as possible. We repeat this for every other hidden variable until all of them are summed out. This leads to our final expression. Calculating this expression is more efficient, since it requires a much smaller number of operations.

That's everything for this video. Let me summarize. After watching this video, you should be able to do the following: calculate a probability by using the joint distribution; calculate a probability by using the high-level idea of the variable elimination algorithm; explain the high-level idea of the variable elimination algorithm; and explain why it leads to a more efficient approach to calculating the probability. Thank you very much for watching. I will see you in the next video. Bye for now.