Hello everyone, this is Alice Gao. In this video, I'm going to describe the motivations for using a Bayesian network. Then we'll look at an example of a Bayesian network for the home scenario and talk about its structure.

When I did the review of probability theory, I already introduced the home scenario. Mr. Holmes is worried about a burglary happening at his home, so he installed an alarm. He relies on his neighbors, Dr. Watson and Mrs. Gibbon, to call him and tell him when the alarm might be going off at his house. He also learned that the alarm might be triggered by an earthquake, but if an earthquake is happening, it's very likely that there will be a radio report of the earthquake.

We talked about how to model this story using random variables and probabilities. We defined six random variables: Earthquake, Radio, Burglary, Alarm, Watson, and Gibbon. To represent the joint distribution, there are 2^6 = 64 probabilities in the joint distribution, so we need a minimum of 63 probabilities to represent this distribution (the last one is determined, because all the probabilities must sum to 1).

Also, from the review in the last lecture, you should remember that knowing the joint distribution is something very powerful. We have enough information to do any kind of reasoning, any kind of probabilistic inference. We are able to calculate any prior or any conditional probability that we'd like. So why don't we do that? Why don't we just write down this giant joint distribution, use it to do inference, and call that the end of the story? Why do we want to use a Bayesian network instead?
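To make the "joint distribution is all you need" point concrete, here is a minimal sketch. The probability values are made up purely for illustration (the lecture does not give actual numbers); the point is only that any prior or conditional query reduces to summing entries of the 64-row joint table.

```python
from itertools import product

# The six Boolean variables from the home scenario.
VARS = ["Earthquake", "Radio", "Burglary", "Alarm", "Watson", "Gibbon"]

# Fill the 2^6 = 64-entry joint table with arbitrary positive weights,
# then normalize so the entries sum to 1 (a valid joint distribution).
# These numbers are invented for demonstration only.
weights = {assign: (i % 7) + 1
           for i, assign in enumerate(product([False, True], repeat=6))}
total = sum(weights.values())
joint = {assign: w / total for assign, w in weights.items()}

def prob(**conditions):
    """P(conditions) by marginalizing: sum all consistent joint entries."""
    idx = {v: i for i, v in enumerate(VARS)}
    return sum(p for assign, p in joint.items()
               if all(assign[idx[v]] == val for v, val in conditions.items()))

# Any conditional probability then follows from the product rule, e.g.
# P(Burglary = true | Watson = true) = P(B=true, W=true) / P(W=true)
p_cond = prob(Burglary=True, Watson=True) / prob(Watson=True)
print(round(p_cond, 4))
```

This works for any query, which is exactly the power of the joint distribution; the catch, as discussed next, is that the table has 2^n entries.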
There are two main reasons. The first reason is that the size of the joint distribution grows very quickly: if we have n variables, then the size of the joint distribution is 2^n. It's exponential in the number of variables, and this exponential growth means that it quickly becomes intractable to calculate any probability as the number of variables grows. So the computation is inefficient.

The second reason is that it's difficult to write down this joint distribution in the first place. It's unnatural for us to think about a joint probability. If I tell you this story and then ask you for the joint probability that an earthquake is happening, there's a radio report of the earthquake, a burglary is happening, the alarm is going off, and Dr. Watson and Mrs. Gibbon are both calling, then technically we should be able to derive a number for that. But for us humans, it's really difficult to think about what this number should be.

Because of both of these reasons, we want a more compact way of representing the joint distribution. We want to use fewer numbers while still having sufficient information to calculate probabilities and do inference. These are the most important reasons that lead us to a Bayesian network.

In short, a Bayesian network is a compact representation of the joint distribution. We are able to derive a compact representation because a Bayesian network takes advantage of the unconditional and conditional independence relationships among the random variables. We saw a little bit of this when I reviewed the definitions of unconditional and conditional independence.

Here's a possible Bayesian network for the home scenario. Let's first look at our savings in terms of the number of probabilities. Remember that to specify the full joint distribution, we need at least 2^6 − 1 = 63 probabilities. So what about this Bayesian network?
Well, given this Bayesian network, we need a total of 1 + 1 + 2 + 4 + 2 + 2 = 12 probabilities to specify the joint distribution: one each for the root nodes Burglary and Earthquake, two for Radio (one parent), four for Alarm (two parents), and two each for Watson and Gibbon (one parent each). Only 12 probabilities, so this is quite a bit of savings for us. This is a much more compact representation of the same thing. Now, I haven't yet convinced you that this can actually represent the same thing as the joint distribution, but we'll see that this is the case.

Another important point to realize is that there are many possible Bayesian networks that can represent the same scenario. This is only one example, and I believe it is the best network in terms of minimizing the total number of probabilities we need. But there are other possibilities: depending on how we order the nodes and how we draw the directed edges, we might come up with a slightly different network. A little bit later, I will spend some time talking about how we can construct a Bayesian network given a probability distribution, and how this construction process affects the structure of the network. In other words, how can we come up with a network that has the fewest links, so that we minimize the number of probabilities needed to define the network?

On the next slide, I list some components of the Bayesian network, but it's more convenient for me to explain all of those components on this slide. A Bayesian network is a directed acyclic graph: we have these directed edges, and the graph has to be acyclic, which means there can be no cycles in the graph. Each node in the Bayesian network represents a random variable, and the edges represent the relationships between these random variables.
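The count of 12 can be checked with a short sketch. The parent sets below are an assumption matching the counts given in the lecture (Burglary and Earthquake as roots, Earthquake pointing to Radio, both roots pointing to Alarm, and Alarm pointing to Watson and Gibbon):

```python
# Assumed structure of the home-scenario network, as node -> list of parents.
parents = {
    "Burglary": [],
    "Earthquake": [],
    "Radio": ["Earthquake"],
    "Alarm": ["Burglary", "Earthquake"],
    "Watson": ["Alarm"],
    "Gibbon": ["Alarm"],
}

# For a Boolean node with k parents, its conditional probability table needs
# one number per combination of parent values: 2^k probabilities.
def num_parameters(parents):
    return sum(2 ** len(ps) for ps in parents.values())

print(num_parameters(parents))  # 1 + 1 + 2 + 4 + 2 + 2 = 12
print(2 ** len(parents) - 1)    # versus 2^6 - 1 = 63 for the full joint
```

The savings come from each node conditioning only on its parents rather than on all other variables, which is where the independence assumptions enter.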
So in a way, you can think about the Bayesian network like a family tree. In this family tree, let's focus on the node A. Node A has two parents: B and E. In general, a node can have any number of parents, or no parent at all. Node A also has children; in this case, A has two children, W and G.

We can also talk about ancestors and descendants in the Bayesian network. The ancestors of the node W are everything upstream from W, so A, B, and E are all ancestors of W. Likewise, the descendants of B are A, W, and G. So it's like a family tree, except that the numbers of parents and children are not fixed; they can range from zero to any number.

In addition to the nodes representing random variables and the directed edges representing the relationships between them, we also have conditional probability distributions. For each node, for example node A, we specify a conditional probability distribution: the probability of that node given the values of its parents. This conditional probability distribution describes how the values of the parent nodes affect the probability of the current node. In this example, because A has two parents, we have to specify how every combination of values for the two parents affects the probability of A. If we look at node B, on the other hand, B has no parent, so we simply specify a prior (unconditional) probability of B, because B does not condition on any parent node.

That's everything for this video.
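The family-tree terminology above can be made concrete with a small graph-traversal sketch. It assumes the slide's abbreviated node names and the same edge structure described in the lecture (B and E are parents of A, E is the parent of R, and A is the parent of W and G):

```python
# Assumed network structure, stored as node -> list of parents.
parents = {"B": [], "E": [], "R": ["E"], "A": ["B", "E"], "W": ["A"], "G": ["A"]}

def ancestors(node):
    """Everything upstream of a node: its parents, their parents, and so on."""
    result = set()
    stack = list(parents[node])
    while stack:
        p = stack.pop()
        if p not in result:
            result.add(p)
            stack.extend(parents[p])
    return result

def descendants(node):
    """Everything downstream: the nodes that have `node` among their ancestors."""
    return {n for n in parents if node in ancestors(n)}

print(sorted(ancestors("W")))    # ['A', 'B', 'E'], as in the lecture
print(sorted(descendants("B")))  # ['A', 'G', 'W']
```

Storing the graph as a parents map also mirrors how the conditional probability tables are organized: each node's table is indexed by exactly the variables in `parents[node]`.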
Let me summarize. After watching this video, you should be able to do the following: give two reasons for using a Bayesian network instead of a joint distribution to represent a probabilistic model, and describe the major components of a Bayesian network: the nodes, the directed edges, and the conditional probability distributions. Thank you very much for watching. I will see you in the next video. Bye for now.