Hello everyone, this is Alice Gao. In this video, I am going to talk about how we can construct a Bayesian network. In particular, I am going to focus on how we create the links, or the directed edges, in the Bayesian network.

At the start of this unit, I introduced the home story, and based on that story I showed you the Bayesian network you can see right now, which is supposed to represent the home scenario. I also mentioned that this network is not the only one we can use to model the scenario. There are in fact many possible Bayesian networks we can come up with that are all valid models of the scenario. This is true in general: given a particular joint probability distribution, we can come up with multiple correct Bayesian networks.

So what do I mean by correct here? A Bayesian network is correct if every independence relationship that the network represents actually exists in the joint probability distribution. An independence relationship here can be an unconditional independence or a conditional independence. In other words, if two variables satisfy some independence relationship based on the Bayesian network, then they had better also satisfy the same relationship in the joint probability distribution. As long as this property holds, we say that the Bayesian network is correct.

Now, since there are multiple correct Bayesian networks for a particular scenario, we have to choose among them, right? We have to compare them and decide which one is better and which one we should choose. Well, it turns out that we often prefer a Bayesian network that requires fewer probabilities, and in general, the number of probabilities required is correlated with the number of directed edges: the fewer edges, the better. The reason for preferring fewer probabilities is that it results in a smaller Bayesian network, which is more compact and easier to work with.

Let me describe a procedure we can follow to construct a correct Bayesian network given a probability distribution. After talking about this procedure, I'll discuss three examples to show you different ways to apply it. To start, given a particular story or domain, we need to come up with a set of variables to model the important things in this domain. This really depends on how we interpret the story and what we decide are the important things to model as random variables. Once we choose the variables, we need to order them in some way. This ordering is important because we're going to take the variables and add them to the network one by one, and based on this ordering, every time we add a variable, we will decide how to create edges from the existing variables to connect it. So for every variable in the ordering we picked, we're going to do a couple of things. First, choose the node's parents. Second, create a link from each parent to the current node we're adding. And finally, write down the conditional probability table, that is, the conditional probability of the current node given its parents. I won't really talk about the third point here; it will come up when we talk about how to learn the probabilities in a Bayesian network from data. I'll focus on the first point: how can we choose a node's parents when we're adding that node to our Bayesian network?
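To make the procedure concrete, here is a minimal Python sketch of the construction loop, assuming hypothetical helper functions `smallest_parent_set` and `estimate_cpt` that stand in for the "choose the parents" and "write down the conditional probability table" steps; they are not from any particular library, just placeholders for this illustration.

```python
# A sketch of the construction loop described above. The helpers
# `smallest_parent_set` and `estimate_cpt` are hypothetical stand-ins for
# "choose the node's parents" and "write down the conditional probability table".

def build_bayes_net(ordering, smallest_parent_set, estimate_cpt):
    """Add the variables to the network one by one, in the given ordering."""
    network = {}   # node -> (parents, conditional probability table)
    existing = []  # nodes already added to the network
    for node in ordering:
        # 1. Choose the node's parents from among the existing nodes.
        parents = smallest_parent_set(node, existing)
        # 2. Create a link from each parent to the current node
        #    (here the links are implicit in the parents list).
        # 3. Write down P(node | parents).
        cpt = estimate_cpt(node, parents)
        network[node] = (parents, cpt)
        existing.append(node)
    return network
```

The interesting work is hidden inside the parent-selection step, which is what the rest of this video is about.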
When we're adding a particular node, we need to decide on the parents of that node, and we are going to choose the parents from among the existing nodes, that is, the nodes that are already inside the Bayesian network. The parents of the node being added will be a subset of the existing nodes. So which nodes should be its parents? This is one of the rare opportunities where we get to choose our parents, so we should do so carefully.

Here is the rule for choosing the set of parents. Among the existing nodes, we are going to choose the smallest subset of nodes that satisfies the following condition: given this set of parents, the current node is independent of all the other existing nodes. In other words, the parents of the current node must be the smallest set of nodes such that the current node is conditionally independent of the rest of the existing nodes given these parents. Formally, if Xi is the current node we're trying to add, then conditioning on the parents of Xi is the same as conditioning on all the existing nodes: P(Xi | Parents(Xi)) = P(Xi | X1, ..., X(i-1)). Notice the extreme cases: the set of parents could be empty, which means this node does not depend on any existing node, or the set of parents could be all of the existing nodes, which means the new node depends on every node we have added to the network so far.

Let's look at some examples. I have three examples, and they incidentally correspond to the three key structures I discussed earlier about Bayesian networks. We're going to take these structures and change the order in which we add nodes to the network, and in some cases this will cause us to come up with a network that looks kind of different. In this video, I'm only going to tell you the answer for each one, and in a separate video, I will discuss the reasoning behind each answer and how I came up with it.

For the first example, consider the Bayesian network where burglary, alarm, and Watson form a chain. Now we will try to construct another correct Bayesian network based on a different variable ordering, where W comes first, then A, and finally B. Give it a try yourself and then keep watching for the answer. Here's the correct answer. If we change the ordering of the variables, we come up with a network that looks kind of similar: the resulting network is still a chain. The main reason behind this is that W and B are independent given A, so even when we reverse the order of the variables, we still get a chain.

Let's look at the second example. Here we have alarm, Watson, and Gibbon, and recall that you can think of Watson and Gibbon as noisy sensors of the alarm. Given these three variables, let's try to construct a correct Bayesian network based on the variable ordering where W comes first, then G, and finally A. Think about this yourself and then keep watching for the answer. Here's the correct answer. One important difference between the two networks is the additional link from W to G. The reason is that we are adding W and G much earlier, in particular before we add A. When we're adding G, G is not independent of W based on the original Bayesian network, so we need this link from W to G.

Let's look at our third and final example. This example is about earthquake, burglary, and alarm.
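To illustrate the parent-selection rule on the first example, here is a rough, self-contained Python sketch that brute-forces the smallest parent set for each node under the ordering W, A, B. The joint distribution is built from made-up numbers that factor according to the chain B -> A -> W; the numbers and helper names are invented for this illustration only.

```python
from itertools import product, combinations

# Brute-force illustration of the parent-selection rule on the first example:
# a chain B -> A -> W, added in the new ordering W, then A, then B.
# The probability numbers are made up; only the chain structure matters.
VARS = ("B", "A", "W")
pB   = {0: 0.9, 1: 0.1}                                          # P(B=b)
pA_B = {(0, 0): 0.95, (1, 0): 0.05, (0, 1): 0.1, (1, 1): 0.9}    # (a, b) -> P(A=a | B=b)
pW_A = {(0, 0): 0.9,  (1, 0): 0.1,  (0, 1): 0.2, (1, 1): 0.8}    # (w, a) -> P(W=w | A=a)

# The full joint distribution P(B=b, A=a, W=w), keyed by the tuple (b, a, w).
joint = {(b, a, w): pB[b] * pA_B[(a, b)] * pW_A[(w, a)]
         for b, a, w in product((0, 1), repeat=3)}

def prob(assignment):
    """Probability of a partial assignment, e.g. {'A': 1, 'W': 0}."""
    return sum(p for key, p in joint.items()
               if all(key[VARS.index(v)] == val for v, val in assignment.items()))

def conditional(node, value, given):
    """P(node = value | given), where `given` maps variables to values."""
    return prob({node: value, **given}) / prob(given)

def agrees(node, subset, existing):
    """True if P(node | subset) equals P(node | all existing nodes) everywhere."""
    for values in product((0, 1), repeat=len(existing)):
        full = dict(zip(existing, values))
        part = {v: full[v] for v in subset}
        for x in (0, 1):
            if abs(conditional(node, x, full) - conditional(node, x, part)) > 1e-9:
                return False
    return True

def smallest_parent_set(node, existing):
    """Smallest subset of the existing nodes that makes `node` independent of the rest."""
    for size in range(len(existing) + 1):
        for subset in combinations(existing, size):
            if agrees(node, subset, existing):
                return set(subset)

# Add the variables in the order W, A, B and report the parents chosen for each.
added = []
for node in ("W", "A", "B"):
    print(node, "parents:", smallest_parent_set(node, added))
    added.append(node)
```

With these numbers, W ends up with no parents, A with {W}, and B with {A}, so the links are W to A and A to B: still a chain, matching the answer above.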
And earthquake and burglary jointly cause the alarm to go off. So let's try to construct a correct Bayesian network where the variables are ordered as A first, then B, and finally E. Think about this yourself and then keep watching for the answer. Here's the correct answer. Similar to the second example, notice that we have one additional link, from B to E. The reason is that when we're adding E as the last node to the network, we cannot choose A to be the only parent, because given A, E and B are not independent (a small numeric check of this is sketched at the end of this section). We also cannot choose B to be the only parent, since E and A are not independent given B. Therefore, we have to choose both A and B to be the parents of E, and that's why we have this additional link.

Now there are two important points I want to discuss here. The first one is that when you look at this correct answer, you might be wondering why the links look so unintuitive. For example, why is there a link from burglary to earthquake? Burglary is certainly not a cause of earthquake, right? So that link doesn't look like it represents a causal relationship. If you're thinking this, then you'd be right: not every link in a Bayesian network represents a causal relationship. For the original Bayesian network that I showed you, which models the home scenario, it just so happened that I constructed the network in a way such that I always added the causes before I added the corresponding effects, so all of the links represent causal relationships. But this is generally not the case. In general, any of these links could point in the reverse direction of the causal relationship: every link represents some correlation, but not necessarily causality, not necessarily a causal relationship. This is the first important point.

The second important point, or rather observation, is that in examples two and three, you should have noticed that by changing the order of the variables, we ended up with more links in the network, right? We went from having only two edges to having three edges, which results in a more complicated Bayesian network. So you might be wondering: earlier we talked about preferring networks with fewer links, so that we need to specify fewer probabilities. That's cool, but how can we come up with a network that has fewer links? Should we just do this by trial and error, or is there some general rule we can follow to come up with a more compact network? It turns out there is a somewhat hand-wavy rule, which is that you should always try to pick a variable ordering that respects the causal relationships. So if there is a causal relationship between the nodes, then always try to add the causes to the network before you add the corresponding effects. By choosing a variable ordering in this way, we tend to get a smaller and more compact Bayesian network.

That's everything for this video. Let me summarize. After watching this video, you should be able to do the following: describe what it means for a Bayesian network to correctly represent a joint probability distribution; construct a Bayesian network based on a probability distribution and an ordering of the variables; and explain a heuristic that we can use to construct a smaller, more compact Bayesian network. Thank you very much for watching. I will see you in the next video. Bye for now.
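As a supplement to the third example, here is a small, self-contained Python check of the explaining-away argument, using made-up numbers for a common-effect structure where earthquake and burglary both cause the alarm. It compares P(E=1 | A=1) with P(E=1 | A=1, B=1); because these differ, E is not independent of B given A, so under the ordering A, B, E the node E needs both A and B as parents. The probability values are invented for this sketch.

```python
from itertools import product

# A numeric check of the "explaining away" argument in the third example.
# Earthquake (E) and Burglary (B) are independent causes of Alarm (A); the
# numbers below are made up, only the common-effect structure matters.
pE = {0: 0.99, 1: 0.01}
pB = {0: 0.95, 1: 0.05}
pA_EB = {(1, 1): 0.98, (1, 0): 0.9, (0, 1): 0.3, (0, 0): 0.001}  # (e, b) -> P(A=1 | E=e, B=b)

# Full joint distribution P(E=e, B=b, A=a), keyed by the tuple (e, b, a).
joint = {(e, b, a): pE[e] * pB[b] * (pA_EB[(e, b)] if a == 1 else 1 - pA_EB[(e, b)])
         for e, b, a in product((0, 1), repeat=3)}

def prob(e=None, b=None, a=None):
    """Marginal probability of a partial assignment over (E, B, A)."""
    return sum(p for (ke, kb, ka), p in joint.items()
               if (e is None or ke == e) and (b is None or kb == b) and (a is None or ka == a))

# Given that the alarm went off, learning that a burglary happened "explains away"
# the earthquake: P(E=1 | A=1, B=1) drops well below P(E=1 | A=1).
print(prob(e=1, a=1) / prob(a=1))            # P(E=1 | A=1)
print(prob(e=1, b=1, a=1) / prob(b=1, a=1))  # P(E=1 | A=1, B=1)
# Because these two values differ, E and B are not independent given A, so with
# the ordering A, B, E the node E must take both A and B as parents.
```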