Hello everyone, this is Alice Gao. In this video, we are finally going to finish the mail delivery robot example. We are going to look at the robot decision network we came up with and decide on the best action the robot should take to maximize its expected utility. As a reminder, here is what the robot decision network looks like. We have four nodes. Two of them are decision nodes, representing our actions. Then there's Accident, a random variable that depends on whether we take the short route or the long route. All three variables collectively influence the robot's utility. Let's first look at a general procedure for choosing an action given a decision network. This is an extension of the inference algorithm we used with Bayesian networks, so many of the ideas will be similar to the variable elimination algorithm. First, if there are any evidence variables, that is, random variables whose values we observe, we set those variables to their observed values. Next, we think about all the possible decisions or actions we can take. For each possible combination of values of the decision nodes, we do the following. We set the decision nodes to those values, similar to how we set the values of the evidence variables above. Then we look at the utility node. There is typically only one utility node, because the network models the preferences of a single agent. We look at the parent nodes of the utility node, the nodes that influence it, and calculate the posterior probabilities of those parents. Going back to our decision network: we don't have any evidence variables, and we have two decision variables. So we set both decision variables to some combination of their possible values. The utility node has three parents.
For each parent of the utility node, we calculate its posterior probability. For Short and Pads there are no posterior probabilities to compute, since they are decision variables, but Accident has a posterior probability that depends on the value of Short. The next step is to calculate the expected utility of the action. Again, the utility may be influenced by many nodes, some of them random variables and some of them decision variables. In our network, we have a posterior probability for Accident, together with our decisions for Short and for Pads. Together these determine our expected utility in that particular state of the world. Notice that it is an expected utility because a random variable introduces uncertainty. Short and Pads do not introduce any uncertainty: they are decisions we make, and once we make them, they are certain. But Accident introduces uncertainty, which is why we can only calculate an expected utility rather than an actual utility. Coming back to the procedure, think of the middle block as follows: we have multiple decision nodes, and for each possible combination of values of these decision nodes, we calculate the expected utility of taking that combination of actions. Once we have done that, all that remains is to compare the actions by their expected utilities and pick the one with the highest expected utility. To summarize the entire procedure in one sentence: for each set of possible decisions we can make, calculate the expected utility of those decisions, and then choose the set of decisions that maximizes the expected utility. That's how we evaluate a decision network to make a decision. Now let's do this for the robot example.
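As a rough sketch, the procedure described above might look like the following in Python. Everything here is illustrative: the CPT value for the accident probability and the utility table are assumed placeholder numbers, not the exact values from the lecture slides.

```python
from itertools import product

# Hypothetical decision network: two binary decisions (wear_pads,
# short_route), one random variable (accident) whose distribution
# depends only on the route, and a utility over all three.
P_ACCIDENT = {True: 0.2, False: 0.0}   # assumed P(accident | short_route)

def utility(wear_pads, short_route, accident):
    # Placeholder utility table (assumed values, for illustration only).
    table = {
        (True,  True,  True):  2, (True,  True,  False): 8,
        (True,  False, True):  4, (True,  False, False): 4,
        (False, True,  True):  0, (False, True,  False): 10,
        (False, False, True):  3, (False, False, False): 6,
    }
    return table[(wear_pads, short_route, accident)]

def expected_utility(wear_pads, short_route):
    # Sum over the possible worlds (accident / no accident), weighting
    # each world's utility by its posterior probability.
    p_acc = P_ACCIDENT[short_route]
    return (p_acc * utility(wear_pads, short_route, True)
            + (1 - p_acc) * utility(wear_pads, short_route, False))

def best_decision():
    # Enumerate every combination of decision values and keep the one
    # with the highest expected utility.
    return max(product([True, False], repeat=2),
               key=lambda d: expected_utility(*d))
```

With these assumed numbers, enumerating all four decision combinations and comparing their expected utilities is exactly the one-sentence summary above.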
Coming back to our robot example, we have two binary decisions to make, so there are four possible combinations. I'm going to show you how to calculate the expected utility for one combination, and I've left empty slides with space for you to calculate the expected utilities of the other three. We do have to calculate the expected utility for all four combinations before we can decide which one is best. Let's look at the first of the four calculations. For the first one, we are calculating the robot's expected utility of not wearing pads and choosing the long route. EU stands for expected utility. The expected utility is always a summation over all of the possible worlds that could occur given the situation. Here, the situation is that we don't wear pads and we choose the long route. Given this situation, there are two possible worlds: one in which an accident does not happen and one in which an accident does happen. In the table, these are labeled W0 and W1. So first I wrote down: given the situation, what is the probability that W0 occurs, and what is our utility in that world? Plus the probability that W1 occurs, given not wearing pads and choosing the long route, multiplied by the utility the robot has in W1. In general, the expected utility is a summation over all the possible worlds; for each possible world, the term is the probability that the world occurs multiplied by our utility in that world. In the next line, I simply wrote out the details of each world: W0 refers to no pads, long route, and no accident, while W1 refers to no pads, long route, and an accident. Nothing else changed.
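Using only the numbers the lecture gives for this case, namely that accidents never happen on the long route and that the utility in the no-pads, long-route, no-accident world is six, the two-world summation can be checked directly. The 0 used for the impossible world W1 is an arbitrary stand-in for the dash on the slide; any value works because it is multiplied by probability zero.

```python
# EU(no pads, long route) as a sum over the two possible worlds.
p_w0 = 1.0   # P(no accident | long route), from the lecture
p_w1 = 0.0   # P(accident | long route): accidents never happen on the long route
u_w0 = 6     # utility in W0 (no pads, long route, no accident), from the lecture
u_w1 = 0     # W1 is impossible; stand-in value, multiplied by zero anyway

eu_no_pads_long = p_w0 * u_w0 + p_w1 * u_w1
```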
Next, we simplify the two very long conditional probabilities. One thing to notice is that each conditional probability contains some redundant terms. If we are already conditioning on not wearing pads, then we don't need to compute the probability of not wearing pads: if it's given that we're not wearing pads, then we're not wearing pads. Similarly, if we choose the long route, then we choose the long route. If these two terms are already given on the right of the conditioning bar, it's redundant to also have them on the left. That's why I simplify the probabilities below by removing the parts for not P and not S: their values are already given. After simplifying, we need the probability that an accident does not happen in this situation and the probability that an accident does happen. So the question is: how do we calculate the probability of an accident? Here we take advantage of an independence assumption. Remember, when we constructed the decision network, we decided that wearing pads or not would not influence the accident in any way; only Short influences Accident. Therefore, P and A are independent, and they are also independent given Short. They're independent in every possible way. Because of that, we can simply remove the not P part from the expression, and we end up with the probability of not A given not S and the probability of A given not S. P is irrelevant: it doesn't influence how likely the accident is to happen.
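To see the simplification P(A | P, S) = P(A | S) concretely, we can build a tiny joint distribution in which the accident depends only on the route, then verify numerically that conditioning on the pads decision changes nothing. The numbers here (q = 0.3 and the uniform distributions over the decisions) are illustrative assumptions, not values from the lecture.

```python
# Sanity check of P(A | P, S) = P(A | S) when A depends only on S.
q = 0.3                          # hypothetical P(accident | short route)
p_acc = {True: q, False: 0.0}    # P(A = true | S)
p_pads = 0.5                     # arbitrary distribution over the pads decision
p_short = 0.5                    # arbitrary distribution over the route decision

# Joint over (P, S, A), factored so that A depends only on S.
joint = {}
for p in (True, False):
    for s in (True, False):
        for a in (True, False):
            pa = p_acc[s] if a else 1 - p_acc[s]
            pp = p_pads if p else 1 - p_pads
            ps = p_short if s else 1 - p_short
            joint[(p, s, a)] = pp * ps * pa

def cond_a(a, s, p=None):
    # P(A = a | S = s), or P(A = a | P = p, S = s) when p is given,
    # computed by marginalizing the joint.
    keys = [k for k in joint if k[1] == s and (p is None or k[0] == p)]
    num = sum(joint[k] for k in keys if k[2] == a)
    den = sum(joint[k] for k in keys)
    return num / den
```

Conditioning on P = true or P = false gives the same answer as not conditioning on P at all, which is exactly why the not P term could be dropped.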
Finally, we have simplified the probabilities to a form we can read directly off the decision network. Based on the network, if we choose the long route, the probability that an accident does not happen is one; an accident never happens on the long route. Similarly, if we choose the long route, the probability that an accident happens is zero. We multiply each probability by the utility in that world. For W0, our utility is six. W1 is an impossible world, so I put a dash there to denote that. Luckily, this doesn't cause us any problem: since that world is impossible, the probability of observing it is zero, so we don't need to worry about not having a utility number for it. With the final summation, our expected utility for not wearing pads and choosing the long route is six. The next three slides are blank, and I've left space for you to do the calculations for the other three cases. You need to do all of them in order to answer the final question of which combination of decisions is best for the robot. Take some time to do the calculations; they are not super difficult, and they are quite similar to the one I've done here. I'm not going to discuss the solutions to the other three cases in this video; you can check them in the annotated slides. One thing to mention is that some of the expressions will involve Q, because Q is the probability that an accident happens on the short route, so the expected utility may be a function of Q, and that's okay. I've copied the four expected utilities here for your reference. Now we're ready to answer the final question, which is the goal of this example in the first place: what should the robot do? Should it wear pads or not? Should it choose the short route or the long route?
At first glance, it's not very clear, because some of the expressions involve Q, and Q can be any number from zero to one; it's a probability. But some comparisons are clear. For example, the option of wearing pads and choosing the long route is strictly dominated by the option of not wearing pads and choosing the long route: on the long route there is no possibility of an accident, so we might as well not wear pads. Why would we wear pads to decrease the severity of the damage when no damage could possibly occur? So the third option is dominated. Beyond that, it's not clear which of the other actions is best. Fortunately, we have some mathematical tools. Since we don't know Q, let's draw a graph. Let's graph the expected utilities with respect to Q, where Q ranges from zero to one, and see whether our preference changes based on the value of Q. At this point, you might want to pause the video, try to draw the graph yourself, and reach a conclusion before you keep watching. Here are the answers. Looking at the graph, I drew all four lines over the region from zero to one, because Q can only be within this region, and I labeled each line by whether we wear pads or not and whether we take the long route or the short route. How do we decide on the best actions? It depends on the value of Q. For each possible value of Q, we look for the line that is highest, because we want to maximize the expected utility: the higher the line, the better. On this picture, for the smaller values of Q, one line is best; it corresponds to no pads and the short route. For the larger values of Q, part of another line is best: no pads and the long route.
All right, so the tricky part of completing this answer is figuring out the intersection. For all values of Q less than the intersection, one set of actions is best; for values of Q greater than it, another set of actions is best. We can solve for the intersection by setting 10 minus 10Q equal to six, which gives Q equals four over ten, or two over five. In decision theory, we often refer to the solution to this kind of problem as a policy: the policy tells us what to do in different situations. Our optimal policy is as follows. If Q is less than or equal to two over five, that is, 0.4, then the robot should not wear pads and should choose the short route. If the probability of an accident is larger than two over five, then the robot should still not wear pads but should choose the long route. What can we learn from this optimal policy? In terms of choosing between the long route and the short route: if the probability of an accident is small, we're okay with choosing the short route, and if the probability of an accident is large, we prefer the long route, where we're safe from any accident. That's pretty natural and intuitive: if an accident is unlikely, we can take some risk and use the short route. The less intuitive part concerns wearing pads or not. Intuitively, if we choose the short route, it might be a good idea to put on pads to decrease the severity of the damage if an accident happens. But in our optimal policy, it seems we never want to wear pads.
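The crossover and the resulting policy can be written down directly, using only the two expected utilities the lecture states for the no-pads options: 10 minus 10Q for the short route and 6 for the long route.

```python
from fractions import Fraction

# Crossing point of EU(no pads, short) = 10 - 10q and EU(no pads, long) = 6:
# solve 10 - 10q = 6, i.e. q = (10 - 6) / 10.
crossover = Fraction(10 - 6, 10)

def optimal_policy(q):
    # Optimal policy from the lecture: never wear pads; take the short
    # route when q <= 2/5, the long route otherwise.
    eu_short = 10 - 10 * q   # EU(no pads, short route)
    eu_long = 6              # EU(no pads, long route)
    route = "short" if eu_short >= eu_long else "long"
    return ("no pads", route)
```

At q exactly 2/5 the two lines meet, so both routes give expected utility 6 and either choice is optimal; the function above arbitrarily breaks the tie toward the short route, matching the "less than or equal to" phrasing of the policy.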
One reason for this could be that when the probability of an accident is small enough, it counterbalances the potential damage: if the probability is very small, then even if the damage could be large, the end result shouldn't be very bad. Another possibility is that the potential damage might not be large enough: if the damage is not large and the probability of an accident is small, then it's never worth it for the robot to wear pads to reduce the severity of the damage. That's everything for this video. After watching it, you should be able to calculate the expected utility of an agent given a set of decisions, and, given the expected utility for each set of decisions, determine an optimal policy. Thank you very much for watching. I'll see you in the next video. Bye for now.