Hello everyone, this is Alice Gao. In this video, I'll start discussing a new topic: reasoning and decision-making under uncertainty. So far, we've considered a world without uncertainty. In search and supervised learning problems, all the information is available to us, and we can use that information to perform inference and make decisions. Why do we need to think about uncertainty in the first place? First, when an agent is navigating the world, it cannot observe everything. The information that's not available to us constitutes a large part of our uncertainty. Second, the agent, whether it's a robot or a program, is imperfect. When the agent performs an action, the action may not produce the intended consequences. This is another possible source of uncertainty. Even though we're in a world with so much uncertainty, we still need to take actions. We often have to make decisions based on missing, imperfect, or noisy information. The best thing we can do is to quantify our uncertainty about the world and use that uncertainty to perform inference and make decisions. A student once asked me a philosophical question about making decisions under uncertainty in real life. The question was: suppose that I make an important decision and the outcome turns out to be not as good as I hoped. If I could go back in time, would I make a better decision? This type of what-if question can really keep a person up all night. What is your opinion on this? Here's my personal opinion. I believe that even if I could go back in time, I would still make the same decision. The reason is that, at the time, I did not have as much information as I do now. I made the best decision I could given the limited information I had. So even if I could go back in time, I would still make the same decision anyway. This is just my opinion. I would be curious to see what you think about it. Next, let me discuss two views of probabilities.
We live in a world full of uncertainty. Formally, we can use probabilities to measure uncertainty. When it comes to probabilities, there are two camps, called the frequentists and the Bayesians. The frequentists believe that probabilities are objective, something we can observe in the world. If we want to calculate the probability of an event, we can count the number of times an outcome of the event occurs and use that count to calculate the probability. For example, suppose that we want to know the probability that a coin will turn up heads if we do a coin toss. A frequentist will do many coin tosses and determine the probability as the fraction of heads observed. Basically, the probability is based on historical observations. The advantage of this view is that since probabilities are observable, they're something that we can agree on. The disadvantage is that a frequentist cannot estimate the probability if they haven't observed anything. In contrast, a Bayesian thinks about probability as something subjective. A probability is just a degree of belief in our head. Bayesians believe that we have some prior belief about the probability based on our previous experience. As we make observations, we'll update this belief based on those observations. Consider the coin flip example again. Before observing any coin flips, two Bayesians may already have different prior beliefs about the probability of the coin turning up heads. These prior beliefs are based on their different prior experiences. In general, different people can have different probabilities in their heads for the same event. As the two Bayesians observe some coin tosses, they will update their beliefs. The more coin flips they observe, the more similar their updated beliefs will be. The advantage of the Bayesian view is that we can have a probability estimate even if we haven't observed anything. The disadvantage is that two people may not agree on what a probability should be if they have different prior experiences.
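The coin example above can be sketched in code. A standard way to model a Bayesian's belief about a coin is a Beta prior expressed as pseudo-counts of heads and tails; the prior and the observed counts below are my own illustrative choices, not numbers from the video.

```python
from fractions import Fraction

def posterior_mean(prior_heads, prior_tails, obs_heads, obs_tails):
    """Posterior mean of P(heads) under a Beta prior given as
    pseudo-counts, after observing some coin flips."""
    return Fraction(prior_heads + obs_heads,
                    prior_heads + prior_tails + obs_heads + obs_tails)

# Two Bayesians with different prior experiences see the same 10 flips:
alice = posterior_mean(1, 1, 7, 3)    # weak, nearly uniform prior
bob   = posterior_mean(10, 10, 7, 3)  # strong prior that the coin is fair
# alice = 2/3, bob = 17/30: same data, different beliefs.

# After many more flips, their updated beliefs become very similar:
alice_many = posterior_mean(1, 1, 700, 300)
bob_many   = posterior_mean(10, 10, 700, 300)
```

This illustrates both points from the lecture: the Bayesians have estimates even before any data (the prior means 1/2 and 1/2 here), they disagree after a little data, and they converge as observations accumulate.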
In this course, we are going to adopt the Bayesian view of probability. In order to use probabilities to model the world, we will need to come up with random variables. Each random variable describes an event. It has a domain, which contains all the values that the random variable can take, and an associated probability distribution, which specifies a probability for each value of the random variable. Here's a simple example. Suppose we're modeling a scenario involving a burglary and an alarm. If a burglary happens, then the alarm may go off. One random variable could be whether the alarm is going off or not, and its domain would contain true and false. The distribution could specify that the alarm does not go off 90% of the time. In this unit, we will primarily deal with Boolean random variables. Each Boolean random variable can be either true or false. For Boolean random variables, we will use a simplified notation: we'll write a and not a to denote that a is true and a is false, respectively. There are some common properties that all probability functions satisfy. They're called axioms. We'll focus on probability functions that satisfy the three axioms below. The first axiom says every probability is between 0 and 1. Interestingly, the choice of 0 and 1 is purely a convention. Any other number range would also work. We need to restrict these values to a range so that they're comparable across different scenarios. The second axiom says that if something is true, then its probability is 1, and if something is false, then its probability is 0. The third axiom is the inclusion-exclusion principle. We can calculate the probability of the disjunction of two events, a or b, using the formula P(a or b) = P(a) + P(b) - P(a and b). When we add the probability of a and the probability of b, we double-count the probability that a and b both happen, so we need to subtract it. All of the probability functions we consider satisfy these axioms. Let me introduce two more terms.
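The inclusion-exclusion axiom is easy to check numerically on a small joint distribution. Here is a minimal sketch with a made-up joint distribution over two Boolean variables; the particular numbers are my own and only need to sum to 1.

```python
# Hypothetical joint distribution over Boolean variables a and b.
joint = {
    (True, True):   0.04,
    (True, False):  0.06,
    (False, True):  0.01,
    (False, False): 0.89,
}

# Marginal probabilities, summed from the joint distribution.
p_a = sum(p for (a, b), p in joint.items() if a)            # P(a) = 0.10
p_b = sum(p for (a, b), p in joint.items() if b)            # P(b) = 0.05
p_a_and_b = joint[(True, True)]                             # P(a and b) = 0.04
p_a_or_b = sum(p for (a, b), p in joint.items() if a or b)  # P(a or b) = 0.11

# Inclusion-exclusion: P(a or b) = P(a) + P(b) - P(a and b)
assert abs(p_a_or_b - (p_a + p_b - p_a_and_b)) < 1e-12
```

Summing P(a) and P(b) counts the (True, True) entry twice, which is exactly why the P(a and b) term is subtracted.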
Remember that we adopted a Bayesian view of probabilities. As a Bayesian, we have a prior belief, and we'll update this prior belief based on observations. When we have a probability with no evidence, it is a prior or unconditional probability. This is our probability estimate of x when we haven't made any observation about x. Once we make a new observation, for example y, we will update our belief about x given the observation. P(x | y) is called a posterior or conditional probability. These two terms go hand in hand: we have a prior or unconditional belief before observing anything, and we have a posterior or conditional belief after getting some observations. In this unit, I will use a story about Sherlock Holmes as a running example. Mr. Holmes lives in a high-crime area where burglaries are frequent. Because of this, he installed a burglary alarm. However, Mr. Holmes is not always at home, and he relies on his two neighbors to call him when the alarm is going off. Mr. Holmes has two neighbors, Dr. Watson and Mrs. Gibbons. Unfortunately, the two neighbors are not perfectly reliable. Dr. Watson is known to be a tasteless practical joker, and Mrs. Gibbons has an occasional drinking problem. Furthermore, the alarm is sensitive to earthquakes, which may happen once in a while. If an earthquake happens, then it will surely be on the radio news. I will use this running example over and over again to discuss different concepts and algorithms. Our first task is to model this story using a Bayesian network. To do that, we have to define some random variables. For this story, it's sufficient to use Boolean random variables only. Once you've come up with your random variables, determine the total number of probabilities in the joint distribution. Pause the video and solve this yourself. Then keep watching for the answer. I came up with these random variables. Don't worry if you came up with different ones.
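The relationship between a prior and a posterior can be made concrete with a small computation: given a joint distribution, the posterior is P(x | y) = P(x, y) / P(y). The sketch below uses the burglary-and-alarm setting, but the joint probabilities are made-up numbers for illustration, not values from the story.

```python
# Hypothetical joint distribution over (burglary, alarm).
joint = {
    (True, True):   0.009,  # burglary happens and alarm goes off
    (True, False):  0.001,  # burglary happens but alarm stays silent
    (False, True):  0.091,  # false alarm (e.g., earthquake or malfunction)
    (False, False): 0.899,  # nothing happens
}

# Prior (unconditional) probability of burglary, before any observation:
prior_b = sum(p for (b, a), p in joint.items() if b)     # P(b) = 0.01

# Posterior probability of burglary given that the alarm is going off:
p_a = sum(p for (b, a), p in joint.items() if a)         # P(a) = 0.10
posterior_b_given_a = joint[(True, True)] / p_a          # P(b | a) = 0.09
```

Observing the alarm moves the belief in a burglary from the prior of 1% up to a posterior of 9%; that shift from prior to posterior is exactly the Bayesian updating described above.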
Modeling a real-world scenario is a challenging task, and it takes a lot of practice before you can do it well. In this case, I have six random variables: burglary, alarm, Dr. Watson calls, Mrs. Gibbons calls, earthquake, and radio. So how many probabilities are there in the joint distribution? It's 2 to the power of 6, which is 64 probabilities. We need at least 63 probabilities to represent the joint distribution; the last probability is 1 minus the sum of the first 63. Later on, we will construct a Bayesian network to describe the story. You will see that the Bayesian network representation requires fewer probabilities to describe the story. That's everything on the introduction to uncertainty and probabilities. Let me summarize. After watching this video, you should be able to do the following. Explain why we need to model uncertainty. Explain terms such as random variable, prior or unconditional probability, and posterior or conditional probability. Given a story, define random variables to model the important elements in the story and calculate the number of probabilities in the joint probability distribution. Thank you very much for watching. I will see you in the next video. Bye for now.
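The counting argument above can be sketched by enumerating every assignment of true/false to the six Boolean variables; each assignment is one entry in the joint distribution. The variable names below are my own labels for the story's six variables.

```python
from itertools import product

variables = ["Burglary", "Alarm", "WatsonCalls", "GibbonsCalls",
             "Earthquake", "Radio"]

# Each complete assignment of True/False to all six variables
# corresponds to one probability in the joint distribution.
assignments = list(product([True, False], repeat=len(variables)))

print(len(assignments))       # 2**6 = 64 entries in the joint distribution
print(len(assignments) - 1)   # 63 free parameters; the last is 1 - sum
```

The last entry is determined by the others because all 64 probabilities must sum to 1, which is why only 63 numbers need to be stored.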