 Hello everyone, this is Alice Gao. In this video, I will discuss how we can perform the filtering task through forward recursion. Recall the definition of the filtering task. Let's number the time steps starting from zero. Suppose that it is day k today. Given the observations from day zero to day k, what is the probability that I am in a particular state today? Mathematically, what is the probability of s sub k given o sub zero colon k? o sub zero to k represents the sequence of observations from day zero to day k. The small o means that we observe these signals. The capital letter s means that we do not observe the state. Note that s sub k can be true or false. Therefore, this quantity is not a single probability. It is a distribution containing two probabilities: the probability that s sub k is true given the observations, and the probability that s sub k is false given the observations. I've written the two probabilities in angle brackets to remind you that they form a distribution. We can perform filtering efficiently through forward recursion. Forward recursion allows us to go through the Markov chain once and calculate all the filtering probabilities along the way. Starting from time zero, the recursion passes a distribution, or a message, from time step k to the next time step, k plus one. Let's define a notation for this message. f sub zero to k is the probability of s sub k given o sub zero to k. Again, this message is not a single probability. It is a distribution containing two probabilities. To start, we must calculate the distribution at time zero. This is the base case. We can calculate this using Bayes' rule. Recall that the alpha in the formula is a normalization constant. In this case, it is equal to one over the probability of o sub zero. Once we have the message f sub zero to k minus one, we can calculate the message f sub zero to k using recursion. Let's look at the recursive case. 
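The message notation and the base case described above can be written out as follows. This is only a sketch in LaTeX: writing \neg s_k for "s sub k is false" is a notational assumption, not something taken from the slides.

```latex
\begin{align*}
f_{0:k} &= P(S_k \mid o_{0:k})
         = \langle\, P(s_k \mid o_{0:k}),\; P(\neg s_k \mid o_{0:k}) \,\rangle \\
f_{0:0} &= P(S_0 \mid o_0)
         = \alpha \, P(o_0 \mid S_0) \, P(S_0),
         \qquad \alpha = \frac{1}{P(o_0)} \\
P(o_0)  &= P(o_0 \mid s_0)\, P(s_0) + P(o_0 \mid \neg s_0)\, P(\neg s_0)
\end{align*}
```

The last line shows why alpha never has to be looked up separately: the probability of o sub zero is just the sum of the two unnormalized values.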
Given the message f sub zero to k minus one, we want to calculate the message f sub zero to k. There are two other quantities. The probability of o sub k given s sub k is from the sensor model. These are the probabilities for time k. The probability of s sub k given s sub k minus one is from the transition model. These are the probabilities of the state transition from time k minus one to time k. We have both of these quantities from our hidden Markov model. Also, we have a summation over the two possible states at time k minus one. Finally, alpha is again the normalization constant. We don't need to know what it is. It simply means that once we derive the two values, we need to normalize them so that they're valid probabilities. It is tricky to use the forward recursion formulas correctly. Let me work through a few calculation examples. Here's our umbrella model again. We'll need these numbers for our examples. First, let me calculate f sub zero to zero. This is the base case. Assume that the observation on day zero, which is o sub zero, is true. This example only requires you to apply Bayes' rule. However, I want to go through it to show you how the normalization constant works. We need to calculate two probabilities: one for when s sub zero is true and the other for when s sub zero is false. Let's calculate them separately. Write down the formula for when s sub zero is true. Let's plug in the numbers and do some calculations. We have to stop here temporarily, since normalization requires us to calculate the sum of the two values. Let's calculate the other value. Now that we have both values, let's normalize them. The first probability is equal to the first value divided by the sum of the two values. The second probability is equal to one minus the first probability. I've highlighted the final answers. Once you're familiar with these calculations, you might want to calculate the values more quickly. 
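The base-case calculation above, with the two values computed separately and then normalized, might look like the following Python sketch. The transcript does not state the numbers on the slide, so the parameters here are an assumption: they are the usual textbook umbrella-world values (prior 0.5, sensor probabilities 0.9 and 0.2), chosen because they reproduce the roughly 82% figure quoted later in the video. The function and constant names are hypothetical.

```python
from typing import Tuple

# ASSUMED umbrella-model parameters (not given in the transcript).
PRIOR_S0_TRUE = 0.5       # P(s_0 = true)
P_OBS_GIVEN_TRUE = 0.9    # P(o = true | s = true)
P_OBS_GIVEN_FALSE = 0.2   # P(o = true | s = false)


def base_case(o0: bool) -> Tuple[float, float]:
    """Return f_{0:0} = <P(s_0 | o_0), P(not s_0 | o_0)>."""
    # Sensor probabilities for the observed value of o_0.
    like_true = P_OBS_GIVEN_TRUE if o0 else 1.0 - P_OBS_GIVEN_TRUE
    like_false = P_OBS_GIVEN_FALSE if o0 else 1.0 - P_OBS_GIVEN_FALSE

    # Unnormalized values: P(o_0 | S_0) * P(S_0), one per value of S_0.
    value_true = like_true * PRIOR_S0_TRUE
    value_false = like_false * (1.0 - PRIOR_S0_TRUE)

    # Normalization: the sum of the two values equals P(o_0) = 1 / alpha.
    total = value_true + value_false
    p_true = value_true / total
    p_false = 1.0 - p_true
    return p_true, p_false


f_0_0 = base_case(True)
print(f"f_0:0 = <{f_0_0[0]:.3f}, {f_0_0[1]:.3f}>")  # about <0.818, 0.182>
```

Note that the normalization step cannot run until both unnormalized values are known, which is the reason for the temporary stop in the hand calculation.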
You can do this by using the angle bracket notation to keep track of multiple values at a time. Let me show you an example. Let me start by writing down the Bayes' rule formula. Remember that each term is not a single probability. It is a distribution containing two probabilities. Let me write down the first term after alpha using the angle bracket notation. This term contains two values: the probability of o sub zero given that s sub zero is true, and the probability of o sub zero given that s sub zero is false. Be very careful when filling in these values. It's really easy to use the incorrect ones. The second term also contains two values. This is the prior distribution over s sub zero. Next, we need to multiply these two terms together. This is an element-wise multiplication. The first value in the result is the product of the first values in both terms, and the second value is the product of the second values. Finally, we need to normalize both values. This step is much easier to visualize since we have both values in the same place. These are the final answers, which are the same as before. Let's look at an example of the recursive case. We're given the message for time zero, f sub zero to zero. We want to calculate the distribution for time one, f sub zero to one. We assume that o sub one is true. The director brings an umbrella on day one. Let's calculate the two probabilities using the compact notation. It is a bit challenging to plug in numbers right away, so let me rewrite the formula to make it clear which numbers we need to plug in. First, let me replace the f values with the original probability notation. This makes it clear which variables are in the terms. Next, let's write out the two terms in the summation explicitly. The first term is for the case where s sub zero is true, and the second term is for the case where s sub zero is false. Third, let's write out the two values for s sub one explicitly. Every time s sub one appears in a term, we should write down two values in angle brackets. 
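The three rewriting steps for the recursive case can be sketched in LaTeX as below. The layout and the use of \neg for "false" are assumptions made for illustration; the slide itself may arrange the terms differently.

```latex
\begin{align*}
f_{0:1}
  &= \alpha \, P(o_1 \mid S_1) \sum_{s_0} P(S_1 \mid s_0) \, f_{0:0} \\
  % Step 1: replace the f values with probability notation.
  &= \alpha \, P(o_1 \mid S_1) \sum_{s_0} P(S_1 \mid s_0) \, P(s_0 \mid o_0) \\
  % Step 2: write out the two terms of the summation.
  &= \alpha \, P(o_1 \mid S_1) \bigl[\, P(S_1 \mid s_0) \, P(s_0 \mid o_0)
       + P(S_1 \mid \neg s_0) \, P(\neg s_0 \mid o_0) \,\bigr] \\
  % Step 3: write out both values of S_1 in angle brackets.
  &= \alpha \, \langle P(o_1 \mid s_1),\, P(o_1 \mid \neg s_1) \rangle \,
     \bigl[\, \langle P(s_1 \mid s_0),\, P(\neg s_1 \mid s_0) \rangle \, P(s_0 \mid o_0) \\
  &\qquad\qquad
       + \langle P(s_1 \mid \neg s_0),\, P(\neg s_1 \mid \neg s_0) \rangle \, P(\neg s_0 \mid o_0) \,\bigr]
\end{align*}
```

Products between angle-bracket terms are element-wise, and multiplying an angle-bracket term by a single probability scales both of its values.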
The first value corresponds to the case when s sub one is true, and the second value corresponds to the case when s sub one is false. At this point, every variable in every term is a small letter. This means that we have explicitly written out the values of all the variables. We're ready to plug in numbers. Here's the step with all the numbers plugged in. These are the final answers. The final answers make intuitive sense. After observing the umbrella on day zero, the probability of rain on day zero is high, about 82%. After observing the umbrella on both day zero and day one, the probability of rain on day one should be even higher, and it is: about 88%. I've included step-by-step calculations with numbers on the next slide. Feel free to use them to double-check your calculations. That's everything for this video. Let me summarize. After watching this video, you should be able to calculate the filtering probabilities by using forward recursion. Thank you very much for watching. I will see you in the next video. Bye for now.
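For readers who want to check the numbers, here is a minimal Python sketch of forward recursion for a two-state model. The parameters are an assumption: the transcript does not list the slide's numbers, so the sketch uses the usual textbook umbrella-world values (prior 0.5, sensor 0.9 and 0.2, transition 0.7 and 0.3), which reproduce the 82% and 88% quoted in the video. All names are hypothetical.

```python
from typing import List, Sequence, Tuple

Dist = Tuple[float, float]  # <P(state true), P(state false)>

# ASSUMED umbrella-model parameters (not given in the transcript).
PRIOR: Dist = (0.5, 0.5)                     # P(S_0)
P_OBS_TRUE = {True: 0.9, False: 0.2}         # P(o = true | S = s)
P_NEXT_TRUE = {True: 0.7, False: 0.3}        # P(S_k = true | S_{k-1} = s)


def normalize(values: Dist) -> Dist:
    """Scale two non-negative values so that they sum to one (the alpha step)."""
    total = values[0] + values[1]
    return values[0] / total, values[1] / total


def sensor(o: bool) -> Dist:
    """Return <P(o | s true), P(o | s false)> for the observed value o."""
    if o:
        return P_OBS_TRUE[True], P_OBS_TRUE[False]
    return 1.0 - P_OBS_TRUE[True], 1.0 - P_OBS_TRUE[False]


def forward(observations: Sequence[bool]) -> List[Dist]:
    """Return the messages f_{0:0}, f_{0:1}, ..., one per observation."""
    # Base case: Bayes' rule at time zero.
    like = sensor(observations[0])
    f = normalize((like[0] * PRIOR[0], like[1] * PRIOR[1]))
    messages = [f]

    # Recursive case: sum over the previous state, weight by the sensor
    # model, then normalize.
    for o in observations[1:]:
        pred_true = P_NEXT_TRUE[True] * f[0] + P_NEXT_TRUE[False] * f[1]
        pred_false = (1.0 - P_NEXT_TRUE[True]) * f[0] + (1.0 - P_NEXT_TRUE[False]) * f[1]
        like = sensor(o)
        f = normalize((like[0] * pred_true, like[1] * pred_false))
        messages.append(f)
    return messages


messages = forward([True, True])  # umbrella seen on day zero and day one
for k, (p_true, p_false) in enumerate(messages):
    print(f"f_0:{k} = <{p_true:.3f}, {p_false:.3f}>")
```

Under these assumed parameters the first components come out at about 0.818 and 0.883, matching the video's 82% and 88%. The loop makes a single pass over the observations, which is the sense in which forward recursion goes through the Markov chain once.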