Hi everyone, this is Alice Gao. In this video, I'm going to talk about solving the Bellman equations using the value iteration algorithm.
This is a reminder of what the Bellman equations look like. Remember our goal: to solve for the expected utility of the optimal policy, which is denoted by capital V. The Bellman equations encode the relationships between the expected utility of the optimal policy for different states. So for our grid world, what does this look like? In the previous video, I also showed you this concrete example of one Bellman equation. You can see that in this equation, we have three unknowns: V of S11, V of S12, and V of S21. Our grid world has nine non-goal states, so there are nine Bellman equations with nine unknowns in them.
Given this, the solution sounds quite simple. We can take this system of nine equations and solve for the nine unknowns. Once we have the V values, we can solve for the optimal policy given those. So are we done? Is that the end of the story? Well, as computer scientists, we care not only about being able to solve a problem, but also about being able to solve it efficiently.
So let me ask you one question about efficiency. Is there an efficient algorithm we can use to solve the system of Bellman equations? This question is pretty tricky, and I actually don't expect you to be able to come up with the answer yourself. But I'd like to pose this question to get you to think about it before I tell you the story. So if you want to think about this yourself, here's a hint for you. First of all, look at the Bellman equations. I've given you an example here. I've also given you the grid world for your reference. Looking at the system of equations, can you determine whether it is linear or nonlinear? That's your first step.
So once you've answered that question, try to recall algorithms or techniques you know for solving systems of linear and/or nonlinear equations. Hopefully you can draw on your past knowledge to answer this question correctly. Think about this yourself, and then keep watching for the answer.
The correct answer is no. There does not exist an efficient algorithm we can use to solve the system of equations. Here is why. First of all, is this system of equations linear or nonlinear? Well, if you notice, we have a max function in each Bellman equation, and the max function is not a linear function. So having the max function makes the entire system of equations nonlinear.
Now, given that, what kinds of techniques do we have for solving systems of linear equations and systems of nonlinear equations? Hopefully you have taken previous courses on linear algebra, and maybe courses on linear programming. If you have, then you know that there are many linear algebra and linear programming techniques we can use to solve systems of linear equations efficiently. So that's not a problem for us. But our system of equations is nonlinear, and it turns out there is no efficient algorithm that we can use to solve a system of nonlinear equations in general. We might be able to solve special cases efficiently, but not in general.
So given this, what can we hope for? If we want to directly solve the system of equations, we cannot do it efficiently. So instead, we are going to use an approximate, numerical approach: we are going to solve the system of equations iteratively. This leads to the value iteration algorithm.
Let's take a look at the value iteration algorithm. Here are some key ideas of the value iteration algorithm. We will estimate the V values numerically.
We will start with some arbitrary estimates and iteratively improve them using the Bellman equations. And finally, the algorithm will stop when the V values converge. When I'm describing the updates and the iterative algorithm, I'm going to use the subscript i to represent the iteration. So V sub i refers to our estimates of V at the i-th iteration.
Let's look at the steps in detail. First of all, we will start with some arbitrary initial values for V sub 0 of S. So we'll call the first iteration the 0th iteration. The arbitrary initial values can be anything; 0 is a pretty good choice.
Then, at the i-th iteration, what do we do? At the i-th iteration, we're going to update our estimates for the Vs. If you look at this update rule, it looks almost exactly like the Bellman equations, except for a few details. Here are the details you should notice. On the right hand side, the V values we have are from the i-th iteration, so we have V sub i, whereas on the left hand side, the V values we have are for the (i+1)-th iteration, so V sub i plus one. Also notice that instead of having an equal sign, we have an arrow pointing to the left.
So how do we use this update rule? We take the estimates from the i-th iteration and plug them into the right hand side, do the calculation, and the result of the calculation gives us our estimates for the (i+1)-th iteration. Notice here that if you want to strictly follow this update rule, then on the right hand side, you're always plugging in old values. So even if you have already calculated the new values for some states, you should still plug in the old values on the right hand side.
Finally, when should the algorithm terminate? In this case, we're going to monitor the change in our estimates.
And once the maximum change in our estimate for any state is small enough, we'll terminate the algorithm. Basically, we're looking for the case when all of the values converge. This is the entire value iteration algorithm. According to theoretical results, if we perform these Bellman updates a sufficient number of times, the V values are guaranteed to converge to the optimal values.
This is everything for the value iteration algorithm. In the next video, I will do a few examples of the value iteration algorithm. In particular, you will see some calculations for updating the V values. Thank you very much for watching. I will see you in the next video. Bye for now.
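The value iteration steps described in this video (start from arbitrary estimates, apply the Bellman update using only old values on the right hand side, stop when the largest change is small) can be sketched in Python roughly as follows. This is a minimal sketch under several assumptions that are not from the video: the Bellman equation is taken to have the form V(s) = R(s) + gamma * max over actions of the expected V of the next state, and the two-state MDP, its rewards, the discount factor `gamma`, the tolerance `tol`, and the function name `value_iteration` are all hypothetical, chosen only for illustration. It is not the grid world from the lecture.

```python
from typing import Dict, List, Tuple

# Hypothetical MDP encoding (an assumption for this sketch):
# transitions[state][action] is a list of (probability, next_state) pairs.
# A state with no actions is treated as a goal (terminal) state.
Transitions = Dict[str, Dict[str, List[Tuple[float, str]]]]


def value_iteration(
    states: List[str],
    transitions: Transitions,
    reward: Dict[str, float],
    gamma: float = 0.9,
    tol: float = 1e-6,
    max_iters: int = 10_000,
) -> Tuple[Dict[str, float], int]:
    # Step 1: arbitrary initial estimates V_0(s); zero is a common choice.
    v = {s: 0.0 for s in states}
    iterations = 0
    for _ in range(max_iters):
        iterations += 1
        # Step 2: compute V_{i+1} from V_i. The new values go into a
        # separate dict, so the right hand side only ever sees old values.
        v_new = {}
        for s in states:
            actions = transitions.get(s, {})
            if not actions:
                # Goal state: its value is just its reward.
                v_new[s] = reward[s]
                continue
            best = max(
                sum(p * v[s_next] for p, s_next in outcomes)
                for outcomes in actions.values()
            )
            v_new[s] = reward[s] + gamma * best
        # Step 3: stop once the largest change over all states is small.
        max_change = max(abs(v_new[s] - v[s]) for s in states)
        v = v_new
        if max_change < tol:
            break
    return v, iterations


# Hypothetical two-state example: from "a", the action "go" reaches the
# goal with probability 0.8 and stays in "a" with probability 0.2.
states = ["a", "goal"]
transitions = {
    "a": {"go": [(0.8, "goal"), (0.2, "a")], "stay": [(1.0, "a")]},
    "goal": {},
}
reward = {"a": -0.04, "goal": 1.0}

values, iterations = value_iteration(states, transitions, reward)
print(values, iterations)
```

Writing the new estimates into `v_new` rather than updating `v` in place is what keeps the update strictly synchronous, matching the point in the video that old values are always plugged in on the right hand side. For this toy example, the estimate for state `a` should approach the fixed point of V = -0.04 + 0.9 * (0.8 + 0.2 * V), which is 0.68 / 0.82.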