In the previous lecture we had just concluded the form of the value function at time N-1. What we found was that the value function at time N-1 takes a quadratic form with a trailing additive constant. This constant was not present in the value function at time N; at time N we had a pure quadratic. Nonetheless, J_{N-1} is still a quadratic function, and the Hessian of that quadratic, the matrix K_{N-1}, is positive semidefinite, which means that J_{N-1}(x_{N-1}) is convex in x_{N-1}. This is what we concluded. Now let us take a step back and think about what we have done. We started with the terminal condition J_N equal to the terminal cost. We then went to time t = N-1 and applied the dynamic programming equation to find J_{N-1}(x_{N-1}). J_N was convex and quadratic, and from there we got that J_{N-1} is also convex and quadratic. Now imagine repeating this at t = N-2. At t = N-2 we would again attempt to write J_{N-2}(x_{N-2}) as the minimum over u_{N-2} of the expectation of x_{N-2}' Q_{N-2} x_{N-2} + u_{N-2}' R_{N-2} u_{N-2} plus the cost-to-go at time N-1, namely J_{N-1}.
But J_{N-1} would have to be expressed as a function of x_{N-2}, which gives J_{N-1}(A_{N-2} x_{N-2} + B_{N-2} u_{N-2} + w_{N-2}). Notice the striking similarity with the expression we wrote when we derived J_{N-1}. There, J_N was convex and quadratic and was evaluated at a linear function of x_{N-1}, u_{N-1}, and w_{N-1}; here, J_{N-1} is convex and quadratic and is evaluated at a linear function of x_{N-2}, u_{N-2}, and w_{N-2}. In both cases, the two stage-cost terms are also convex and quadratic. Because of this convex quadratic structure as a function of u, we realized at time N-1 that the optimal u can be found by differentiating and setting the gradient with respect to u_{N-1} equal to 0, and this, again because of the quadratic nature of the problem, gave us that u*_{N-1} is a linear function of x_{N-1}. Notice that these structural conclusions continue to hold when we do this at t = N-2. In other words, if we expand out the expression for J_{N-1}, we again get terms that do not depend on w_{N-2} at all, then a term that is linear in w_{N-2} but whose expectation is 0 because w_{N-2} is also zero-mean, and a quadratic term in w_{N-2} that depends on neither u_{N-2} nor x_{N-2}.
What we would be left with to minimize is therefore a quadratic expression in u_{N-2}, and that expression is also convex in addition to being quadratic. Again we set the gradient with respect to u_{N-2} equal to 0, and again we conclude that the optimal control is u*_{N-2} = L_{N-2} x_{N-2}. In other words, all the conclusions we derived at time N-1 continue to hold at time N-2; from there we can do the same at time N-3, then N-4, and so on. For all times t we get that the optimal policy is

mu*_t(x_t) = L_t x_t.

And what is this L_t? It is something we can calculate recursively:

L_t = -(B_t' K_{t+1} B_t + R_t)^{-1} B_t' K_{t+1} A_t.

The K's also have to be computed recursively: K_N is simply Q_N itself, and at any other time t,

K_t = A_t' ( K_{t+1} - K_{t+1} B_t (B_t' K_{t+1} B_t + R_t)^{-1} B_t' K_{t+1} ) A_t + Q_t.

The optimal cost, which is nothing but the last step of the dynamic programming equation, is given by

J_0(x_0) = x_0' K_0 x_0 + sum over t = 0 to N-1 of E[ w_t' K_{t+1} w_t ].

This is the solution of the problem.
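The backward recursion above can be sketched in code. This is a minimal sketch, assuming time-invariant matrices A, B, Q, R for simplicity (the time-varying case simply indexes them by t); the function name and interface are my own choices, not from the lecture.

```python
import numpy as np

def lqr_backward(A, B, Q, R, QN, N):
    """Backward Riccati recursion for the finite-horizon LQ problem.

    Returns the cost matrices K_0..K_N and gains L_0..L_{N-1}, so that
    the optimal policy is u_t = L[t] @ x_t.
    """
    K = [None] * (N + 1)
    L = [None] * N
    K[N] = QN                       # terminal condition: K_N = Q_N
    for t in range(N - 1, -1, -1):  # work backwards in time
        S = B.T @ K[t + 1] @ B + R
        # Kalman gain: L_t = -(B'K B + R)^{-1} B' K A
        L[t] = -np.linalg.solve(S, B.T @ K[t + 1] @ A)
        # Riccati step: K_t = A'(K - K B (B'K B + R)^{-1} B'K) A + Q
        K[t] = A.T @ (K[t + 1]
                      - K[t + 1] @ B @ np.linalg.solve(S, B.T @ K[t + 1])) @ A + Q
    return K, L
```

For a scalar system with A = B = Q = R = Q_N = 1 and horizon N = 2, the recursion gives K_1 = 1.5, K_0 = 1.6, and L_0 = -0.6, which you can verify by hand in two steps.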
So the optimal policy takes this particular form: it is linear in the state, and the gain that we apply, L_t, is called the Kalman gain. The gain can be computed recursively, and these equations give you the recursion. The key equation here is the recursion for K_t; it is called the Riccati equation. You need to compute it backwards in time: you start seeded with K_N = Q_N and then work backwards to compute K_t for all t. Using the K's one computes the L's, the Kalman gains, and using the Kalman gains we have the optimal policy. So the optimal policy is linear in the state. The optimal cost can also be computed from the matrices that we get from the Riccati equation: it is quadratic in the initial state, x_0' K_0 x_0, plus a term that depends on the variance of the noise. Remember the noise was assumed to have zero mean and finite variance, so this additional term is finite when the variance is finite; it is an additional cost you incur on top of x_0' K_0 x_0. Now, here are some things you can note in addition to this structural result. The reason we have been able to solve this in closed form is a beautiful coincidence between the linear dynamics of the problem and the quadratic nature of the cost.
The linear dynamics and quadratic cost allow us to compute the optimal policy as a linear function of the state: the optimal action is just a matrix times the state at that time, and this structure propagates from one time step to the next. The value function at the terminal time is quadratic, which gives us that the value function at any intermediate time is also convex and quadratic; therefore the optimal policy is linear, and finally the optimal cost is also a quadratic function of the initial state. In addition to this, here is one other thing to notice. The policy is linear, of course, but notice how L_t is actually computed: L_t is computed from B_t, A_t, R_t, and K_{t+1}, and the K_t's are themselves computed recursively from the Riccati equation, which involves only the Q_t's, A_t's, B_t's, and R_t's. In other words, the computation of the K_t's does not depend on the variance of the noise. So the entire optimal policy is the same regardless of whether there is noise in the system or not. If there were no noise in the system, the variance would be 0 and the expectation term in the cost would be 0. So the cost depends on whether there is noise in the system, but the optimal policy does not depend on the noise.
So regardless of whether the system dynamics evolve in a noisy or a noiseless fashion, the optimal policy is still the same: it is still linear in the state, and what one has to apply is still the Kalman gain. It is a remarkable fact in linear quadratic control problems that the optimal policy is the same for the problem with noise and without noise. Let me note this down: the optimal policy for the noisy problem is equal to the optimal policy for the deterministic, noiseless problem. So where does the noise appear? The noise appears only in the cost, in this quadratic fashion. What does this mean? It means that the optimal cost you incur when the system dynamics have noise is equal to the optimal cost you would incur without noise, plus an additional offset caused by the noise in the system. What happens is that you pick up this additional term at every time step, basically because your system is noisy. If that noise were not there, the optimal policy you would apply would still be the same, and the optimal cost you would get would be just the term x_0' K_0 x_0, a function of the initial state x_0. Let us notice one other thing: the optimal cost depends on the variance of the noise.
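To make this cost decomposition concrete, here is a small Monte Carlo sketch for a scalar toy system (the system values, horizon, and noise level are illustrative choices of mine, not from the lecture). It simulates the closed loop u_t = L_t x_t under Gaussian noise and compares the average incurred cost against x_0' K_0 x_0 + sum over t of sigma^2 K_{t+1}.

```python
import numpy as np

# Toy scalar LQ problem (illustrative values)
a, b, q, r, qN = 1.0, 1.0, 1.0, 1.0, 1.0   # scalar A, B, Q, R, Q_N
N, x0, sigma = 2, 1.0, 0.5                 # horizon, initial state, noise std

# Backward Riccati recursion and Kalman gains (noise never enters here)
K = [0.0] * (N + 1)
L = [0.0] * N
K[N] = qN
for t in range(N - 1, -1, -1):
    s = b * K[t + 1] * b + r
    L[t] = -b * K[t + 1] * a / s
    K[t] = a * (K[t + 1] - K[t + 1] * b * b * K[t + 1] / s) * a + q

# Theoretical optimal cost: noiseless term plus the noise penalty
theory = K[0] * x0**2 + sigma**2 * sum(K[t + 1] for t in range(N))

# Simulate the closed loop u_t = L_t x_t under Gaussian noise
rng = np.random.default_rng(0)
trials = 20000
total = 0.0
for _ in range(trials):
    x, cost = x0, 0.0
    for t in range(N):
        u = L[t] * x
        cost += q * x**2 + r * u**2
        x = a * x + b * u + sigma * rng.standard_normal()
    cost += qN * x**2          # terminal cost
    total += cost

print(theory, total / trials)  # the two numbers should nearly agree
```

For these values the formula gives 1.6 + 0.25 * (1.5 + 1.0) = 2.225, and the simulated average should land close to it.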
The optimal policy does not depend on anything about the noise; the only thing we have assumed is that it has zero mean. The optimal cost, however, depends on the variance of the noise. In other words, the cost depends only on the first two moments; it does not matter what the precise distribution is. The cost is the same once the first two moments of the noise are the same: you can take two different sets of disturbances that match in their first two moments, and the optimal cost will remain the same. This is again a remarkable coincidence of linear systems with quadratic cost: not only do the deterministic and the stochastic problems have the same optimal policy, the optimal cost is also invariant over distributions, so long as the distributions agree in the first two moments. So what does this tell us? It suggests that an approach to stochastic control problems could be the following. One could look at the problem with all the noise in the system replaced by the mean of that noise. In our case the mean is 0, so there is no noise left in the system, and this gives us a deterministic system. So you take the noisy system, replace all the noise in it by its mean to obtain a deterministic system, and then find an optimal policy for the resulting deterministic problem. And then you find that, lo and behold, that policy is actually optimal for the original problem.
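As a sanity check of this moment-matching claim, one can run the same closed loop under two different zero-mean disturbances of equal variance, for example Gaussian noise versus a uniform disturbance scaled to the same variance. This is a sketch on a scalar toy system of my own choosing; the two average costs should come out nearly equal.

```python
import numpy as np

# Toy scalar LQ problem (illustrative values)
a, b, q, r, qN = 1.0, 1.0, 1.0, 1.0, 1.0
N, x0, sigma = 2, 1.0, 0.5

# Backward Riccati recursion and gains (distribution-free by construction)
K = [0.0] * (N + 1)
L = [0.0] * N
K[N] = qN
for t in range(N - 1, -1, -1):
    s = b * K[t + 1] * b + r
    L[t] = -b * K[t + 1] * a / s
    K[t] = a * (K[t + 1] - K[t + 1] * b * b * K[t + 1] / s) * a + q

def avg_cost(draw, trials=20000):
    """Average closed-loop cost when each disturbance w_t is drawn by `draw`."""
    total = 0.0
    for _ in range(trials):
        x, cost = x0, 0.0
        for t in range(N):
            u = L[t] * x
            cost += q * x**2 + r * u**2
            x = a * x + b * u + draw()
        total += cost + qN * x**2
    return total / trials

rng = np.random.default_rng(1)
gauss = lambda: sigma * rng.standard_normal()
# Uniform on [-sqrt(3)*sigma, sqrt(3)*sigma]: zero mean, variance sigma^2
unif = lambda: rng.uniform(-np.sqrt(3) * sigma, np.sqrt(3) * sigma)

# Nearly equal: the cost depends on the noise only through its first two moments
print(avg_cost(gauss), avg_cost(unif))
```

The higher moments of the two distributions differ, yet the expected cost agrees, exactly as the formula J_0(x_0) = x_0' K_0 x_0 + sum_t E[w_t' K_{t+1} w_t] predicts.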
So here is the flow; let me write it out again. You take a noisy problem and replace the noise by its mean; this gives us a deterministic problem. For this deterministic dynamic programming problem you find an optimal policy, applying any deterministic control technique. And then, completing the loop, that policy is optimal for the noisy problem. What we have just discovered for the linear system with quadratic cost is that this loop holds for the above problem. This property goes by the name of the certainty equivalence principle: whenever you can take a noisy problem, replace the noise by its mean, solve the resulting deterministic problem, and find that the solution also turns out to be optimal for the noisy problem, we say that certainty equivalence holds. It is remarkable that certainty equivalence holds for linear systems with quadratic cost in the way we have defined them. This suggests that it may be true more generally, and it is tempting to conjecture that certainty equivalence holds across the board, in all sorts of problems: that you could solve every stochastic control problem by simply replacing the noise by its mean and finding the optimal policy of the corresponding deterministic problem.
Unfortunately, this is not true. In fact, in the very first lecture of our course we saw that risk plays an important role, and the way risk manifests in stochastic control problems is through higher moments of the noise. One cannot simply replace the noise by its mean and arrive at the right answer: the mean does not adequately capture everything about the random variable in question when we have a stochastic decision problem. Nonetheless, people have continued to look for settings in which certainty equivalence holds, and that continues to be an ongoing research effort. But I must warn you that this particular coincidence should not be stretched too far. It works in this setting because of the excellent alignment between the shape of the cost function and the nature of the dynamics; certainty equivalence does not hold more generally across all problems. Having said that, linear systems with quadratic cost have wide applicability in industry, and as a result the certainty equivalence principle has become extremely widely applied in industrial problems.