So, in the previous part we had arrived at an expression where the minimization over u_{N-1} reduces to minimizing a term that is linear in x_{N-1} and quadratic in u_{N-1}. Moreover, it is a convex quadratic in u_{N-1}; the convexity can be checked just as we did for the perfect information case. So we can find the minimizer over u_{N-1} by simply setting the derivative of this term equal to zero, and that gives us that the optimal u_{N-1}, call it u*_{N-1}, is a function mu*_{N-1} of the information I_{N-1}, given by

u*_{N-1} = mu*_{N-1}(I_{N-1}) = -(B_{N-1}^T Q_N B_{N-1} + R_{N-1})^{-1} B_{N-1}^T Q_N A_{N-1} E[x_{N-1} | I_{N-1}].

Now notice the form of this optimal control policy at time N-1: it multiplies the conditional expectation of the state by a constant matrix. In other words, it is a linear transformation of the conditional expectation of the state. Further, if you go back to your lectures on problems with perfect state information, you will notice that this matrix is in fact the same as the one that we had in the case of perfect state information. This is an important fact: the optimal policy has a certain form.
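As a quick numerical sketch of this formula, here is how the gain and the resulting control could be computed. The matrices A, B, Q_N, R and the state estimate below are made-up illustrative values, not from the lecture.

```python
import numpy as np

# Hypothetical 2-state, 1-input system (illustrative values only)
A = np.array([[1.0, 0.1], [0.0, 1.0]])   # A_{N-1}
B = np.array([[0.0], [0.1]])             # B_{N-1}
Q = np.eye(2)                            # Q_N, terminal state cost weight
R = np.array([[0.5]])                    # R_{N-1}, control cost weight

# Gain obtained by setting the derivative of the convex quadratic in u to zero:
# L = (B^T Q_N B + R)^{-1} B^T Q_N A
L = np.linalg.solve(B.T @ Q @ B + R, B.T @ Q @ A)

# With imperfect information, the policy acts on the conditional mean of the state
x_hat = np.array([1.0, -0.5])            # stand-in for E[x_{N-1} | I_{N-1}]
u_star = -L @ x_hat                      # mu*_{N-1}(I_{N-1})
print(u_star)
```

With perfect state information the same gain L would simply multiply the known state x_{N-1} instead of the estimate x_hat.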
So, the optimal policy for the problem with imperfect state information is equal to the optimal policy for the problem with perfect state information, but with the state replaced by the conditional mean of the state given the information. You apply the same matrix transformation, you multiply by the same matrix as you did in the perfect state information case; the only difference is that when you have imperfect state information you multiply that matrix by the conditional expectation of the state, whereas if you have perfect state information you multiply it by the state, which is already known to you. This is a tremendous result, because it gives us a way of reasoning about problems with imperfect state information. It says that the way to reason about these problems is: pretend the problem is one with perfect state information and find the optimal policy; then use another technique altogether to compute your best estimate of the state, by which I mean the conditional mean of the state given the information; and then apply that optimal policy to the estimate of the state. So, this is what we have found at time N-1. Let us see if this in fact carries over to the other time steps as well, by doing a few more calculations to see how far this can be pushed. So, substituting mu*_{N-1} back in to get J_{N-1}, we obtain J_{N-1} as a function of I_{N-1}; remember that in J_{N-1} there are still some terms remaining with us.
The control here has been substituted with mu*_{N-1}, which is the expression above. Putting all of this together gives

J_{N-1}(I_{N-1}) = E[x_{N-1}^T K_{N-1} x_{N-1} | I_{N-1}] + E[(x_{N-1} - E[x_{N-1} | I_{N-1}])^T P_{N-1} (x_{N-1} - E[x_{N-1} | I_{N-1}]) | I_{N-1}] + E[w_{N-1}^T Q_N w_{N-1}],

where the trailing constant comes from the system noise w. Now, what are these matrices K_{N-1} and P_{N-1}? We write them down:

P_{N-1} = A_{N-1}^T Q_N B_{N-1} (R_{N-1} + B_{N-1}^T Q_N B_{N-1})^{-1} B_{N-1}^T Q_N A_{N-1},
K_{N-1} = A_{N-1}^T Q_N A_{N-1} - P_{N-1} + Q_{N-1}.

Notice that this expression is eerily similar to the one we have seen before, except that an additional term has crept in: the middle term, the one involving P_{N-1}. We did not have this term when we were doing problems with perfect state information; there we only had the first term and the last term. So this middle term is arising precisely because of the imperfect state information. Notice that it would vanish if I_{N-1} had in it the information of the state: if I_{N-1} contained the state, then the conditional expectation E[x_{N-1} | I_{N-1}] would simply be x_{N-1}, and as a result this entire term would vanish.
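These two matrices can be formed directly from the definitions above; the system matrices here are again made-up illustrative values. One sanity check worth doing: both P_{N-1} and K_{N-1} come out symmetric, as cost matrices should.

```python
import numpy as np

# Illustrative (made-up) system and cost matrices at time N-1
A = np.array([[1.0, 0.1], [0.0, 1.0]])   # A_{N-1}
B = np.array([[0.0], [0.1]])             # B_{N-1}
Q_N = np.eye(2)                          # Q_N, terminal cost weight
Q_prev = np.eye(2)                       # Q_{N-1}, running state cost weight
R = np.array([[0.5]])                    # R_{N-1}

# P_{N-1} = A^T Q_N B (R + B^T Q_N B)^{-1} B^T Q_N A
M = np.linalg.solve(R + B.T @ Q_N @ B, B.T @ Q_N @ A)
P = A.T @ Q_N @ B @ M

# K_{N-1} = A^T Q_N A - P_{N-1} + Q_{N-1}
K = A.T @ Q_N @ A - P + Q_prev
print(P)
print(K)
```

Writing P as (B^T Q_N A)^T (R + B^T Q_N B)^{-1} (B^T Q_N A) makes the symmetry evident, since the middle factor is symmetric.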
So, this middle term is effectively the cost that we are paying for the lack of perfect information about the state. But note that although the expression for the optimal value function, the cost-to-go, has changed, the policy is still the same: we are using the same policy that we would have used in the case of perfect state information. So, now let us apply this at the next time step and try to work out J_{N-2}. J_{N-2}(I_{N-2}) can be written out as follows; this time I will be a little more brief. First let us write out the DP equation, and then directly the simplified form. It is the minimum over u_{N-2} of the conditional expectation, taken with respect to x_{N-2}, w_{N-2} and z_{N-1}, of

x_{N-2}^T Q_{N-2} x_{N-2} + u_{N-2}^T R_{N-2} u_{N-2} + J_{N-1}(I_{N-1}),

conditioned on I_{N-2}, where the cost-to-go J_{N-1}(I_{N-1}) is what we calculated above. I will write the conditioning on u_{N-2} explicitly in this case, to remind us that it is also present when we condition, and then this whole expression is minimized with respect to u_{N-2}. So, if I open up this expression, I will once again get a quadratic in x_{N-2}, a quadratic in u_{N-2}, and in addition I also have the term J_{N-1}(I_{N-1}).
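To put a number on this price of imperfect information: when the estimation error x - E[x | I] is zero-mean with covariance Sigma, the middle term has expectation trace(P Sigma). This identity for quadratic forms of random vectors is not stated in the lecture, but it is standard; the sketch below checks it by Monte Carlo with made-up P and Sigma.

```python
import numpy as np

# E[e^T P e] = trace(P @ Sigma) for a zero-mean error e with covariance Sigma.
# P and Sigma below are made-up illustrative matrices.
rng = np.random.default_rng(0)
P = np.array([[2.0, 0.5], [0.5, 1.0]])
Sigma = np.array([[0.3, 0.1], [0.1, 0.2]])

# Sample estimation errors and average the quadratic form e^T P e
err = rng.multivariate_normal(np.zeros(2), Sigma, size=200_000)
mc = np.mean(np.einsum('ni,ij,nj->n', err, P, err))
exact = np.trace(P @ Sigma)
print(mc, exact)
```

The better the estimator (the smaller Sigma), the smaller this extra term, which matches the observation that it vanishes entirely under perfect state information.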
Now, J_{N-1}(I_{N-1}) can be written out from the expression above, but it has to be written out carefully as a function of I_{N-1}. Remember that it contains conditional expectations given I_{N-1}, while outside we are also conditioning on I_{N-2} and u_{N-2}. Since I_{N-1} has in it more information than the pair (I_{N-2}, u_{N-2}), the outer conditioning transfers onto these terms: we end up conditioning on I_{N-2} and u_{N-2} instead. So, let me write the expression and then explain it. Minimizing over u_{N-2}, we have the conditional expectation E[x_{N-2}^T Q_{N-2} x_{N-2} | I_{N-2}], plus u_{N-2}^T R_{N-2} u_{N-2}, plus the term I am referring to: when I substitute J_{N-1}(I_{N-1}), which contains the conditional expectation E[x_{N-1}^T K_{N-1} x_{N-1} | I_{N-1}], and also condition on (I_{N-2}, u_{N-2}), I therefore have a conditional expectation nested inside another conditional expectation. By the laws of conditional expectation, this then becomes simply a conditional expectation given the weaker information.
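The step just used, collapsing a conditional expectation nested inside conditioning on coarser information, is the tower property E[ E[X | fine] | coarse ] = E[X | coarse]. A minimal finite example (my own toy setup, not from the lecture) makes it concrete:

```python
import numpy as np

# X takes values 1..4 with equal probability 1/4.
# "Fine" information distinguishes the cells {1,2} and {3,4};
# "coarse" information is trivial (knows nothing).
vals = np.array([1.0, 2.0, 3.0, 4.0])
p = np.full(4, 0.25)

# E[X | fine]: the average within each fine cell, listed per outcome
e_fine = np.array([1.5, 1.5, 3.5, 3.5])  # means of {1,2} and {3,4}

lhs = np.sum(p * e_fine)   # E[ E[X | fine] | coarse ], coarse = plain average
rhs = np.sum(p * vals)     # E[X | coarse] = E[X]
print(lhs, rhs)
```

Both sides equal 2.5: averaging the fine-cell means recovers the overall mean, which is exactly why the inner conditioning on I_{N-1} collapses to conditioning on (I_{N-2}, u_{N-2}).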
So, that term becomes E[x_{N-1}^T K_{N-1} x_{N-1} | I_{N-2}, u_{N-2}]. Plus, I now have my error term, which is more complicated to write, because it has a conditional expectation inside while there is also a conditional expectation outside; the outer one gets replaced by the conditioning on the weaker information. That gives me E[(x_{N-1} - E[x_{N-1} | I_{N-1}])^T P_{N-1} (x_{N-1} - E[x_{N-1} | I_{N-1}]) | I_{N-2}, u_{N-2}], where the inner conditional expectation is over x_{N-1}. And then I have my last term, the trailing constant with respect to w_{N-1}, namely E[w_{N-1}^T Q_N w_{N-1}]. So this becomes our final expression. Now, this expression will require some analysis, because it is not as simple as the one we saw for the case of perfect information. It will need some thinking on how it can be simplified and how it can be extended to the further time steps; we will do that in the next lecture.