Welcome back, everyone. Let us recall where we were. We were trying to solve a linear quadratic system with imperfect state information: the state evolves linearly, we have a quadratic cost to minimize, and the observation we get at each time is a linear function of the state corrupted by noise. The information available at time k, I_k, consists of all the observations so far and all the actions taken so far, and we minimize the cost over policies where the policy mu_k at time k is chosen as a function of I_k alone. We were applying the dynamic programming algorithm to this problem, and at time step N-1 we discovered a remarkable fact: the optimal policy mu*_{N-1} at time step N-1 is the same as the optimal policy for the perfect state information case. It is exactly the policy you would have obtained had you assumed perfect state information, except that it is applied not to the state itself, since we do not have that information, but to the conditional expectation of the state given the information, which is our best estimate of the state at that time. So the optimal policy is the perfect-information policy with the state replaced by the conditional mean of the state given the information. This gave us an expression for J_{N-1} as a function of I_{N-1}, which also looked rather similar to the perfect-information case: there was a quadratic term, precisely the term we had seen earlier; there was a constant term floating around because of the system noise; but we also got an additional term.
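For reference, the step-(N-1) result can be written out explicitly. The symbols below (the feedback gain L_{N-1} and the cost matrices K_{N-1}, K_N, P_{N-1}) follow standard LQ notation and are my reconstruction of what is on the board:

```latex
\begin{align*}
\mu^*_{N-1}(I_{N-1}) &= L_{N-1}\,\mathbb{E}[x_{N-1} \mid I_{N-1}],\\[4pt]
J_{N-1}(I_{N-1}) &= \mathbb{E}\!\left[x_{N-1}' K_{N-1}\, x_{N-1} \mid I_{N-1}\right] \\
 &\quad + \mathbb{E}\!\left[\big(x_{N-1} - \mathbb{E}[x_{N-1}\mid I_{N-1}]\big)' P_{N-1}\,
        \big(x_{N-1} - \mathbb{E}[x_{N-1}\mid I_{N-1}]\big) \,\middle|\, I_{N-1}\right] \\
 &\quad + \mathbb{E}\!\left[w_{N-1}' K_N\, w_{N-1}\right].
\end{align*}
```

The first line is the certainty-equivalence statement from the lecture, the second term is the additional estimation-error term discussed next, and the last term is the constant contributed by the system noise.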
This additional term is what makes the problem different from the perfect state information problem. It depends on the estimation error: on how far x_{N-1}, the true state, is from our best estimate of the state, which is the conditional expectation of the state given the information. It is this error multiplied by a weighting matrix P_{N-1} times the error again, so it is something like a mean square error, but a weighted mean square error. This term also appears in J_{N-1}(I_{N-1}). We then went to time step N-2, which is where we stopped in the previous lecture. When we wrote out the dynamic programming equation for time step N-2, we had a minimization over u_{N-2} outside, and a conditional expectation with respect to all the randomness left in the problem after conditioning on I_{N-2} and u_{N-2}. Notice that the randomness left consists of x_{N-2}, the state we do not know; the system noise w_{N-2}; and the new observation z_{N-1}, which is not included in I_{N-2} but is included in I_{N-1}. I realize that when I wrote this out last time I did not list all three of these, so let us proceed with the understanding that the expectation is really over all of these elements. Now let us analyze this in a little depth. We have a minimization outside and several expectations inside, and what we need to understand is which of these terms actually depend on u_{N-2}.
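The dynamic programming equation being analyzed can be sketched as follows; Q_{N-2} and R_{N-2} denote the state and control cost matrices, which is my assumed notation for the quadratic cost:

```latex
\begin{align*}
J_{N-2}(I_{N-2}) = \min_{u_{N-2}} \;
  \mathbb{E}_{x_{N-2},\, w_{N-2},\, z_{N-1}}
  \Big[ x_{N-2}' Q_{N-2}\, x_{N-2} + u_{N-2}' R_{N-2}\, u_{N-2}
        + J_{N-1}(I_{N-1}) \,\Big|\, I_{N-2},\, u_{N-2} \Big]
\end{align*}
```

Here the next information vector is I_{N-1} = (I_{N-2}, u_{N-2}, z_{N-1}), which is why the expectation over the new observation z_{N-1} appears.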
Then we can minimize those terms with respect to u_{N-2}. The first observation is that the first term does not depend on u_{N-2} at all. The reason is that it is the conditional expectation of a function of the state x_{N-2} given I_{N-2}: x_{N-2} is the state at time N-2, the time at which the action u_{N-2} is taken, and all the information we have about that state is already encompassed in I_{N-2}. So there is no dependence on u_{N-2} in that term. The next term does depend on u_{N-2}; let me underline it. The last underlined term also depends on u_{N-2}, because x_{N-1}, the state at the next time step, depends on u_{N-2}: it is a function of x_{N-2}, u_{N-2}, and w_{N-2}, so hidden inside it is u_{N-2}. The constant term does not depend on u_{N-2}, and that is self-evident because it depends only on the statistics of the noise. We are finally left with one term, and the question is: does this term depend on u_{N-2}? Remember, this is the additional term we got because we have imperfect state information; it was not there in the perfect state information case. Let me underline this term with a different color.
If this term turned out to be independent of u_{N-2}, things would be very nice, because then the entire logic would proceed almost exactly as it did for the perfect state information case: in that case you had precisely the two underlined terms. We could then piggyback on the results we have already derived for the perfect-information case and compute the optimal policy from there. The hope, then, is to establish that this term is independent of u_{N-2}. Why would it be? Let us think about this intuitively. What is this term, really? It captures the error between the state and your best estimate of the state: it is the residual uncertainty left in the state after you have estimated it using the information you have. Why would that be independent of u_{N-2}? It would be if the residual uncertainty left after taking the best estimate is really the intrinsic uncertainty in the problem, driven purely by the noise. If the residual uncertainty is entirely a function of the noise of the problem, then this term becomes a function of just the noise, and the choice of the control actions and the control policy is irrelevant. This is exactly what we will now establish, in the form of the following lemma.
So here is the lemma: for every k there is a function M_k such that x_k - E[x_k | I_k] = M_k(x_0, w_0, ..., w_{k-1}, v_0, ..., v_k), a function of just the noise in the system, and the function itself is independent of the policy used. This is what we will show. Here is the proof. We will construct two systems: the first is the one we already have, driven by whatever control policy has been chosen, and the second is uncontrolled but driven by the same noise as the original system. So first let us fix a policy and consider these two systems. The first is the original system, x_{k+1} = A_k x_k + B_k u_k + w_k, with observations z_k = C_k x_k + v_k. The second system has state denoted x̄_k, evolving as x̄_{k+1} = A_k x̄_k + w̄_k; there is no control term in this, or if you like the control is identically zero. Its observations are z̄_k = C_k x̄_k + v̄_k. How are the noises related? We will take w_k = w̄_k and v_k = v̄_k for all k = 0 to N-1, and we will initialize the two systems with the same initial condition, x_0 = x̄_0. Now consider the vectors Z_k = (z_0, ..., z_k), the column vector of all observations of the first system up to time k, and Z̄_k = (z̄_0, ..., z̄_k), the corresponding vector of observations of the second system up to time k.
Similarly, W_k = (w_0, ..., w_k), V_k = (v_0, ..., v_k), and U_k = (u_0, ..., u_k) is the vector of control actions. Because these two systems are linear, we can back-substitute using the state equation of the respective system at each time step: substitute for x_k in terms of x_{k-1}, for x_{k-1} in terms of x_{k-2}, and so on, and similarly for the second system. We conclude that x_k = F_k x_0 + G_k U_{k-1} + H_k W_{k-1}, and similarly x̄_k = F_k x̄_0 + H_k W̄_{k-1}. How did I get this? When I substitute back repeatedly, the x's eventually get expressed in terms of x_0, the control actions so far, and the noise so far. So I get a linear function of the initial state, the entire history of control actions U_{k-1}, and the entire history of the noise W_{k-1}. Exactly the same coefficients occur in the second system, the one without control. The reason is that the system matrices are the same: you still have the same A_k's, so when you keep substituting back, the dependence on x_0 is the same, and similarly the dependence on the w's. Then, using w_k = w̄_k and x_0 = x̄_0, we can write x̄_k = F_k x_0 + H_k W_{k-1}. This is how we can represent the state of either system at time k. Now let us make a few observations based on this.
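As a concrete instance of this back-substitution, take k = 2:

```latex
\begin{align*}
x_1 &= A_0 x_0 + B_0 u_0 + w_0,\\
x_2 &= A_1 x_1 + B_1 u_1 + w_1
     = \underbrace{A_1 A_0}_{F_2}\, x_0
     + \underbrace{\big(A_1 B_0 u_0 + B_1 u_1\big)}_{G_2 U_1}
     + \underbrace{\big(A_1 w_0 + w_1\big)}_{H_2 W_1}.
\end{align*}
```

Dropping the control term G_2 U_1 gives x̄_2 = F_2 x_0 + H_2 W_1 for the uncontrolled system, with the same F_2 and H_2, which is exactly the claim about matching coefficients.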
First observation: look at the control vector U_{k-1} = (u_0, ..., u_{k-1}). This vector is present in I_k, because the information at time k consists of all the observations so far and all the actions taken so far. In other words, if I take the conditional expectation of U_{k-1} given I_k, it is simply U_{k-1} itself. Now let us take conditional expectations with respect to I_k in both of the equations written above. We get E[x_k | I_k] = F_k E[x_0 | I_k] + G_k U_{k-1} + H_k E[W_{k-1} | I_k]; here the conditional expectation of U_{k-1} given I_k is just U_{k-1}, so it comes out of the expectation. Similarly, for the second system, E[x̄_k | I_k] = F_k E[x_0 | I_k] + H_k E[W_{k-1} | I_k]. Note that for the second system too I am conditioning on I_k, the information of the first system, not on anything else.
Now we can subtract the two. The thing to notice is that the F_k's appearing in all these equations are the same, the G_k's are the same, and the H_k's are the same; this is because of the linearity of the underlying system. So let us look at x_k minus the best estimate of x_k given the information; remember, this is the quantity we want to show is a function of just the noise, independently of the policy used. When I subtract, x_0 gets set against E[x_0 | I_k], W_{k-1} gets set against E[W_{k-1} | I_k], and the underlined control terms cancel, because G_k U_{k-1} is present in both x_k and E[x_k | I_k]. What we are left with is exactly what we would get by subtracting E[x̄_k | I_k] from x̄_k. In other words, x_k - E[x_k | I_k] = x̄_k - E[x̄_k | I_k]. So far so good. What have we shown? That the estimation error, the residual uncertainty as I was calling it, between the state and the best estimate of the state given the information, equals the residual uncertainty you would have in the uncontrolled system.
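The cancellation above rests on the fact that the controlled and uncontrolled trajectories, driven by the same noise, differ by a term that depends only on the controls. A minimal numerical sketch of that fact (scalar system with illustrative values, not the lecture's general matrices):

```python
import numpy as np

def simulate(a, b, x0, controls, noise):
    """Roll out x_{k+1} = a*x_k + b*u_k + w_k; b = 0 gives the uncontrolled twin."""
    x = x0
    traj = [x]
    for u, w in zip(controls, noise):
        x = a * x + b * u + w
        traj.append(x)
    return np.array(traj)

rng = np.random.default_rng(0)
a, b, x0 = 0.9, 1.0, 0.5
controls = rng.normal(size=10)          # one fixed control sequence

diffs = []
for _ in range(3):                      # three independent noise realizations
    noise = rng.normal(size=10)
    x = simulate(a, b, x0, controls, noise)       # controlled system
    xbar = simulate(a, 0.0, x0, controls, noise)  # uncontrolled twin, same noise
    diffs.append(x - xbar)              # should equal G_k U_{k-1}: noise-free

# The difference x_k - xbar_k is the same for every noise realization,
# i.e. it is a deterministic function of the controls alone.
for d in diffs[1:]:
    assert np.allclose(d, diffs[0])
```

Changing the noise changes both trajectories, but their difference stays fixed, which is the scalar analogue of x_k - x̄_k = G_k U_{k-1}.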
Now, does this show that the residual uncertainty is independent of the control? The answer is no, because we are taking the conditional expectation with respect to I_k, the information of the controlled system, and that information does depend on the control actions and the control policy that has been chosen. So the controls are still hidden in there, and one has to be careful: this is not automatically independent of the control actions; we need to argue a little more. To do this, let us go back to the z's. The observation equations of the two systems can also be written in terms of all the noise in the system: I can substitute for the x's using the state equation in the observation equation, and for the x̄'s using the state equation of the second system, and eventually I get a linear function of the observation noise, the system noise, and the initial state. But in the case of the observations of the controlled system, the control will also be present, whereas the observations of the uncontrolled system involve no control at all.
So, just as for the estimation error, we have a similar identity for the observations: Z̄_k = Z_k - N_k U_{k-1} for some matrix N_k. What this difference is saying is that the observations of the controlled system equal a linear function of the control actions plus the observations of the uncontrolled system, because all the contribution of the noise is already captured in Z̄_k. Moreover, Z̄_k itself is a linear function of just x_0, W_{k-1}, and V_k: something of the form S_k W_{k-1} + T_k V_k, plus a linear term in the initial state. In other words, we can also rearrange this as Z_k = N_k U_{k-1} + Z̄_k. Now let us come back to the term we were concentrating on, x̄_k - E[x̄_k | I_k], and in particular to the conditional expectation E[x̄_k | I_k]. What is I_k, really? I_k is just the pair (Z_k, U_{k-1}): all the observations and all the actions taken so far. But given Z_k and U_{k-1}, you can reconstruct Z̄_k.
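Again as a concrete instance, for k = 1 (using the same back-substitution as before):

```latex
\begin{align*}
z_1 &= C_1 x_1 + v_1 = C_1 B_0\, u_0 + \big(C_1 A_0 x_0 + C_1 w_0 + v_1\big),\\
\bar z_1 &= C_1 \bar x_1 + \bar v_1 = C_1 A_0 x_0 + C_1 w_0 + v_1,
\end{align*}
```

so z_1 - C_1 B_0 u_0 = z̄_1: the control enters the controlled observation only through a known linear term, which is the scalar-index version of Z̄_k = Z_k - N_k U_{k-1}.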
Therefore I_k carries the same information as the pair (Z̄_k, U_{k-1}). Why are these equivalent? Because given U_{k-1} and Z_k you can construct Z̄_k, and given Z̄_k and U_{k-1} you can reconstruct Z_k. So being given (Z̄_k, U_{k-1}) is equivalent to being given (Z_k, U_{k-1}): the information content of the two pairs is the same; technically, the sigma-algebras generated by the two pairs are the same. As a consequence, E[x̄_k | I_k] = E[x̄_k | Z̄_k, U_{k-1}]. Now let us see what this says. U_{k-1} is the sequence of control actions applied to the controlled system, while Z̄_k is the sequence of observations of the uncontrolled system. The control actions applied to the controlled system give us no further information about the uncontrolled system beyond what Z̄_k already provides: there is nothing more to be learned about x̄_k from (Z̄_k, U_{k-1}) than can be learned from Z̄_k alone, because after all U_{k-1} is a sequence of actions applied to another system. As a consequence, E[x̄_k | Z̄_k, U_{k-1}] = E[x̄_k | Z̄_k].
This is our main observation, and we got it because of the nice linear relation between Z̄_k and Z_k. Now, what is Z̄_k? It is a function of just the initial condition, the system noise, and the observation noise. As a result, x̄_k - E[x̄_k | Z̄_k] is a function M_k of just the initial condition and the noise in the system, irrespective of the policy used, because the choice of U_{k-1} was immaterial. We have therefore concluded that x_k - E[x_k | I_k] equals this same quantity, a function of just the primitive randomness, independently of the policy used. This proves the lemma we wanted. In the next lecture we will use this lemma to complete the calculations of the dynamic programming algorithm applied to the problem with imperfect state information.
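To illustrate the lemma numerically (an illustration, not the lecture's proof): for a linear-Gaussian system the conditional mean E[x_k | I_k] is computed by the Kalman filter, so we can run the same scalar system under two different control policies with identical noise and check that the estimation error x_k - E[x_k | I_k] comes out the same. All system values below are made up for the sketch:

```python
import numpy as np

def run(policy, x0, ws, vs, a=0.9, b=1.0, c=1.0, q=0.1, r=0.2, m0=0.0, p0=1.0):
    """Simulate x_{k+1} = a x_k + b u_k + w_k, z_k = c x_k + v_k,
    with u_k = policy(xhat_k) and a scalar Kalman filter for E[x_k | I_k]."""
    x, xhat, p = x0, m0, p0
    errors = []
    for w, v in zip(ws, vs):
        # measurement update: fold the new observation z_k into the estimate
        z = c * x + v
        k_gain = p * c / (c * c * p + r)
        xhat = xhat + k_gain * (z - c * xhat)
        p = (1 - k_gain * c) * p
        errors.append(x - xhat)          # estimation error x_k - E[x_k | I_k]
        # control and time update: the filter knows the applied control exactly
        u = policy(xhat)
        x = a * x + b * u + w
        xhat = a * xhat + b * u
        p = a * a * p + q
    return np.array(errors)

rng = np.random.default_rng(1)
n = 20
x0 = rng.normal()
ws = np.sqrt(0.1) * rng.normal(size=n)   # shared system noise
vs = np.sqrt(0.2) * rng.normal(size=n)   # shared observation noise

err_a = run(lambda xh: -0.5 * xh, x0, ws, vs)   # linear feedback policy
err_b = run(lambda xh: 0.0, x0, ws, vs)         # no control at all

# Identical estimation errors despite completely different policies:
assert np.allclose(err_a, err_b)
```

The control term cancels in the error dynamics (it is added to both x and xhat), and the Kalman gain depends only on the noise covariances, so the error is driven by (x_0, w's, v's) alone, exactly as the lemma states.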