Welcome, everyone. In the previous example, the machine repair problem, we saw that applying the dynamic programming equation, or Bellman equation, to find the optimal policy and the value function requires a substantial number of calculations. Recall that we had just a two-stage problem with only two actions and two possible states, and yet, merely to find the value function at time step 1 (the second time step), we had to go through eight different cases: case 1, case 2, and so on through case 8, with each case involving an optimization in itself. Because there were only two actions, each of these optimizations was trivial; we simply compared the costs of the two actions. In general, however, with a larger number of alternatives or control actions to choose from, this optimization can itself be very tedious. In other words, the entire problem carries a huge computational cost, and that cost arises because the number of possible histories one has to keep track of, which is encapsulated in the information available at any point in time, keeps growing with time. In this case we had a three-dimensional information vector; with a longer history it would be even larger, which means that many more possible values of the information vector and that many more computations to perform.

So a central question in problems with imperfect state information is: how do we reduce this complexity? In particular, is there some way to keep track of much less information than is present in the information vector, something more parsimonious whose complexity does not grow as quickly? If we can then find the optimal policy as a function of that quantity, we obtain a reduction in complexity. Such a quantity is known by the term sufficient statistic. A sufficient statistic is a piece of information that is enough to give you everything you would have obtained from the information vector itself. In other words, the information vector hides within it a much more compact, compressed representation comprising the essential information needed for your particular task. The sufficient statistic is a concept from statistics, but it is used in control theory to mean the bare minimum information you need to keep track of in order to apply the optimal control action. We will see in the forthcoming lectures how the sufficient statistic plays an important role in all of these problems, and what the sufficient statistic really is for various classes of problems.

What we will do now is look at another special case: the problem of linear systems with quadratic cost, but with imperfect state information. Here we will see that there is something much simpler we can keep track of than what we have been doing so far, and in doing so we will, in some sense, have discovered the sufficient statistic for this particular problem.
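To make the growth concrete, here is the information vector as it is defined later in this lecture; the count for the machine repair example is a reconstruction from the eight cases mentioned above, not something worked out on the board here:

$$I_0 = z_0, \qquad I_k = \big( z_0, z_1, \ldots, z_k,\; u_0, u_1, \ldots, u_{k-1} \big), \quad k = 1, \ldots, N-1.$$

The vector $I_k$ stacks $k+1$ observations and $k$ controls, so its dimension grows linearly in $k$; and when observations and controls each take only two values, $I_1 = (z_0, z_1, u_0)$ already has $2 \times 2 \times 2 = 8$ possible realizations, matching the eight cases above.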
So, the problem that we have comprises a linear system: $x_{k+1} = A_k x_k + B_k u_k + w_k$, where $k$ goes from $0$ to $N-1$, together with a quadratic cost that we want to minimize:

$$\min\; \mathbb{E}\Big[ x_N^\top Q_N x_N + \sum_{k=0}^{N-1} \big( x_k^\top Q_k x_k + u_k^\top R_k u_k \big) \Big].$$

You will recollect that this problem looks very similar to the one we studied before with perfect state information: there too we had a linear system of this form, and there too we had a quadratic cost of this form. I have deliberately not written out what the minimization here is over, because I have not yet specified what the information is.

The way this problem differs from the earlier problem with perfect state information is that there is an element of partial, or imperfect, state information here. At the beginning of each period $k$ we get an observation, and it comes to us in the following form: $z_k = C_k x_k + v_k$, where $C_k$ is a fixed, known matrix. Of the terms here, $w_k$ and $v_k$ are noise, while the $C_k$ are known, deterministic matrices. In general $C_k$ may not be invertible, and even if it is, the presence of the noise $v_k$ means you would still not be able to recover $x_k$. Consequently, we do not know exactly what the state is; all we know is the observation $z_k$. Similarly, the matrices $A_k$ and $B_k$ are known and deterministic. We again assume that the matrices $Q_k$ are positive semidefinite and that the matrices $R_k$ are positive definite.

Since I have said that $w_k$ and $v_k$ are noise, we will make further assumptions about them: the $w_k$ and $v_k$ are independent and zero mean, and as before we will assume that their variance is finite. There is also an additional source of randomness, which comes from the initial state: the noises are also independent of the initial state, denoted $x_0$.

So what we are doing, therefore, is again minimizing this cost, and now I can write out what we plan to do: we want to minimize this cost over $\mu_0, \ldots, \mu_{N-1}$, where we choose $u_k = \mu_k(I_k)$, and as before $I_k$ comprises all the observations you have up until time $k$ and the control actions you have taken up until the previous time. This completes our problem statement; what we will now do is apply the dynamic programming algorithm to this problem.

The dynamic programming algorithm, as before, begins at time $N-1$, and we write $J_{N-1}(I_{N-1})$. Remember, this is equal to the minimum over $u_{N-1}$, the action we plan to take, of the following expression: the sum of the expected stage-wise cost and the expected terminal cost. The stage-wise cost at time $N-1$ is $x_{N-1}^\top Q_{N-1} x_{N-1} + u_{N-1}^\top R_{N-1} u_{N-1}$, plus the terminal cost.
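To make the setup concrete, here is a minimal simulation sketch in Python. The matrices, dimensions, and noise scales below are purely illustrative assumptions, and the placeholder policy simply applies zero control; any map from $I_k$ to $u_k$ could slot in there.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10                                   # horizon (illustrative)
A = np.array([[1.0, 0.1], [0.0, 1.0]])   # known, deterministic A_k (time-invariant here)
B = np.array([[0.0], [0.1]])             # known, deterministic B_k
C = np.array([[1.0, 0.0]])               # known C_k; not invertible, so x_k stays hidden
Q = np.eye(2)                            # Q_k, positive semidefinite
R = 0.1 * np.eye(1)                      # R_k, positive definite

def mu(I):
    """Placeholder policy u_k = mu_k(I_k); replace with any function of I_k."""
    return np.zeros(1)

x = rng.normal(size=2)                   # random initial state x_0
I, cost = [], 0.0                        # I accumulates (z_0, u_0, z_1, u_1, ...)
for k in range(N):
    z = C @ x + 0.1 * rng.normal(size=1) # observation z_k = C_k x_k + v_k
    I.append(z)
    u = mu(I)                            # the controller only sees I_k, never x_k
    cost += x @ Q @ x + u @ R @ u        # stage cost x_k' Q_k x_k + u_k' R_k u_k
    x = A @ x + B @ u + 0.1 * rng.normal(size=2)  # x_{k+1} = A_k x_k + B_k u_k + w_k
    I.append(u)
cost += x @ Q @ x                        # terminal cost x_N' Q_N x_N
print(cost)
```

The key structural point the code makes explicit is that the policy receives only the growing list `I`, never the state `x`.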
In place of the terminal cost, remember that the terminal cost here is $x_N^\top Q_N x_N$, and what we will do is substitute for $x_N$ using the system equation. Doing that substitution gives us

$$\big( A_{N-1} x_{N-1} + B_{N-1} u_{N-1} + w_{N-1} \big)^\top Q_N \big( A_{N-1} x_{N-1} + B_{N-1} u_{N-1} + w_{N-1} \big),$$

and remember, the expectation here is a conditional expectation, conditioned on the information. I can also write the control action in the conditioning just to be explicit, or we can proceed assuming the control action is a parameter; it does not matter, the answer is the same. So the expectation is conditioned on $I_{N-1}$, and we take the expectation over all the other random quantities. Remember, this is an imperfect-information problem, so $x_{N-1}$ is now random, unlike in the problem with perfect information, and $w_{N-1}$ is also random. All the information we have at time $N-1$ is encompassed in $I_{N-1}$; in particular, the realizations of the observation noise are encapsulated in $I_{N-1}$.

Now let us make some observations about this equation. As before, what we have to do is look at the term that involves $w_{N-1}$. Remember, the $w_k$ are independent and zero mean, which means that given $I_{N-1}$, the $w_k$'s and $v_k$'s are still independent and zero mean. So the conditional expectation of $w_{N-1}$ given $I_{N-1}$ is in fact just the mean of $w_{N-1}$ itself, and hence equal to $0$.

Observing this, let us note a few things. When we open up this quadratic, we will, as before, have terms that are linear in $w_{N-1}$, terms that are quadratic in $w_{N-1}$, and terms that do not involve $w_{N-1}$ at all. The terms linear in $w_{N-1}$ are of the form where $w_{N-1}$ multiplies $Q_N$ and then $A_{N-1} x_{N-1} + B_{N-1} u_{N-1}$. Thanks to the observation above, we have

$$\mathbb{E}\big[ w_{N-1}^\top Q_N \big( A_{N-1} x_{N-1} + B_{N-1} u_{N-1} \big) \,\big|\, I_{N-1} \big] = 0.$$

The reason is as before: $x_{N-1}$ is driven by the control actions and the noise realized up until time $N-1$, and therefore cannot be affected by the noise that occurs at time $N-1$ itself. Remember, in our model this noise occurs after the realization of $x_k$, so it is not affected by anything that has happened up until then; and because $w_{N-1}$ is independent of all the previous sources of noise, including the observation noise, it is independent of all of these quantities.
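Spelled out, the vanishing of this cross term is a one-line consequence of independence (a worked step using only the assumptions stated above):

$$\mathbb{E}\big[ w_{N-1}^\top Q_N ( A_{N-1} x_{N-1} + B_{N-1} u_{N-1} ) \,\big|\, I_{N-1} \big] = \underbrace{\mathbb{E}[w_{N-1}]^\top}_{=\,0} Q_N\, \mathbb{E}\big[ A_{N-1} x_{N-1} + B_{N-1} u_{N-1} \,\big|\, I_{N-1} \big] = 0,$$

where the first equality uses the fact that $w_{N-1}$ is independent of the pair $(x_{N-1}, I_{N-1})$.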
So, as a consequence of this, the conditional expectation of the cross term is simply the product of the conditional expectation of $w_{N-1}$ given $I_{N-1}$ and the conditional expectation of the rest given $I_{N-1}$. Long story short, this entire expression evaluates to $0$. As a result, I can simplify what was written above and get a smaller expression for $J_{N-1}(I_{N-1})$: it is a minimization over $u_{N-1}$ of the conditional expectation of $x_{N-1}^\top Q_{N-1} x_{N-1} + u_{N-1}^\top R_{N-1} u_{N-1}$, plus the terms that remain.

Remember, we also have a quadratic in $w_{N-1}$, so let us write that out first: $\mathbb{E}\big[ w_{N-1}^\top Q_N w_{N-1} \big]$. This kind of term we have seen before; it came up even in the problem with perfect state information. I have dropped the conditioning on $I_{N-1}$ because $w_{N-1}$ is in fact independent of everything that has happened in the past, so the conditioning can be dropped.

Then what I am left with is a third term, coming from the product $( A_{N-1} x_{N-1} + B_{N-1} u_{N-1} )^\top Q_N ( A_{N-1} x_{N-1} + B_{N-1} u_{N-1} )$. What we can do now is pull out the terms that involve $u_{N-1}$ and those that do not. The ones that do not, let me write first: $( A_{N-1} x_{N-1} )^\top Q_N ( A_{N-1} x_{N-1} )$. Then I have my term that depends purely on $u_{N-1}$: $( B_{N-1} u_{N-1} )^\top Q_N ( B_{N-1} u_{N-1} )$. What we now need to write is the cross term, the product of the $x$-part with the $u$-part: $( B_{N-1} u_{N-1} )^\top Q_N ( A_{N-1} x_{N-1} )$, which appears together with its transpose. And remember, since all of this is inside the expectation, the conditioning on $I_{N-1}$ sits outside, we close the bracket, and the whole thing is minimized over $u_{N-1}$.

The reason we have written things out in this form is that it affords some possibility for consolidating terms. For instance, we can observe that the stage cost contains a quadratic in $x_{N-1}$ and the expanded product contains another quadratic in $x_{N-1}$; these two can be merged and written together. We can also observe that we have a quadratic in $u_{N-1}$ coming from $R_{N-1}$ and another quadratic in $u_{N-1}$ coming from $Q_N$; these two can also be merged. In addition to this there is a linear term in $u_{N-1}$, and that is the one we need to pay careful attention to: it has $u_{N-1}$ appearing linearly, multiplied by $x_{N-1}$, which also appears linearly. So let us take these terms one by one, start putting them together, and then come back to this.
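Collecting the pieces, the expression at this stage of the derivation reads (a consolidated restatement of the steps above):

$$J_{N-1}(I_{N-1}) = \min_{u_{N-1}} \mathbb{E}\Big[ x_{N-1}^\top Q_{N-1} x_{N-1} + u_{N-1}^\top R_{N-1} u_{N-1} + \big( A_{N-1} x_{N-1} + B_{N-1} u_{N-1} \big)^\top Q_N \big( A_{N-1} x_{N-1} + B_{N-1} u_{N-1} \big) \,\Big|\, I_{N-1} \Big] + \mathbb{E}\big[ w_{N-1}^\top Q_N w_{N-1} \big].$$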
So let us first put the quadratics in $x_{N-1}$ together. We are minimizing the following with respect to $u_{N-1}$: first, skipping a few steps, the consolidated quadratic in $x_{N-1}$,

$$\mathbb{E}\big[ x_{N-1}^\top \big( A_{N-1}^\top Q_N A_{N-1} + Q_{N-1} \big) x_{N-1} \,\big|\, I_{N-1} \big];$$

remember, this expectation is again conditioned on $I_{N-1}$. Then I have my quadratic in $w$, namely $\mathbb{E}\big[ w_{N-1}^\top Q_N w_{N-1} \big]$, and then I have my terms in $u$. Combining the two quadratics in $u$ gives

$$u_{N-1}^\top \big( B_{N-1}^\top Q_N B_{N-1} + R_{N-1} \big) u_{N-1}.$$

Having dealt with those terms, we are finally left with the cross term, which is linear in $u_{N-1}$ and linear in $x_{N-1}$. Remember, all of this is still inside the conditional expectation, and $u_{N-1}$ was taken as a parameter, so it comes out of the expectation; $Q_N$ and $A_{N-1}$ are constant and come out as well. What is left behind inside the expectation is just the expectation of $x_{N-1}$ given $I_{N-1}$. Taking the cross term together with its transpose gives a factor of $2$, so this term is

$$2\, \mathbb{E}\big[ x_{N-1} \,\big|\, I_{N-1} \big]^\top A_{N-1}^\top Q_N B_{N-1} u_{N-1}.$$

Notice that the middle term depends only on $u_{N-1}$, and since $u_{N-1}$ is a parameter for the expectation, it comes out of the expectation; there is no need to put an expectation around that term. The expectation in the final term has eventually collapsed to an expectation over $x_{N-1}$ alone, which is what gives us the expression above. What we are left with, therefore, is a minimization of the complete expression with respect to $u_{N-1}$. We can simplify this further by observing that the quadratic in $x_{N-1}$ and the quadratic in $w_{N-1}$ do not depend on $u_{N-1}$ at all, so the minimization over $u_{N-1}$ can really be moved inward, past those terms, onto just the terms that involve $u_{N-1}$:

$$J_{N-1}(I_{N-1}) = \mathbb{E}\big[ x_{N-1}^\top ( A_{N-1}^\top Q_N A_{N-1} + Q_{N-1} ) x_{N-1} \,\big|\, I_{N-1} \big] + \mathbb{E}\big[ w_{N-1}^\top Q_N w_{N-1} \big] + \min_{u_{N-1}} \Big\{ u_{N-1}^\top ( B_{N-1}^\top Q_N B_{N-1} + R_{N-1} ) u_{N-1} + 2\, \mathbb{E}[ x_{N-1} \mid I_{N-1} ]^\top A_{N-1}^\top Q_N B_{N-1} u_{N-1} \Big\}.$$

So notice what we are getting, eventually: a quadratic in $u_{N-1}$ with a linear term that depends on the conditional expectation of the state at time $N-1$. What we will do in the next part is solve for this, and we will make some more observations on top of it.
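Anticipating that next step, here is a small numerical sketch (with hypothetical matrices) of the minimization of the $u_{N-1}$-dependent part. Since $B_{N-1}^\top Q_N B_{N-1} + R_{N-1}$ is positive definite, the quadratic has the closed-form minimizer $u^* = -\big( B_{N-1}^\top Q_N B_{N-1} + R_{N-1} \big)^{-1} B_{N-1}^\top Q_N A_{N-1}\, \mathbb{E}[ x_{N-1} \mid I_{N-1} ]$, and the code checks this against random perturbations:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical problem data; dimensions chosen only for illustration.
A = rng.normal(size=(2, 2))
B = rng.normal(size=(2, 1))
Qn = np.eye(2)                      # Q_N, positive semidefinite
R = np.array([[0.5]])               # R_{N-1}, positive definite
xhat = rng.normal(size=2)           # stands in for E[x_{N-1} | I_{N-1}]

M = B.T @ Qn @ B + R                # coefficient of the quadratic in u
b = B.T @ Qn @ A @ xhat             # coefficient of the linear term (enters as 2 b^T u)

def cost(u):
    # The u-dependent part of J_{N-1}: u^T M u + 2 b^T u.
    return u @ M @ u + 2 * b @ u

u_star = -np.linalg.solve(M, b)     # closed-form minimizer of the quadratic

# Sanity check: u_star beats random nearby perturbations.
assert all(cost(u_star) <= cost(u_star + 0.01 * rng.normal(size=1)) for _ in range(100))
print(u_star)
```

Note that $u^*$ depends on the information vector only through $\mathbb{E}[x_{N-1} \mid I_{N-1}]$, which is precisely the kind of compression the sufficient-statistic discussion above is pointing at.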